Lawsuit Takes Aim at Scraping Methods Underpinning Modern AI


“A pirate running away with a computer, digital art” DALL-E
Image: OpenAI

Anyone following the tech industry knows lawsuits are a dime a dozen at this point. However, a new entry filed this month against Microsoft-owned GitHub challenges the foundational principles underpinning some of the most important artificial intelligence advancements in the past three decades.

The lawsuit, led by programmer and lawyer Matthew Butterick, specifically takes issue with GitHub’s Copilot, an AI assistant tool that offers programmers suggested snippets of code while they’re coding, sort of like the autocomplete function in Google Docs or Gmail. Copilot learned what kinds of code to suggest after scraping huge swaths of publicly available code on the open internet. During this process, the proposed class action lawsuit alleges, Copilot blatantly ignores or removes licenses presented by software engineers and effectively relies on “software piracy on an unprecedented scale.”

“It is not fair, permitted, or justified,” the suit reads. “On the contrary, Copilot’s goal is to replace a huge swath of open source by taking it and keeping it inside a GitHub-controlled paywall. It violates the licenses that open-source programmers chose and monetizes their code despite GitHub’s pledge never to do so.”

In a separate blog post, Butterick argues Microsoft’s approach with Copilot creates a “walled garden” that makes life more difficult for programmers in traditional open source communities. If that continues, he argues, it will starve open source communities and, over time, eventually kill them.

Rather than accuse Microsoft and GitHub of violating copyright laws, Butterick’s suit accuses the companies of violating their own terms of service and privacy laws, as well as federal laws that require companies to display the copyright information of materials they use. And while this suit zeroes in on Copilot, the principles of the argument potentially apply to the many other AI tools that were developed using similar scraping methods.

“If companies like Microsoft, GitHub, and OpenAI choose to disregard the law, they should not expect that we the public will sit still,” Butterick said in a recent blog post. “AI needs to be fair & ethical for everyone. If it’s not, then it can never achieve its vaunted aims of elevating humanity. It will just become another way for the privileged few to profit from the work of the many.”

“We’ve been committed to innovating responsibly with Copilot from the start and will continue to evolve the product to best serve developers across the globe,” a GitHub spokesperson said in an email to Gizmodo.

Microsoft did not respond to a request for comment.

‘A Brave New World of Software Piracy’

These concerns over AI copyright and compensation aren’t limited to programmers. Writers, musicians, and visual artists have all echoed these concerns in recent years, particularly in the wake of increasingly popular and effective generative AI image and video tools like Stable Diffusion and OpenAI’s DALL-E. Unlike previous AI training approaches, which inelegantly stuff billions of units of data into a learning set for an AI system, newer generative approaches like DALL-E will take images from Pablo Picasso and then transform them into something new based on a user’s description. That act of repurposing the data complicates traditional copyright thinking even further. Like Butterick, a growing chorus of artists and creative writers has gone public recently, expressing understandable fears that maturing AI systems threaten to put them out of a job.

Some companies are exploring novel ways to credit people whose work ends up influencing the algorithm. Last month, for instance, Shutterstock announced it would start selling DALL-E’s AI-generated art (also trained on humans) directly on its website. As part of that initiative, Shutterstock said it would launch a first-of-its-kind “Contributor Fund” to compensate contributors whose Shutterstock images were used to help develop the tech. Shutterstock said it was also interested in compensating contributors with royalties when DALL-E uses their creations.

Whether that plan actually works in practice remains uncertain, though, and Shutterstock is just one relatively small company compared to Big Tech giants like Microsoft. Industry-wide, proposed standards around compensating creators for inadvertently training AI systems remain nonexistent.

Butterick’s beef with Copilot in particular began almost as soon as the product was released. In a June 2021 blog post titled “This Copilot is Stupid and Wants to Kill Me,” the lawyer said he agreed with others who described the tool as “primarily an engine for violating open-source licenses.” The lawyer compared Copilot’s effectiveness at writing code to that of a 12-year-old who learned JavaScript in a day. Copilot is also not always accurate.

“Copilot essentially tasks you with correcting a 12-year-old’s homework, over and over,” Butterick wrote.

Speaking of his recent suit, Butterick acknowledged the novelty of the complaint and said it would likely be amended in the future. While the suit is likely the first legal effort of its kind to strike at the root of AI training, the programmer and lawyer said he believes it’s an important step toward holding AI creators accountable.

“This is the first step in what will be a long journey,” Butterick said. “As far as we know, this is the first class-action case in the US challenging the training and output of AI systems. It will not be the last. AI systems are not exempt from the law.”


