Copyright Lawsuit Against OpenAI Highlights Potential Implications for ChatGPT

If you were to discover that part of your independent research on a rather obscure subject had turned up in a ChatGPT-generated paper or article without attribution to you, would your first inclination be:

flattery
anger
amazement that anyone would train AI to find something positive associated with your name
all of the above

As I wrote a few weeks ago, one legal sword that may keep ChatGPT in check is copyright law. The issue is specifically the AI end product's failure to attribute an individual's research, written or spoken works, or anything else that can be copyrighted.

Last week, two authors, Paul Tremblay and Mona Awad, filed suit in the United States District Court for the Northern District of California against OpenAI, the company that develops ChatGPT. ChatGPT can produce sometimes stunning AI-generated documents in very little time.

Tremblay, a writer from Massachusetts, has written several books to which he owns the copyrights, including The Cabin at the End of the World, one of the specific subjects of the lawsuit. Awad, who is also from Massachusetts, owns the copyrights to several books, including 13 Ways of Looking at a Fat Girl and Bunny. All of these books carry the usual copyright management information at or near the start of the book.

When OpenAI trains a language model, it assembles a dataset from its target material (in this case, an incredibly large number of books and the text they contain) and sorts through it, pulling out coherent and expressive content. It appears that copyright management information was not part of what it kept. The act of copying the books into that dataset is a significant part of the allegations in this case.

According to the plaintiffs, the language models behind ChatGPT, in their various versions, have copied the plaintiffs' books into their datasets. When ChatGPT is asked for something like a summary of one of these books and the summary is accurate, the plaintiffs argue, it can be inferred that the book's content is stored without the copyright management information they included in their books. In other words, OpenAI's models cannot function without the expressive content of these books, and copying that content into the dataset and releasing it without the copyright information amounts, according to the plaintiffs, to copyright infringement, along with a host of other violations of both statutory and common-law principles.

One of the many interesting things about this lawsuit is that it was filed as a class action, even though it has only two plaintiffs at this time. The proposed class is open to:

All persons or entities domiciled in the United States that own a United States copyright in any work that was used as training data for the OpenAI Language Models during the Class Period.

In practice, this means the class could grow to an extremely large number of plaintiffs, OpenAI's trade secrets and dataset acquisition methods may have to be revealed, and, most importantly, the effect on the development and continued use of ChatGPT and similar products could be monumental. This case will be one to watch closely as it progresses.
