The rise of AI-generated content and the growing use of large language models (LLMs) have sparked important questions about their impact on authors' rights. Because generative AI is relatively new, many legal issues remain unresolved, with ongoing court cases shaping the landscape. This page provides information on these concerns and offers guidance to scholarly authors navigating the current legal and ethical challenges surrounding AI and copyright.
Some of this information draws on guidance from Authors Alliance and UC Berkeley.
This question is being debated in numerous court cases, and the answer will likely continue to evolve, but the short answer is no: U.S. copyright law requires human authorship. In Thaler v. Perlmutter, a U.S. District Court held that a work autonomously generated by an AI model is not copyrightable. However, if there is significant creative human involvement in the creation or editing of an AI-generated work, the work may qualify for copyright protection.
Again, this is currently being litigated. Bartz v. Anthropic is one of the most important cases in this area: a group of authors sued Anthropic (an AI company) for copyright infringement, alleging that Anthropic used their copyrighted books to train its LLMs. The judge found that training Claude (Anthropic's LLM) on lawfully acquired copies of the authors' books was fair use because of the transformative nature of the training, but held that Anthropic's use of pirated copies was not protected by fair use. The case is considered a landmark because it is the first decision to hold that training an AI model on copyrighted material can qualify as fair use.
There is no definitive way to know whether an AI model has been trained on your research. However, the Generative AI Licensing Agreement Tracker lists publishers that have signed licensing agreements allowing LLM developers to train on their scholarly content. You won't be able to confirm that your specific work was used, but you can at least see whether your publisher has signed such an agreement.
It is impossible to fully prevent AI models from training on your published work, but you can take steps to limit it. You can publish in venues that explicitly prohibit AI data scraping, and you can submit removal requests when companies offer opt-out mechanisms, though your work may still end up in training data. For content hosted on a site you control, technical opt-out methods such as a robots.txt file that blocks known AI crawlers (see the sketch below) offer your best chance of keeping it out of training sets.
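As a rough sketch: if your work lives on a site you control (for example, a personal website), a robots.txt file placed at the site root can ask AI crawlers not to collect your pages. The user-agent names below (GPTBot, ClaudeBot, CCBot, Google-Extended) are illustrative examples current at the time of writing; the list changes over time, so check each AI company's documentation for its current crawler names.

    # robots.txt, placed at the root of your site (e.g., example.org/robots.txt)
    # Each block asks one crawler not to fetch any page on the site.

    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

Keep in mind that robots.txt is a request, not an enforcement mechanism: reputable crawlers honor it, but nothing technically prevents a scraper from ignoring it.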
Currently, most academic writers are not compensated when their work is used to train AI models; compensation is typically only offered when a specific agreement or legal framework mandates it. That said, some publishers have begun paying authors a royalty for licensing their work for LLM training. You can use the Generative AI Licensing Agreement Tracker to see which publishers have such agreements.
Yes, you can use AI-generated content in your scholarly work, but proceed with caution. Utah State University advises researchers not to input confidential, proprietary, or restricted data into AI tools because of concerns about data privacy and ownership. Publishing standards for AI-generated content also vary: some journals prohibit it, while others permit it with proper disclosure, so always check your target publication's policies on AI use. Finally, confirm that any funding agencies or collaborators involved in your research place no restrictions on AI use, and keep them informed about how AI is incorporated into the project.