Abstract

Generative artificial intelligence (AI) systems, together with text and data mining (TDM), introduce complex challenges at the junction of data utilization and copyright laws. The inherent reliance of AI on large quantities of data, often encompassing copyrighted materials, results in multifaceted legal quandaries. Issues surface from the unfeasible task of securing permission from each copyright holder for AI training, further muddled by ambiguities in interpreting copyright laws and fair use provisions. Adding to the conundrum, the clandestine practices of data collection in proprietary AI systems obstruct copyright owners from detecting unauthorized use of their materials. The paper explores the exceptions to copyright laws for TDM in the European Union, the United Kingdom, and Japan, recognizing their crucial role in fostering AI development. The EU has a two‐pronged approach under the Directive on Copyright in the Digital Single Market, with one exception catering specifically to research organizations, and another, more generalized one, that can be restricted by rightsholders. The UK allows noncommercial TDM research without infringement but rejected a broader copyright exception due to concerns from the creative sector. Japan has the broadest TDM exception globally, permitting the nonenjoyment use of works without permission, though this can potentially overlook the rights of copyright owners. Notably, the applicability of TDM exceptions to AI‐produced copies remains unclear, creating potential legal challenges. Furthermore, an exploration of the fair use doctrine in the United States provides insight into its potential application in AI development. That exploration focuses on the transformative aspect of the use and its impact on the potential market for the original work, and it underscores the necessity for clear, practical guidelines.
In response to these identified challenges, this paper proposes a hybrid model for TDM exceptions, along with specific recommended mechanisms. The model divides exceptions into noncommercial and commercial uses, providing a nuanced solution to complex copyright issues in AI training. Recommendations incorporate mandatory exceptions for noncommercial uses, an opt‐out clause for commercial uses, enhanced transparency measures, and a searchable portal for copyright owners. In conclusion, striking a delicate equilibrium between technological progress and the incentive for creative expression is of paramount importance. These suggested solutions aim to establish a harmonious foundation that nurtures innovation and creativity while honoring creators' rights, facilitating AI development, promoting transparency, and ensuring fair compensation for creators.