Abstract

The proliferation of generative AI has steered the discussion about text and data mining (TDM) in copyright law towards the problem of generative models. However, generative AI is not the only use case for TDM. Mining data from computer programs can improve defect detection, discover design patterns, facilitate maintenance, summarize code in natural language, or identify security vulnerabilities. The potential of such research, and the risk that inadequate data will skew its results, justify an examination of the scope of permitted TDM activities in relation to software. This article explores the intersection of TDM and the protection of computer programs in EU copyright law, focusing on the reproduction and alteration rights under the Software Directive. It argues that, although only the general TDM exception among the new exceptions and limitations in the Directive on Copyright in the Digital Single Market explicitly mentions computer programs, the research exception can also apply to them if implemented carefully. The article also shows that computer programs can be reproduced in a way relevant to copyright law during TDM activities aimed at extracting data from traditional literary or artistic works. It offers an interpretation of Art. 5(1) of the Software Directive that prevents the misuse of copyright to thwart TDM in such scenarios. Finally, it analyses the diverging strategies that EU Member States have adopted when implementing the TDM provisions with respect to computer programs. By examining these issues, the article aims to clarify the scope of permissible TDM activities and to advocate for policies that support research.