Can legitimate interest be an appropriate lawful basis for processing Artificial Intelligence training datasets?

Pablo Trigo Kramcsák

doi:10.1016/j.clsr.2022.105765

Abstract

The precision and effectiveness of Artificial Intelligence (AI) models are highly dependent on the availability of genuine, relevant, and representative training data. AI systems tested and validated on poor-quality datasets can produce inaccurate, erroneous, skewed, or harmful outcomes (actions, behaviors, or decisions), with far-reaching effects on individuals' rights and freedoms.

Appropriate data governance for AI development poses manifold regulatory challenges, especially regarding personal data protection. An area of concern is compliance with the rules for lawful collection and processing of personal data, which implies, inter alia, that using databases for AI design and development should be based on a clear and precise legal ground: the prior consent of the data subject or another specific valid legal basis.

Faced with this challenge, the European Union's personal data protection legal framework does not provide a preferred, one-size-fits-all answer, and the best option will depend on the circumstances of each case. Although there is no hierarchy among the different legal bases for data processing, in doubtful cases data controllers generally treat consent as the preferred or default choice for lawful data processing. Notwithstanding this perception, obtaining data subjects' consent is not without drawbacks for AI developers or AI-data controllers, as they must meet (and demonstrate that they meet) various requirements for the validity of consent. As a result, data subjects' consent may not be a suitable and realistic option to serve AI development purposes. In view of this, it is necessary to explore the possibility of basing this type of personal data processing on lawful grounds other than the data subject's consent, specifically, the legitimate interest of the data controller or third parties.
Given its features, legitimate interests could help to meet the challenge of quality, quantity, and relevance of data curation for AI training.

The aim of this article is to provide an initial conceptual approach to support the debate about data governance for AI development in the European Union (EU), as well as in non-EU jurisdictions with European-like data protection laws. Based on the rules set by the EU General Data Protection Regulation (GDPR), this paper starts by referring to the relevance of adequate data curation and processing for designing trustworthy AI systems, followed by a legal analysis and conceptualization of some of the difficulties data controllers face in lawfully processing personal data. After reflecting on the legal standards for obtaining data subjects' valid consent, the paper argues that legitimate interests (if certain criteria are met) may better match the purpose of building AI training datasets.

Full Text