Licensing and Usage Rights of Language Data in Machine Translation

Mikel L Forcada

doi:10.1007/978-3-031-14689-3_4

Abstract

Machine translation (MT) is special in that it relies heavily on data. In rule-based MT, an engine performs the translation task using language resources such as dictionaries and grammar rules. These are usually written by experts, but are sometimes learned from monolingual or bilingual text. Corpus-based MT (statistical and, more recently, neural) leverages large amounts of monolingual and sentence-aligned bilingual text. MT programs that use these data are creative works that may be protected by copyright, but this chapter focuses on the data themselves. Human labour, and therefore creative authorship, is present in all forms of MT data: monolingual text has been authored, parallel text has been translated and aligned, and rules and dictionaries have been written by experts. Since its conception centuries ago, copyright has protected the livelihoods of authors. It does so by using instruments such as licences to regulate how copies of their works may be used, and how works derived from them may be used and published. The case of dictionaries and grammars used in rule-based MT is reasonably clear, because they are purposely written for one language-processing application or another. Monolingual and parallel text used in MT, by contrast, were not created with MT in mind. This has led some authors to ask whether authors and translators should receive additional compensation for this unintended use of their work to generate new value downstream. This chapter gives an overview of the different sources of data used in MT. It discusses authorship along the steps of creating, curating and transforming those data for use in MT, and identifies the kinds of implicit and explicit licensing schemes that apply to them and how they work. It also describes the controversy surrounding the use of published works to generate new, initially unintended, value through translation technologies, and the various ways in which copyright issues are addressed.

Keywords: Machine translation; Corpora; Usage rights; Licensing; Copyright; Professional translations; Repurposing
