Abstract

This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium.

Highlights

  • Apertium (Forcada et al 2011) is a free/open-source platform for rule-based machine translation (RBMT)

  • The apertium-recursive module specifies its rules as context-free grammars (CFGs) and uses a Generalized LR (GLR) parser rather than finite-state chunking, so that long-distance reordering can be implemented more effectively

  • Discussed are apertium-recursive, which provides for true recursive transfer (Sect. 4.1); apertium-separable, which enables the processing of multi-word expressions (Sect. 4.2); and apertium-anaphora, which allows the resolution of anaphors in the source text (Sect. 4.3)
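To make the idea of a discontiguous multi-word expression concrete, the following is a minimal toy sketch in Python. It is not the apertium-separable implementation and does not use its rule format. The lexicon `SEPARABLE`, the function `merge_separable`, and the `max_gap` parameter are all hypothetical names introduced here for illustration. The sketch assumes plain whitespace-separated tokens rather than Apertium's lexical-unit stream.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: (verb, particle) -> merged lexical unit.
# This is illustrative data, not Apertium language data.
SEPARABLE: Dict[Tuple[str, str], str] = {
    ("take", "out"): "take_out",
    ("give", "up"): "give_up",
}


def merge_separable(tokens: List[str], max_gap: int = 3) -> List[str]:
    """Join a verb with its particle when at most max_gap tokens intervene."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        merged = False
        # Look ahead for a particle; j == i + 1 is the contiguous case.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            unit = SEPARABLE.get((tokens[i], tokens[j]))
            if unit is not None:
                # Emit the merged unit, then the intervening tokens,
                # and skip over the particle itself.
                out.append(unit)
                out.extend(tokens[i + 1:j])
                i = j + 1
                merged = True
                break
        if not merged:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    print(merge_separable("take the trash out".split()))
    print(merge_separable("give up now".split()))
```

In this sketch, the verb and its particle are brought together into a single unit so that a later stage could translate them as one item, while the words in between are kept in their original order. The gap limit is an arbitrary choice made for the example.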

Summary

Introduction

Apertium (Forcada et al 2011) is a free/open-source platform for rule-based machine translation (RBMT). The platform provides an accessible way to create language data and rules, so that, in addition to experienced language developers, speakers of a language who have a limited understanding of programming and/or linguistics can also create decent translation systems for their languages. This is a superior model for creating translation systems for low-resource languages, both because it involves stakeholders from the language communities and because most languages lack the widely available corpora that fully data-driven approaches would need. Several advances to the Apertium platform (Release version 3.6) have been implemented since the previous publication (Forcada et al 2011). These include organisational improvements, additional methods to augment RBMT with corpus-based methods, new modules for more precise translation, a few additional tools not directly involved in the RBMT pipeline, and resources for many more languages and translation pairs.

Recent advances in Apertium, a free/open‐source rule‐based

Overview of the Apertium platform
Use of corpus‐based approaches in Apertium modules
Morphological disambiguation
Lexical selection
Structural transfer module
New modules
Recursive structural transfer
Processing multi‐word expressions
Anaphora resolution
Some unique features
Example usage
Preliminary evaluation
Future work
Supporting minoritised languages
Released translation pairs
Other languages and work ahead
Apertium‐viewer
Website software
Findings
Conclusion