Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

Tanmai Khanna, Irene Tang, Daniel G Swanson, Jonathan N Washington, Sevilay Bayatlı, Hèctor Alòs I Font, Tommi A Pirinen, Francis M Tyers

doi:10.1007/s10590-021-09260-6

Abstract

This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium.
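The idea of translation as a pipeline of modular tools can be illustrated with a minimal sketch. This is not Apertium code: the stage functions, the three toy dictionaries, and the tag notation are all assumptions made up for illustration, and a real language pair has more stages and far larger data. The sketch only shows how independent stages compose, each consuming the previous stage's output.

```python
from functools import reduce
from typing import Callable, List

Stage = Callable[[List[str]], List[str]]

# Toy dictionaries; every entry is invented for this sketch.
ANALYSES = {"the": "the<det>", "cats": "cat<n><pl>"}
BILINGUAL = {"the<det>": "el<det>", "cat<n><pl>": "gato<n><m><pl>"}
SURFACE = {"el<det><m><pl>": "los", "gato<n><m><pl>": "gatos"}


def analyse(tokens: List[str]) -> List[str]:
    """Toy morphological analysis: surface form -> lemma plus tags."""
    return [ANALYSES.get(t, t) for t in tokens]


def lexical_transfer(tokens: List[str]) -> List[str]:
    """Toy bilingual lookup: source lemma -> target lemma."""
    return [BILINGUAL.get(t, t) for t in tokens]


def structural_transfer(tokens: List[str]) -> List[str]:
    """Toy agreement rule: a determiner copies the tags that follow
    <n> on the noun immediately after it (gender and number here)."""
    out = list(tokens)
    for i in range(len(out) - 1):
        if out[i].endswith("<det>") and "<n>" in out[i + 1]:
            out[i] += out[i + 1].split("<n>", 1)[1]
    return out


def generate(tokens: List[str]) -> List[str]:
    """Toy morphological generation: lemma plus tags -> surface form."""
    return [SURFACE.get(t, t) for t in tokens]


def run_pipeline(stages: List[Stage], tokens: List[str]) -> List[str]:
    """Feed the output of each stage into the next one."""
    return reduce(lambda acc, stage: stage(acc), stages, tokens)


PIPELINE: List[Stage] = [analyse, lexical_transfer, structural_transfer, generate]

if __name__ == "__main__":
    print(" ".join(run_pipeline(PIPELINE, ["the", "cats"])))  # los gatos
```

Because each stage is an ordinary function over the token list, an optional module can be added by inserting one more entry into the hypothetical `PIPELINE` list without touching the other stages.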

Highlights

Apertium (Forcada et al 2011) is a free/open-source platform for rule-based machine translation (RBMT)
In the recursive structural transfer module, apertium-recursive, rules are specified as context-free grammars (CFGs), and a Generalized Left-right Right-reduce (GLR) parser is used rather than finite-state chunking to implement long-distance reordering more effectively
Discussed are apertium-recursive, which provides for true recursive transfer (Sect. 4.1); apertium-separable, which enables the processing of multi-word expressions (Sect. 4.2); and apertium-anaphora, which allows the resolution of anaphors in the source text (Sect. 4.3)
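The difference between recursive transfer and flat chunking can be sketched with a toy tree rewriter. The sketch assumes a parse tree already exists; it does not implement a GLR parser and does not use the apertium-recursive rule format. The `Node` class, the `RULES` table, and the example phrase are hypothetical. It shows only that a single reordering rule, applied at every level of the tree, also reorders phrases nested inside other phrases.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Node:
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


# Hypothetical reordering rules: (parent label, child labels) -> new child order.
RULES: Dict[Tuple[str, Tuple[str, ...]], Tuple[int, ...]] = {
    ("NP", ("ADJ", "N")): (1, 0),           # adjective moves after the noun
    ("NP", ("ADJ", "N", "PP")): (1, 0, 2),  # same, with a trailing PP kept last
}


def transfer(node: Union[Node, str]) -> Union[Node, str]:
    """Rewrite children bottom-up, then reorder them if a rule matches."""
    if isinstance(node, str):
        return node
    kids = [transfer(child) for child in node.children]
    key = (node.label,
           tuple(k.label if isinstance(k, Node) else "WORD" for k in kids))
    order = RULES.get(key)
    if order is not None:
        kids = [kids[i] for i in order]
    return Node(node.label, kids)


def leaves(node: Union[Node, str]) -> List[str]:
    """Read the words off the tree from left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node.children for word in leaves(child)]


if __name__ == "__main__":
    inner = Node("NP", [Node("ADJ", ["old"]), Node("N", ["man"])])
    tree = Node("NP", [
        Node("ADJ", ["big"]),
        Node("N", ["house"]),
        Node("PP", [Node("P", ["of"]), inner]),
    ])
    print(" ".join(leaves(transfer(tree))))  # house big of man old
```

English words are kept in the output so that only the change in word order is visible; the inner noun phrase is reordered by the same rule as the outer one because the rewrite recurses through the prepositional phrase.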

Summary

Introduction

Apertium (Forcada et al 2011) is a free/open-source platform for rule-based machine translation (RBMT). The platform provides an accessible way to create language data and rules, so that not only experienced language developers but also speakers of a language with a limited understanding of programming and/or linguistics can create decent translation systems for their languages. This is a superior model for creating translation systems for low-resource languages, both because it involves stakeholders from the language communities and because most languages lack the widely available corpora that fully data-driven approaches would need. Several advances to the Apertium platform (Release version 3.6) have been implemented since the previous publication (Forcada et al 2011). These include organisational improvements, additional methods to augment RBMT with corpus-based methods, new modules for more precise translation, a few additional tools not directly involved in the RBMT pipeline, and resources for many more languages and translation pairs.

Overview of the Apertium platform

Use of corpus‐based approaches in Apertium modules

Morphological disambiguation

Lexical selection

Structural transfer module

New modules

Recursive structural transfer

Processing multi‐word expressions

Anaphora resolution

Some unique features

Example usage

Preliminary evaluation

Future work

Supporting minoritised languages

Released translation pairs

Other languages and work ahead

Apertium‐viewer

Website software

Findings

Conclusion


Journal: Machine Translation	Publication Date: Oct 18, 2021
Citations: 10	License type: open-access


