Abstract
Ugarit is a public web-based tool for the manual annotation of parallel texts to generate word-level translation alignments. We aimed to develop a user-friendly interactive interface to visualize aligned texts and to collect training data in the form of translation pairs, to be used later (i) for training an automatic translation alignment system for historical languages at the word/phrase level, and (ii) as a gold standard to evaluate automatic alignment and machine translation systems. Ugarit is now widely used for learning new languages, especially historical languages, and as a reading environment for parallel texts. In the following sections, we present related work and similar projects; then, we give an overview of the visualization techniques used to present the alignment results. Further, we explain how we derive the translation graph from the aligned translation pairs. Finally, we discuss the usage limitations of Ugarit, possible improvements, and future development plans.
Highlights
Translation alignment is a major task in Digital Humanities and Natural Language Processing
The accuracy of the automatic alignment varies according to multiple factors, such as text type and length, size of the corpus, and translation quality and consistency
We describe the development process and show how manual alignment can be performed in Ugarit
Summary
Translation alignment is a major task in Digital Humanities and Natural Language Processing. It is the process of comparing two texts in different languages to find translation correspondences among the textual units in the source and translation texts [1]. It can be performed at various granularity levels according to the project's context or the research purpose. An early example is the Blinker Project [20], which developed the first annotation tool for manual text alignment, used to align different versions of the Bible in French and English at the word level. Ugarit was initially designed to visualize the automatically aligned texts available at the Perseus Digital Library [32] and to collect training data in the form of translation pairs to implement a statistical translation alignment system for historical languages, mainly Ancient. We discuss the limitations, possible improvements, and new features we intend to integrate into the release of Ugarit.
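To make the notion of translation pairs and the translation graph derived from them more concrete, the following is a minimal sketch in Python. It is not Ugarit's actual data model or implementation: the sample pairs, the function name `build_translation_graph`, and the choice to weight edges by how often a pair was aligned are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical word-level translation pairs (source token, target token),
# of the kind a manual alignment session might produce.
pairs = [
    ("λόγος", "word"),
    ("λόγος", "speech"),
    ("λόγος", "word"),
    ("θεός", "god"),
]


def build_translation_graph(pairs):
    """Build a weighted graph as nested dicts: source -> target -> count.

    Each edge weight is the number of times the pair was aligned
    (an assumed weighting, chosen only for this sketch).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for source, target in pairs:
        graph[source][target] += 1
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {source: dict(targets) for source, targets in graph.items()}


graph = build_translation_graph(pairs)
```

Under these assumptions, `graph["λόγος"]` maps each observed translation of the source word to its alignment count, which is one simple way to see which translations annotators chose most often.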