Abstract
Cuneiform tablets are among the oldest textual artifacts, and the surviving corpus is comparable in extent to the texts written in Latin or ancient Greek. The Cuneiform Commentaries Project (CCP) at Yale University provides tracings of cuneiform tablets with annotated transliterations and translations. As part of our work on analyzing cuneiform script computationally with 3D acquisition and word spotting, we present a first approach to automatically learning transliterations of cuneiform tablets from a corpus of parallel lines. Each parallel line consists of manually drawn cuneiform characters and their transliteration into an alphanumeric code. Since the cuneiform script is available only as raster data, we segment lines with a projection profile, extract Histogram of Oriented Gradients (HOG) features, detect outliers caused by tablet damage, and align the features with the transliteration. We apply methods from part-of-speech tagging to learn a correspondence between features and transliteration tokens. We evaluate point-wise classification with K-Nearest Neighbors (KNN) and a Support Vector Machine (SVM), and sequence classification with a Hidden Markov Model (HMM) and a Structured Support Vector Machine (SVM-HMM). From our findings we conclude that the sparsity of the data, inconsistent labeling, and the variety of tracing styles currently do not allow fully automated transliteration with the presented approach. Nevertheless, automated learning of transliterations remains highly relevant, because manual annotation in larger quantities is not viable given how few experts are capable of transcribing cuneiform tablets.
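To illustrate the line segmentation step named above, the following is a minimal sketch of a horizontal projection profile on a binarized raster image. It is not the authors' implementation: the function name `segment_lines`, the `min_ink` threshold, and the toy image are assumptions made for illustration only.

```python
import numpy as np


def segment_lines(binary, min_ink=1):
    """Return (start, end) row ranges of text lines in a binary image.

    binary:  2D array in which nonzero pixels are ink.
    min_ink: hypothetical threshold; rows with fewer ink pixels
             are treated as gaps between lines.
    The end index of each range is exclusive.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = (np.asarray(binary) > 0).sum(axis=1)
    is_text = profile >= min_ink

    lines = []
    start = None
    for row, flag in enumerate(is_text):
        if flag and start is None:
            start = row                      # a text line begins
        elif not flag and start is not None:
            lines.append((start, row))       # a text line ends
            start = None
    if start is not None:                    # line touches the bottom edge
        lines.append((start, len(is_text)))
    return lines


# Toy image with two separated bands of ink (assumed data).
img = np.zeros((10, 8), dtype=np.uint8)
img[1:3, 2:6] = 1
img[6:9, 1:7] = 1
print(segment_lines(img))  # [(1, 3), (6, 9)]
```

On real tracings the profile would likely need smoothing and a higher threshold, since damaged or touching lines rarely leave completely empty rows between them.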