Crosslinguistic Semantic Textual Similarity of Buddhist Chinese and Classical Tibetan

Rafal Felbur, Marieke Meelen, Paul Vierthaler

doi:10.5334/johd.86

Open Access

https://doi.org/10.5334/johd.86

Journal: Journal of Open Humanities Data
Publication Date: Oct 4, 2022
License type: CC BY

Affiliation: Leiden University, University of Cambridge, William & Mary

#Classical Tibetan #Vectors In Order

Abstract

In this paper we present the first procedure for identifying highly similar sequences of text in Chinese and Tibetan translations of Buddhist sūtra literature. We propose this procedure initially as an aid to scholars engaged in the philological study of Buddhist documents. We create a cross-lingual embedding space and take the cosine similarity of averaged sequence vectors in order to produce unsupervised cross-linguistic parallel alignments at the word, sentence, and even paragraph level. Initial results show that our method lays a solid foundation for the future development of a fully fledged Information Retrieval tool for these (and potentially other) low-resource historical languages.
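The core comparison described in the abstract, the cosine similarity of averaged sequence vectors in a shared embedding space, can be illustrated with a minimal sketch. It assumes that Chinese and Tibetan tokens have already been mapped into one shared space; the token names, the 3-dimensional toy vectors, and the helper functions `mean_vector`, `cosine_similarity`, and `rank_candidates` are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_vector(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token in the sequence has an embedding")
    return np.mean(vectors, axis=0)


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def rank_candidates(query_tokens, candidates, embeddings):
    """Return (score, candidate_index) pairs, best match first."""
    query = mean_vector(query_tokens, embeddings)
    scored = [
        (cosine_similarity(query, mean_vector(tokens, embeddings)), index)
        for index, tokens in enumerate(candidates)
    ]
    return sorted(scored, reverse=True)


# Hypothetical shared space: placeholder tokens, not real lexical items.
toy_embeddings = {
    "zh_a": np.array([1.0, 0.0, 0.0]),
    "zh_b": np.array([0.0, 1.0, 0.0]),
    "bo_a": np.array([0.9, 0.1, 0.0]),
    "bo_b": np.array([0.1, 0.9, 0.0]),
    "bo_c": np.array([0.0, 0.0, 1.0]),
}

chinese_query = ["zh_a", "zh_b"]
tibetan_candidates = [["bo_c"], ["bo_a", "bo_b"]]

ranking = rank_candidates(chinese_query, tibetan_candidates, toy_embeddings)
best_score, best_index = ranking[0]
print(best_index, round(best_score, 3))
```

Because averaging discards token order, a sketch like this treats each sequence as a bag of vectors; the same function applies unchanged whether a sequence is a single word, a sentence, or a paragraph.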

Full Text