Abstract

An increasing number of ancient documents are being digitized as text, but natural language processing techniques are difficult to apply to them because language resources for ancient languages, such as archaic dictionaries with sufficient vocabulary coverage, are scarce. In this paper, we propose a method for constructing an ancient-modern Japanese dictionary from a parallel corpus of ancient writings and their modern-language translations. The parallel corpus consists of document pairs written in the same language, one in its ancient form and one in its modern form. From this corpus, we attempt to acquire equivalent pairs of archaic and modern words by analyzing how frequently words occur in an ancient-language sentence and in its corresponding modern-language translation. We conducted an experiment in which we calculated the similarity between the occurrence frequencies of archaic and modern words.
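
The idea of scoring archaic and modern word pairs by how often they occur in aligned sentences can be illustrated with a minimal sketch. The abstract does not say which similarity measure is used, so the Dice coefficient below is an assumption chosen for illustration, not the paper's method. The function names and the romanized toy sentences are likewise hypothetical, and the sentences are assumed to be tokenized and sentence-aligned already.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs):
    """Score (archaic, modern) word pairs by sentence-level co-occurrence.

    pairs: list of (archaic_tokens, modern_tokens) for aligned sentences.
    The Dice coefficient is an assumed measure, used only for illustration.
    """
    archaic_freq = Counter()  # number of sentence pairs containing each archaic word
    modern_freq = Counter()   # number of sentence pairs containing each modern word
    joint_freq = Counter()    # number of sentence pairs containing both words
    for archaic_tokens, modern_tokens in pairs:
        a_set, m_set = set(archaic_tokens), set(modern_tokens)
        archaic_freq.update(a_set)
        modern_freq.update(m_set)
        joint_freq.update(product(a_set, m_set))
    return {
        (a, m): 2 * n / (archaic_freq[a] + modern_freq[m])
        for (a, m), n in joint_freq.items()
    }


def best_translation(scores, archaic_word):
    """Return the modern word with the highest score for archaic_word, or None."""
    candidates = {m: s for (a, m), s in scores.items() if a == archaic_word}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical toy corpus: romanized, pre-tokenized, sentence-aligned.
toy_pairs = [
    (["ito", "okashi"], ["totemo", "omoshiroi"]),
    (["ito", "kanashi"], ["totemo", "itoshii"]),
    (["hana", "okashi"], ["hana", "omoshiroi"]),
]

scores = dice_scores(toy_pairs)
print(best_translation(scores, "okashi"))
```

Counting each word at most once per sentence keeps the score between 0 and 1. A pair scores 1 only when the two words always appear in the same aligned sentence pairs. Real data would need morphological analysis first and probably a frequency threshold to suppress rare, spuriously perfect matches.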
