Abstract

The paper presents an original method for processing historical texts. A historical text is converted into its modernized equivalent by a tool called a diachronic normalizer, which is embedded in a linguistic toolkit. The solution has three merits. First, the toolkit architecture allows morphological constraints to be imposed on diachronization rules. Second, the diachronic normalizer can be run in a pipeline together with other NLP tools, such as parsers or translators. Third, the toolkit makes it possible to apply efficiently, during diachronic normalization, a long list of diachronic pairs discovered with the aid of word distribution vectors in historical corpora.
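To make the idea of morphologically constrained diachronic pairs concrete, the following is a minimal sketch and not the toolkit's actual implementation. It assumes tokens arrive already tagged with a part of speech by an earlier pipeline stage. The names `DiachronicRule`, `build_index`, and `normalize`, the tag labels, and the English example pairs are all hypothetical and chosen only for illustration. A dictionary index keeps lookup cost independent of the number of pairs, which is one plausible way to handle a long rule list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DiachronicRule:
    """One hypothetical diachronic pair, optionally restricted by part of speech."""
    historical: str
    modern: str
    required_pos: Optional[str] = None  # None means the rule always applies


def build_index(rules: List[DiachronicRule]) -> Dict[str, List[DiachronicRule]]:
    """Group rules by historical form so each token needs a single dictionary lookup."""
    index: Dict[str, List[DiachronicRule]] = {}
    for rule in rules:
        index.setdefault(rule.historical, []).append(rule)
    return index


def normalize(
    tagged_tokens: List[Tuple[str, str]],
    index: Dict[str, List[DiachronicRule]],
) -> List[str]:
    """Replace each token by its modern form when a matching rule's constraint holds."""
    output: List[str] = []
    for word, pos in tagged_tokens:
        replacement = word
        # Simplification: lookup is case-insensitive and the original casing is not restored.
        for rule in index.get(word.lower(), []):
            if rule.required_pos is None or rule.required_pos == pos:
                replacement = rule.modern
                break
        output.append(replacement)
    return output


# Illustrative pairs only; they are not taken from the paper.
RULES = [
    DiachronicRule("thou", "you"),
    DiachronicRule("hath", "has"),
    DiachronicRule("art", "are", required_pos="VERB"),  # leave the noun "art" untouched
]
INDEX = build_index(RULES)

print(normalize([("thou", "PRON"), ("art", "VERB"), ("wise", "ADJ")], INDEX))
print(normalize([("the", "DET"), ("art", "NOUN")], INDEX))
```

In this sketch the constraint on "art" prevents the rule from rewriting the noun, which is the kind of error an unconstrained word-for-word substitution would make. Because the function consumes tagged tokens and returns plain tokens, it could sit between a tagger and a downstream parser or translator in a pipeline.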
