Ensemble Method for Identification and Automatic Production of Related Words for Historical Linguistics

G Sajini, Jagadish S Kallimani

doi:10.1007/978-981-15-8677-4_40

Abstract

Language change across time and space is one of the major issues in historical linguistics. This paper presents new methods for studying language evolution that are intended to support researchers and experts. First, a method is proposed to determine whether or not two words are cognates. A linguistic information algorithm is proposed to derive cognates from online dictionaries. A dataset of related words is then created, and machine learning techniques based on orthography (spelling) are used to classify cognates. The aligned subsequences of word pairs are used to identify patterns and rules of language change in newly formed languages, mainly to distinguish cognates from non-cognates, and they serve as features for the classification algorithms. Second, the method is taken one step further to identify the type of relationship between related words. Discriminating between cognates and borrowings gives insight into a language's history and allows a clearer understanding of linguistic relationships. Orthographic characteristics are found to be discriminative, and the linguistic factors underlying this classification task are analyzed. To the best of our knowledge, this is the first such effort. Third, a machine learning technique is developed for producing related words automatically. Two tasks are addressed: proto-word reconstruction and cognate generation, that is, producing related modern words rather than synonyms. Proto-word reconstruction is the task of recreating the words of an ancient language from their descendants in its modern daughter languages. The method relies on the regularity of changes in word forms and combines knowledge from several modern languages in an ensemble method for proto-word reconstruction. It is applied to multiple datasets and improves on previously reported accuracies.
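The abstract does not say exactly how aligned subsequences are turned into classifier features. The sketch below shows one common way to do it, assuming a plain Needleman-Wunsch global alignment with unit costs and taking, as features, the aligned character windows around each mismatch. The function names `align` and `mismatch_features`, the `'-'` gap symbol, the `'$'` boundary padding and the `x>y` feature format are all hypothetical illustrations, not the authors' code.

```python
from typing import List, Tuple


def align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> Tuple[str, str]:
    """Global (Needleman-Wunsch) alignment of two words; '-' marks a gap."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        # Substitution score for a[i-1] versus b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def mismatch_features(a: str, b: str, window: int = 1) -> List[str]:
    """Hypothetical features: aligned character windows around each mismatch, as 'x>y'."""
    al_a, al_b = align(a, b)
    # Pad with '$' so windows at word boundaries keep a fixed width
    pa = "$" * window + al_a + "$" * window
    pb = "$" * window + al_b + "$" * window
    feats = []
    for k in range(len(al_a)):
        if al_a[k] != al_b[k]:
            # Alignment position k sits at position k + window in the padded strings
            feats.append(pa[k:k + 2 * window + 1] + ">" + pb[k:k + 2 * window + 1])
    return feats


print(align("ab", "abc"))              # ('ab-', 'abc')
print(mismatch_features("ab", "abc"))  # ['b-$>bc$']
print(mismatch_features("notte", "noctem"))
```

Each word pair then becomes a bag of such features, which could be fed to any standard classifier (for example logistic regression or an SVM) trained to separate cognates from non-cognates, or cognates from borrowings; the abstract does not state which classifier the paper uses.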
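The abstract describes an ensemble that combines knowledge from several modern daughter languages but does not specify how the individual outputs are merged. As an assumption, the sketch below fuses per-language ranked lists of candidate proto-words by summing reciprocal ranks. The function `ensemble_reconstruct`, the `top_k` cut-off and the toy candidate lists are hypothetical; they stand in for the outputs of whatever per-language reconstruction models are actually trained.

```python
from collections import defaultdict
from typing import Dict, List


def ensemble_reconstruct(ranked: Dict[str, List[str]], top_k: int = 5) -> List[str]:
    """Fuse per-language ranked candidate lists with summed reciprocal ranks."""
    scores: Dict[str, float] = defaultdict(float)
    for candidates in ranked.values():
        # Each daughter-language model votes 1/rank for its top_k candidates
        for rank, cand in enumerate(candidates[:top_k], start=1):
            scores[cand] += 1.0 / rank
    # Highest fused score first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda c: (-scores[c], c))


# Toy, made-up candidate lists for the Latin ancestor of Italian/Spanish/Portuguese "night"
toy = {
    "it": ["nocte", "noctem", "notem"],
    "es": ["noctem", "nochem", "nocte"],
    "pt": ["noctem", "noitem", "nocte"],
}
print(ensemble_reconstruct(toy))  # ['noctem', 'nocte', 'nochem', 'noitem', 'notem']
```

Reciprocal-rank fusion is used here only because it needs ranks, not comparable scores, from each per-language model; a weighted vote or a learned meta-classifier would be equally plausible ways to build the ensemble.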