Automated Learning of Hungarian Morphology for Inflection Generation and Morphological Analysis

Gabor Szabo, Laszlo Kovacs

doi:10.11591/ijeei.v8i4.2545

Abstract

The automated learning of the morphological features of highly agglutinative languages is an important research area for both machine learning and computational linguistics. In this paper we present a novel morphology model that solves the inflection generation and morphological analysis problems while managing all the affix types of the target language. The proposed model is trained on (word, lemma, morphosyntactic tags) triples. From this training data, it deduces word pairs for each affix type of the target language and learns the transformation rules of these affix types using ASTRA, our previously published, lower-level morphology model. Since ASTRA can handle only a single affix type, a separate model instance is built for every affix type of the target language. Besides learning the transformation rules of all the necessary affix types, the proposed model also calculates the conditional probabilities of the affix type chains using relative frequencies, and stores the valid lemmas and their parts of speech. With this information, it can generate the inflected form of an input lemma for a given set of affix types, and analyze inflected input word forms. For evaluation, we use Hungarian data sets and compare the accuracy of the proposed model with that of state-of-the-art morphology models published by SIGMORPHON, including the Helsinki (2016), UF and UTNII (2017), and Hamburg, IITBHU and MSU (2018) models. The test results show that, using a training data set of up to 100 thousand random training items, the proposed model outperforms all the other examined models, reaching an accuracy of 98% on random input words that were not part of the training data. Using the high-resource data sets for the Hungarian language published by SIGMORPHON, the proposed model achieves an accuracy of about 95-98%.
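The relative-frequency estimate mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the probability of an affix type chain is conditioned on the part of speech, that the first tag of each triple is the part of speech, and that the remaining tags form the chain. The tag names (`N`, `PL`, `ACC`), the toy data, and the function `chain_probabilities` are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical training items: (word, lemma, morphosyntactic tags).
# Assumption: tags[0] is the part of speech, tags[1:] is the affix type chain.
triples = [
    ("házak", "ház", ("N", "PL")),
    ("házat", "ház", ("N", "ACC")),
    ("házakat", "ház", ("N", "PL", "ACC")),
    ("almák", "alma", ("N", "PL")),
]


def chain_probabilities(items):
    """Estimate P(chain | part of speech) as a relative frequency."""
    pos_counts = Counter()
    chain_counts = defaultdict(Counter)
    for _word, _lemma, tags in items:
        pos, chain = tags[0], tuple(tags[1:])
        pos_counts[pos] += 1
        chain_counts[pos][chain] += 1
    # Divide each chain count by the total count of its part of speech.
    return {
        pos: {chain: n / pos_counts[pos] for chain, n in chains.items()}
        for pos, chains in chain_counts.items()
    }


probs = chain_probabilities(triples)
```

With this toy data, two of the four noun items carry the chain `("PL",)`, so its estimated probability is 0.5. The conditioning variable used in the paper may differ from the one assumed here.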

Highlights

According to the theory of morphology and computational linguistics, words are built up from morphemes, which are the smallest morphological units with associated meaning [1].
In this paper we presented a novel multi-affix morphology model that can learn the morphology of highly agglutinative languages like Hungarian and solve the inflection generation and morphological analysis problems, managing all the affix types of the target language.
The proposed model calculates the conditional probability of all the possible affix type chains, stores the valid lemmas and their parts of speech, and trains a separate ASTRA model instance for each affix type, using a deduced set of word pairs that demonstrate the transformation rules of the target affix type.
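One way to picture the deduction of per-affix-type word pairs is the sketch below. It is not the authors' procedure, which this excerpt does not describe. It assumes that the tags after the part of speech are listed in the order the affixes attach, and that a word pair for an affix type consists of a form and the same lemma's form with that last affix type removed. The names `triples` and `deduce_word_pairs` and the tag labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical training items: (word, lemma, morphosyntactic tags).
# Assumption: tags[0] is the part of speech, tags[1:] lists affix types
# in the order they attach to the lemma.
triples = [
    ("házak", "ház", ("N", "PL")),
    ("házat", "ház", ("N", "ACC")),
    ("házakat", "ház", ("N", "PL", "ACC")),
    ("almák", "alma", ("N", "PL")),
]


def deduce_word_pairs(items):
    """Group (base form, inflected form) pairs by the last affix type applied."""
    # (lemma, part of speech) -> {affix type chain: word form}
    forms = defaultdict(dict)
    for word, lemma, tags in items:
        key = (lemma, tags[0])
        forms[key][()] = lemma  # the empty chain corresponds to the lemma itself
        forms[key][tuple(tags[1:])] = word

    pairs = defaultdict(set)
    for by_chain in forms.values():
        for chain, word in by_chain.items():
            if not chain:
                continue
            # The base is the form carrying the same chain minus its last affix type.
            base = by_chain.get(chain[:-1])
            if base is not None:
                pairs[chain[-1]].add((base, word))
    return dict(pairs)


pairs = deduce_word_pairs(triples)
```

Under these assumptions, each key of `pairs` would be the training set of one single-affix model instance; for example, `pairs["ACC"]` contains `("ház", "házat")` and `("házak", "házakat")`.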

Summary

Introduction

According to the theory of morphology and computational linguistics, words are built up from morphemes, which are the smallest morphological units with associated meaning [1]. The grammatically correct root form of a word is called the lemma, while the added morphemes that modify its base meaning are called affixes. Affixes may also change some of the characters in the root form, resulting in, for example, vowel or consonant gradation. The process of adding affixes to a word is called inflection, while the inverse operation, in which we determine the lemma and the affixes of a word, is called morphological analysis. In natural languages there is a finite number of affix types that determine the semantic meaning of the affixes, i.e. how they alter the meaning of the base form. Examples of affix types include the accusative case, the plural form, the past tense, etc. Affixes are the concrete realizations of affix types in words.

Objectives

Methods

Results

Conclusion

Journal: Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
Publication Date: Dec 10, 2020
License type: cc-by