Morphological Analysis Using a Sequence Decoder

Ekin Akyürek,Erenay Dayanık,Deniz Yuret

doi:10.1162/tacl_a_00286

Open Access

https://doi.org/10.1162/tacl_a_00286

Abstract

We introduce Morse, a recurrent encoder-decoder model that produces morphological analyses of each word in a sentence. The encoder turns the relevant information about the word and its context into a fixed-size vector representation, and the decoder generates the sequence of characters for the lemma followed by a sequence of individual morphological features. We show that generating morphological features individually rather than as a combined tag allows the model to handle rare or unseen tags and to outperform whole-tag models. In addition, generating morphological features as a sequence rather than, for example, an unordered set allows our model to produce an arbitrary number of features that represent multiple inflectional groups in morphologically complex languages. We obtain state-of-the-art results in nine languages of different morphological complexity under low-resource, high-resource, and transfer learning settings. We also introduce TrMor2018, a new high-accuracy Turkish morphology data set. Our Morse implementation and the TrMor2018 data set are available online to support future research. See https://github.com/ai-ku/Morse.jl for a Morse implementation in Julia/Knet (Yuret, 2016) and https://github.com/ai-ku/TrMor2018 for the new Turkish data set.

Highlights

The Turkish word ‘‘masalı’’ has three possible morphological analyses: the accusative and possessive forms of the stem ‘‘masal’’ and the +With form of the stem ‘‘masa’’, all expressed with the same surface form (Oflazer, 1994).
We have experimented with other input-output formats, as described in Section 5. We found that jointly producing the lemma and the morphological features is more difficult than producing only morphological features in low-resource settings but gives similar performance in high-resource settings.
The results demonstrate that Morse, generating analyses with its sequence decoder, significantly outperforms the state of the art in low-resource, high-resource, and transfer-learning experiments.

Summary

Introduction

The Turkish word ‘‘masalı’’ has three possible morphological analyses: the accusative and possessive forms of the stem ‘‘masal’’ (tale) and the +With form of the stem ‘‘masa’’ (table), all expressed with the same surface form (Oflazer, 1994). Oflazer et al. (1999) observe that words in Turkish can have dependencies on any one of the inflectional groups of a derived word: in ‘‘mavi masalı oda’’ (room with a blue table), the adjective ‘‘mavi’’ (blue) modifies the noun root ‘‘masa’’ (table) even though the final part of speech of ‘‘masalı’’ is an adjective. This dependency would be difficult to represent without a detailed analysis of morphology. Morse performs lemmatization and tagging jointly by default; we report on separating the two tasks.
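
The output format described in the abstract (lemma characters followed by individual morphological features) can be illustrated with a small Python sketch. This is not the Morse implementation, which is written in Julia/Knet; the function names and the simplified feature labels (Noun, Sg, Pl, Nom, Acc, Poss, With, Adj) are hypothetical and chosen only for illustration. The sketch shows how a decoder target sequence could be laid out, and why a feature-level vocabulary may cover an analysis whose combined tag never occurred in training.

```python
from typing import List, Set, Tuple

# An analysis is (lemma, ordered list of morphological features).
# All feature labels below are simplified, hypothetical placeholders.
Analysis = Tuple[str, List[str]]


def to_target_sequence(lemma: str, features: List[str]) -> List[str]:
    """Lay out a decoder target: lemma characters, then one token per feature."""
    return list(lemma) + list(features)


def whole_tag(features: List[str]) -> str:
    """Collapse the features into a single combined tag (the whole-tag view)."""
    return "+".join(features)


def feature_vocab(analyses: List[Analysis]) -> Set[str]:
    """Vocabulary when each feature is generated as its own token."""
    return {f for _, feats in analyses for f in feats}


def tag_vocab(analyses: List[Analysis]) -> Set[str]:
    """Vocabulary when the whole tag is predicted as one unit."""
    return {whole_tag(feats) for _, feats in analyses}


# The three readings of the ambiguous surface form from the text,
# with illustrative labels only.
MASALI_CANDIDATES: List[Analysis] = [
    ("masal", ["Noun", "Acc"]),         # accusative of "masal" (tale)
    ("masal", ["Noun", "Poss"]),        # possessive of "masal" (tale)
    ("masa", ["Noun", "With", "Adj"]),  # +With form of "masa" (table)
]

# Toy training set and a held-out analysis with an unseen combined tag.
TRAIN: List[Analysis] = [
    ("masal", ["Noun", "Sg", "Acc"]),
    ("masa", ["Noun", "Pl", "Nom"]),
]
HELD_OUT: Analysis = ("masa", ["Noun", "Pl", "Acc"])

if __name__ == "__main__":
    for lemma, feats in MASALI_CANDIDATES:
        print(to_target_sequence(lemma, feats))
    unseen_tag = whole_tag(HELD_OUT[1]) not in tag_vocab(TRAIN)
    covered = set(HELD_OUT[1]) <= feature_vocab(TRAIN)
    print("combined tag unseen in training:", unseen_tag)
    print("all individual features seen in training:", covered)
```

In this toy setting a whole-tag classifier has no output class for the held-out analysis, while a sequence decoder can in principle still emit it one feature at a time. Because the target is an ordered sequence, its length is not fixed, which is what lets derived words carry features for several inflectional groups.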

Journal: Transactions of the Association for Computational Linguistics
Publication Date: Nov 1, 2019
Citations: 24
License type: cc-by
