Low-Resource Active Learning of Morphological Segmentation

Stig-Arne Grönroos, Kristiina Jokinen, Mikko Kurimo, Sami Virpioja, Ilona Rauhala, Katri Hiovain, Peter Smit

doi:10.3384/nejlt.2000-1533.1644

Abstract

Many Uralic languages have a rich morphological structure, but lack the morphological analysis tools needed for efficient language processing. While creating a high-quality morphological analyzer requires a significant amount of expert labor, data-driven approaches may provide sufficient quality for many applications. We study how to create a statistical model for morphological segmentation from a large unannotated corpus and a small number of annotated word forms selected using an active learning approach. We apply the procedure to two Finno-Ugric languages: Finnish and North Sámi. The semi-supervised Morfessor FlatCat method is used for statistical learning. For Finnish, we set up a simulated scenario to test various active learning query strategies. The best performance is provided by a coverage-based strategy on word-initial and word-final substrings. For North Sámi, we collect a set of human-annotated data. With 300 words annotated with our active learning setup, we see a relative improvement in morph boundary F1-score of 19% compared to unsupervised learning and 7.8% compared to random selection.
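
To make the idea of a coverage-based query strategy more concrete, the following is a minimal sketch of one way such a selection could work. It is not the authors' implementation: the abstract does not specify the details, so the function names (edge_substrings, select_by_coverage), the maximum substring length, the frequency weighting, and the greedy selection loop are all assumptions made for illustration. The sketch picks words for annotation so that the chosen set covers as many frequent word-initial and word-final substrings of the unannotated pool as possible.

```python
from collections import Counter


def edge_substrings(word, max_len=4):
    """Return the word-initial and word-final substrings of up to max_len characters.

    max_len is an illustrative assumption, not a value taken from the paper.
    """
    feats = set()
    for n in range(1, min(max_len, len(word)) + 1):
        feats.add(("prefix", word[:n]))
        feats.add(("suffix", word[-n:]))
    return feats


def select_by_coverage(pool, budget, max_len=4):
    """Greedily select up to `budget` distinct words from `pool` (hypothetical helper).

    Each edge substring is weighted by how many pool tokens contain it.
    At every step, the word that adds the most not-yet-covered weight is chosen.
    """
    weights = Counter()
    for word in pool:
        weights.update(edge_substrings(word, max_len))

    candidates = list(dict.fromkeys(pool))  # distinct words, original order kept
    covered = set()
    selected = []
    for _ in range(min(budget, len(candidates))):
        best = max(
            candidates,
            key=lambda w: sum(weights[f] for f in edge_substrings(w, max_len) - covered),
        )
        selected.append(best)
        covered |= edge_substrings(best, max_len)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy pool of Finnish word forms, chosen only for illustration.
    pool = ["talossa", "talossa", "kirjassa", "talo", "kissa"]
    print(select_by_coverage(pool, budget=2))
```

In an active learning loop, the selected words would be handed to an annotator, and the resulting segmentations added to the annotated set used for semi-supervised training. A greedy loop is used here only because it is simple to follow; other weighting or selection schemes are equally plausible readings of "coverage-based".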

Journal: Northern European Journal of Language Technology
Publication Date: Mar 13, 2016
Citations: 1

Similar Papers

Low-Resource Active Learning of North Sámi Morphological Segmentation
Stig-Arne Grönroos ... Kristiina Jokinen
Septentrio Conference Series
17 Jun 2015

Toward Label-Efficient Neural Network Training: Diversity-Based Sampling in Semi-Supervised Active Learning
Felix Buchert ... Nassir Navab
IEEE Access | VOL. 11
01 Jan 2023

Efficient User Localization in Wireless Networks Using Active Deep Learning
Chuan Sun ... Morteza Hashemi
31 Oct 2021

A framework to build accurate Convolutional Neural Network models for melanoma diagnosis
Eduardo Pérez ... Sebastián Ventura
Knowledge-Based Systems | VOL. 260
26 Nov 2022
