Abstract

This paper proposes a method, based on linguistic word-formation rules and dictionaries, for identifying reduplicative words in Vietnamese. The key idea is to determine whether adjacent syllables in a text can form a reduplicative word according to these formation rules. For 2-syllable reduplicative words, the method uses rules that describe repetition and opposition between pairs of initial consonants, rhymes, and tones. It is then extended from 2-syllable words to reduplicative words of 3 or 4 syllables and applied to the Vietnamese word segmentation task. Experimental results show that the F1-score improved to 98.61% and that word segmentation errors were reduced significantly (1.26%).
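As an illustration of the 2-syllable check described above, the following is a minimal sketch, not the paper's actual rule set. It assumes a simplified criterion: two adjacent syllables are a candidate if they share an initial consonant or a rhyme. The paper's specific opposition rules for rhymes and tones, and its dictionary lookup, are not reproduced here. The names `split_syllable` and `is_reduplication_candidate` are hypothetical, and orthographic special cases (for example `gi` and `qu`) are handled only naively.

```python
import unicodedata

# Combining marks that encode the five marked Vietnamese tones (NFD form).
TONE_MARKS = {
    "\u0300": "huyen",  # grave
    "\u0301": "sac",    # acute
    "\u0309": "hoi",    # hook above
    "\u0303": "nga",    # tilde
    "\u0323": "nang",   # dot below
}

# Orthographic initial consonants, longest first so "ngh" wins over "ng" and "n".
ONSETS = sorted(
    ["ngh", "ng", "gh", "gi", "kh", "nh", "ph", "th", "tr", "ch", "qu",
     "b", "c", "d", "đ", "g", "h", "k", "l", "m", "n", "p", "r", "s",
     "t", "v", "x"],
    key=len,
    reverse=True,
)


def split_syllable(syllable):
    """Split one written syllable into (onset, rhyme, tone)."""
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    tone = "ngang"  # unmarked (level) tone by default
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # keeps vowel-quality marks such as the circumflex
    base = unicodedata.normalize("NFC", "".join(kept))
    onset = next((o for o in ONSETS if base.startswith(o)), "")
    return onset, base[len(onset):], tone


def is_reduplication_candidate(first, second):
    """Simplified assumption: shared onset or shared rhyme (tone is ignored)."""
    onset1, rhyme1, _ = split_syllable(first)
    onset2, rhyme2, _ = split_syllable(second)
    return onset1 == onset2 or rhyme1 == rhyme2


if __name__ == "__main__":
    for pair in [("lung", "linh"), ("lom", "khom"), ("đo", "đỏ"), ("học", "sinh")]:
        print(pair, is_reduplication_candidate(*pair))
```

A check this loose over-generates, since many ordinary adjacent syllables happen to share an initial consonant. This is presumably why the method pairs formation rules with dictionaries and constrains which rhyme and tone combinations are admissible.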
