Abstract

Swapping one or more consonant-graphemes in a word for phonologically similar ones, where similarity is based on both place and manner of articulation, often produces other valid words without shifting the syllable boundaries. For example, in the Indonesian language, swapping consonant-graphemes in the word “ba.ra” (embers) creates three new words, “ba.la” (disaster), “pa.ra” (reference to a group), and “pa.la” (nutmeg), without changing the syllabification points, since the graphemes $$\langle \hbox {b}\rangle$$ and $$\langle \hbox {p}\rangle$$ both belong to the plosive-bilabial category while $$\langle \hbox {r}\rangle$$ and $$\langle \hbox {l}\rangle$$ both belong to the trill/lateral-dental category. An observation on 50k Indonesian words shows that replacing consonant-graphemes in those words increases the number of unigrams by a factor of 16.52 and the number of bigrams by a factor of 14.12. Therefore, in this paper, a procedure of swapping consonant-graphemes based on phonological similarity is proposed to boost the standard bigram-based orthographic syllabification, which commonly performs poorly on datasets with many out-of-vocabulary (OOV) bigrams. Experiments on the 50k words using k-fold cross-validation, with $$k=5$$, show that the proposed procedure significantly boosts the standard bigram-syllabification, giving a relative reduction in mean syllable error rate (SER) of up to 31.39%. It also improves results on a dataset of 15k named-entities, with a relative decrease in average SER of 9.53%. It outperforms a flipping onsets-based model on both datasets. Compared to a nearest neighbor-based model, it performs slightly worse but has much lower complexity. Another important finding is that the proposed model can produce a relatively small SER even with a tiny training-set.
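The swapping step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the similarity table below contains only the two pairs named in the abstract, each letter is treated as one grapheme (digraphs are ignored), and the names `SIMILAR_GROUPS`, `swap_candidates`, and `swap_variants` are hypothetical. The sketch enumerates every string reachable by swapping within a group and does not check whether the result is a real word.

```python
from itertools import product

# Illustrative similarity groups. Only these two pairs are taken from the
# abstract; the full table used in the paper is not reproduced here.
SIMILAR_GROUPS = [
    {"b", "p"},  # plosive-bilabial
    {"r", "l"},  # trill/lateral-dental
]


def swap_candidates(grapheme):
    """Return the graphemes that may stand in for `grapheme`, itself included."""
    for group in SIMILAR_GROUPS:
        if grapheme in group:
            return sorted(group)
    # Vowels, the syllable separator, and ungrouped consonants stay fixed.
    return [grapheme]


def swap_variants(syllabified_word):
    """Generate all variants of a syllabified word such as 'ba.ra'.

    The '.' separators are never swapped, so every variant keeps the
    syllable boundaries of the original word.
    """
    options = [swap_candidates(ch) for ch in syllabified_word]
    variants = {"".join(combo) for combo in product(*options)}
    variants.discard(syllabified_word)  # keep only the new forms
    return sorted(variants)


print(swap_variants("ba.ra"))
```

For “ba.ra” this yields the three forms mentioned in the abstract, each with the boundary in the same position. Under this sketch, such variants would be added to the training data so that the bigram model sees more unigrams and bigrams.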
