Abstract
Swapping one or more consonant graphemes in a word for phonologically similar ones, where similarity is based on both place and manner of articulation, often produces other words without shifting the syllable boundaries (or syllabification points). For example, in the Indonesian language, swapping consonant graphemes in the word “ba.ra” (embers) creates three new words, “ba.la” (disaster), “pa.ra” (reference to a group), and “pa.la” (nutmeg), without changing the syllabification points, since the graphemes $$\langle \hbox {b}\rangle$$ and $$\langle \hbox {p}\rangle$$ are both in the plosive-bilabial category while $$\langle \hbox {r}\rangle$$ and $$\langle \hbox {l}\rangle$$ are both in the trill/lateral-dental category. An observation on 50k Indonesian words shows that replacing consonant graphemes in those words increases the number of unigrams 16.52-fold and the number of bigrams 14.12-fold. Therefore, this paper proposes a procedure of swapping consonant graphemes based on phonological similarity to boost the standard bigram-based orthographic syllabification, which commonly performs poorly on a dataset with many out-of-vocabulary (OOV) bigrams. Experiments on the 50k words using k-fold cross-validation, with $$k=5$$, show that the proposed procedure significantly improves the standard bigram syllabification, giving a relative reduction in mean syllable error rate (SER) of up to 31.39%. It also improves results on a dataset of 15k named entities, with a relative decrease in average SER of 9.53%. On both datasets it outperforms a model based on flipping onsets. Compared to a nearest-neighbor-based model, its performance is slightly worse, but its complexity is much lower. Another important finding is that the proposed model can produce a relatively small SER even for a tiny training set.
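The swapping idea can be illustrated with a minimal Python sketch. It is not the paper's procedure: the similarity table contains only the two groups named in the abstract, the names `SIMILAR_GROUPS`, `ALTERNATIVES`, and `swap_variants` are hypothetical, and each grapheme is assumed to be a single character (Indonesian digraphs such as ⟨ng⟩ or ⟨ny⟩ are not handled). The sketch also does not check whether a generated variant is an actual word.

```python
from itertools import product

# Partial similarity table (assumption): only the two groups
# named in the abstract, plosive-bilabial and trill/lateral-dental.
SIMILAR_GROUPS = [("b", "p"), ("r", "l")]

# Map each grapheme to every grapheme in its group, itself included.
ALTERNATIVES = {g: group for group in SIMILAR_GROUPS for g in group}


def swap_variants(syllabified):
    """Return all variants of a syllabified word produced by swapping
    consonant graphemes within their similarity group.

    Characters outside the table (vowels, the syllable separator) are
    kept as they are, so the syllable boundaries stay at the same
    positions in every variant.
    """
    choices = [ALTERNATIVES.get(ch, (ch,)) for ch in syllabified]
    variants = {"".join(combo) for combo in product(*choices)}
    variants.discard(syllabified)  # keep only the newly created forms
    return variants


print(sorted(swap_variants("ba.ra")))
# ['ba.la', 'pa.la', 'pa.ra']
```

Because every swap is a one-for-one character replacement, each variant can reuse the syllabification of the original word, which is how swapping can enlarge the set of syllabified unigrams and bigrams available to a bigram-based model.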