
This paper discusses phonemic syllabification for the Indonesian language using a pseudo nearest neighbour rule (PNNR) and phonotactic knowledge. The proposed data-driven model uses a four-feature phoneme encoding and a phonotactic-based pre-syllabification. Evaluation on a 50k-word dataset using 5-fold cross-validation shows that the proposed encoding significantly reduces the average syllable error rate (SER), by 13.90% relative to the commonly used orthogonal binary encoding, and that the pre-syllabification further reduces the average SER by up to 17.17% relative to the PNNR without pre-syllabification. The 5-fold cross-validation also shows that the proposed PNNR-based syllabification is stable, producing an average SER of 0.64%. This SER is significantly lower than that of a look-up-based syllabification, which gives an average SER of 2.60%. Most errors come from derivatives with the prefixes ‘ber’, ‘per’, and ‘ter’, as well as from compound words.
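
For readers unfamiliar with the classifier named above, the following is a minimal sketch of a generic pseudo nearest neighbour rule. For each class, the k nearest training samples are found, the distance to the i-th nearest is weighted by 1/i, and the class with the smallest weighted sum is chosen. It is not the paper's implementation: the function name `pnnr_classify`, the Euclidean distance, the toy 2-D feature vectors, and the labels `boundary` / `no_boundary` are all assumptions made for illustration, and the four-feature phoneme encoding and the pre-syllabification step are not modelled here.

```python
import math
from collections import defaultdict


def pnnr_classify(query, samples, k=3):
    """Classify `query` with a generic pseudo nearest neighbour rule.

    `samples` is a list of (feature_vector, label) pairs. The names and
    the Euclidean distance are illustrative assumptions, not the paper's.
    """
    # Group the distances from the query to every sample by class label.
    distances_by_class = defaultdict(list)
    for features, label in samples:
        distances_by_class[label].append(math.dist(query, features))

    best_label, best_score = None, math.inf
    for label, distances in distances_by_class.items():
        # Keep the k nearest samples of this class (fewer if the class is
        # smaller than k, which favours that class; acceptable in a sketch).
        nearest = sorted(distances)[:k]
        # Weight the i-th nearest distance by 1/i and sum the results.
        score = sum(d / rank for rank, d in enumerate(nearest, start=1))
        if score < best_score:
            best_label, best_score = label, score
    return best_label


# Toy data: hypothetical 2-D vectors standing in for encoded phoneme contexts.
samples = [
    ((0, 0), "no_boundary"), ((0, 1), "no_boundary"), ((1, 0), "no_boundary"),
    ((5, 5), "boundary"), ((5, 6), "boundary"), ((6, 5), "boundary"),
]
print(pnnr_classify((4, 4), samples, k=2))
```

In a syllabification setting, each sample would presumably describe the phonemic context around a candidate position, with the label saying whether a syllable boundary falls there; the exact context window and features are specified in the full paper, not in this abstract.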
