Abstract

Within the EU Network of Excellence PASCAL, a challenge was organized to design a statistical machine learning algorithm that segments words into the smallest meaning-bearing units of language, morphemes. Ideally, these are basic vocabulary units suitable for different tasks, such as speech and text understanding, machine translation, information retrieval, and statistical language modeling. Twelve research groups participated in the challenge and had submitted segmentation results obtained by their algorithms. In this paper, we evaluate the application of these segmentation algorithms to large vocabulary speech recognition using statistical n-gram language models based on the proposed word segments instead of entire words. Experiments were done for two agglutinative and morphologically rich languages: Finnish and Turkish. We also investigate combining various segmentations to improve the performance of the recognizer. Index Terms: speech recognition, language modelling, morphemes, unsupervised learning.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.