Abstract

This paper proposes a method for hypothesizing word boundaries in Hindi speech. The method rests on the observation that function words such as case markers, pronouns and conjunctions occur frequently in Hindi text. Spotting these frequently occurring patterns is therefore proposed as a means of hypothesizing word boundaries in a speech-to-text conversion system for Hindi. The idea was first tested on error-free text from which all word boundaries except sentence boundaries had been removed, and nearly 67% of the word boundaries were correctly hypothesized. Subsequent experiments on input containing errors simulated to represent the speech environment showed that the method remains effective at error levels as high as 50%. The implications of these results for the development of a speech-to-text conversion system for Hindi are discussed.
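To make the idea of spotting function words concrete, the following is a minimal sketch of the text-only experiment, not the authors' procedure. It assumes romanized Hindi instead of Devanagari or phonemic input, uses a small hypothetical function-word list (`FUNCTION_WORDS`), and substitutes a simple greedy longest-match scan for whatever spotting rules the paper actually uses. The helper names `true_boundaries`, `spot_boundaries` and `score` are illustrative only. Boundaries are character offsets in the space-free string, and sentence edges are excluded because they are assumed to be known.

```python
from typing import Iterable, Set, Tuple

# Hypothetical romanized function-word list (case markers, pronouns,
# conjunctions); NOT the inventory used in the paper.
FUNCTION_WORDS = ["ne", "ko", "se", "ka", "ki", "ke", "mein", "aur", "hai"]


def true_boundaries(sentence: str) -> Set[int]:
    """Offsets in the space-free string where two words meet."""
    offsets, pos = set(), 0
    for word in sentence.split()[:-1]:  # the sentence end is not counted
        pos += len(word)
        offsets.add(pos)
    return offsets


def spot_boundaries(text: str, function_words: Iterable[str]) -> Set[int]:
    """Hypothesize a boundary on both sides of each spotted function word."""
    # Try longer function words first so "mein" wins over a shorter match.
    words = sorted(set(function_words), key=len, reverse=True)
    n, i, hyps = len(text), 0, set()
    while i < n:
        match = next((w for w in words if text.startswith(w, i)), None)
        if match is None:
            i += 1
            continue
        for b in (i, i + len(match)):
            if 0 < b < n:  # sentence edges are assumed to be known already
                hyps.add(b)
        i += len(match)  # resume scanning after the spotted word
    return hyps


def score(hyp: Set[int], ref: Set[int]) -> Tuple[float, float]:
    """Return (recall, precision) of hypothesized boundaries."""
    correct = len(hyp & ref)
    recall = correct / len(ref) if ref else 0.0
    precision = correct / len(hyp) if hyp else 0.0
    return recall, precision


if __name__ == "__main__":
    sentence = "ram ne sita ko kitab di"
    text = sentence.replace(" ", "")  # "ramnesitakokitabdi"
    ref = true_boundaries(sentence)
    hyp = spot_boundaries(text, FUNCTION_WORDS)
    print(sorted(ref), sorted(hyp), score(hyp, ref))
```

On this toy sentence the scan finds the boundaries around "ne" and "ko" but also fires on the "ki" at the start of "kitab" and misses the boundary before "di", giving a recall and precision of 0.8 each. This illustrates why short function words can produce false alarms inside content words; the figures say nothing about the paper's own results.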
