Abstract

This paper proposes an approach to automatic language identification (LID) for seven Indian languages. The proposed system uses language-dependent phonotactic features and prosodic information. A Phonetic Engine (PE), which serves as the front end of the phonotactic LID system, converts an input speech utterance into a sequence of phonetic symbols. Syllable boundaries are detected, the phones within each syllable's boundaries are grouped, and phonotactic rules are applied to obtain syllables. Each pair of consecutive syllables is represented numerically to form a phonotactic feature vector. Prosodic feature vectors are obtained by concatenating the features of three consecutive syllables. A multilayer feed-forward neural network (NN) classifier is used at the back end for language identification. The NN classifier is trained on two hours of data from each of the seven languages. The target languages are Bengali, Hindi, Telugu, Urdu, Assamese, Punjabi and Manipuri.
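
The windowing described above (pairs of consecutive syllables for phonotactic vectors, triples for prosodic vectors) can be illustrated with a minimal sketch. The abstract does not specify how syllables are encoded numerically or which prosodic measurements are used, so the integer syllable IDs, the two per-syllable prosodic values, the sample data, and the function names `syllable_ngrams` and `concat_windows` are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def syllable_ngrams(syllables: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive syllables (empty if too few)."""
    return [tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]


def concat_windows(features: Sequence[Sequence[float]], n: int) -> List[List[float]]:
    """Concatenate the feature vectors of every n consecutive syllables."""
    windows: List[List[float]] = []
    for i in range(len(features) - n + 1):
        merged: List[float] = []
        for vec in features[i:i + n]:
            merged.extend(vec)
        windows.append(merged)
    return windows


# Hypothetical syllable sequence, standing in for the Phonetic Engine output
# after syllabification.
syllables = ["na", "ma", "ste", "ji"]

# Hypothetical per-syllable prosodic values (assumed here: duration in
# seconds and mean pitch in Hz); the abstract does not list the features.
prosody = [[0.21, 120.0], [0.18, 132.0], [0.30, 110.0], [0.25, 101.0]]

# Assumed numeric encoding: each distinct syllable gets an integer ID in
# order of first appearance.
vocab: Dict[str, int] = {}
for s in syllables:
    vocab.setdefault(s, len(vocab))

# Phonotactic vectors: one per pair of consecutive syllables.
phonotactic = [[vocab[s] for s in pair] for pair in syllable_ngrams(syllables, 2)]

# Prosodic vectors: features of three consecutive syllables, concatenated.
prosodic = concat_windows(prosody, 3)
```

With four syllables this yields three phonotactic vectors of length 2 and two prosodic vectors of length 6. In a full system these vectors would be the inputs to the neural network classifier.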
