Abstract

This paper addresses the task of identifying the language spoken in a speech signal, using Mel-frequency cepstral coefficients (MFCCs) as features. Language identification models are developed from these spectral features for fifteen Indian languages: Assamese, Bangla, Gujarati, Hindi, Kannada, Kashmiri, Malayalam, Marathi, Nepali, Oriya, Punjabi, Rajasthani, Tamil, Telugu and Urdu. Identification is carried out using Gaussian mixture models. A semi-natural read-speech database is used to obtain the language-specific information. MFCCs are obtained by applying a linear cosine transform to the log power spectrum on a nonlinear mel frequency scale. The language identification system performs better when trained and tested with twenty-nine MFCC features than with six, eight, thirteen, nineteen or twenty-one. Within the range tested, using more features therefore gives better results. The average language recognition rate over the fifteen Indian languages is around 88%.
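
To illustrate the MFCC computation described above, here is a minimal sketch of its final stages, assuming the mel filterbank energies for one speech frame are already available. The function names, the particular mel formula (2595 log10(1 + f/700)) and the unnormalised DCT-II are assumptions chosen for illustration; the paper does not state which variants it uses.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale (one common formula; others exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def dct_ii(values, num_coeffs):
    """Unnormalised DCT-II of `values`, keeping the first `num_coeffs` terms."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def mfcc_from_filterbank(energies, num_coeffs):
    """Cepstral coefficients for one frame from its mel filterbank energies.

    Takes the log of each energy (floored to avoid log(0)) and applies the
    cosine transform. `num_coeffs` plays the role of the feature count that
    the abstract varies (six, eight, thirteen, ...).
    """
    log_energies = [math.log(max(e, 1e-10)) for e in energies]
    return dct_ii(log_energies, num_coeffs)
```

Calling `mfcc_from_filterbank(energies, 13)` on a list of filterbank energies returns a thirteen-element feature vector for that frame. Changing the second argument is one way to reproduce the kind of feature-count comparison the abstract reports, though `num_coeffs` cannot exceed the number of filterbank channels in a meaningful way.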
