Abstract

This work presents an Indian language identification (LID) system based on automatic tonal/nontonal preclassification. Languages are first classified into tonal and nontonal categories, and individual languages are then identified within the respective category. The work proposes pitch Chroma and formant features for this task and investigates how Mel-frequency Cepstral Coefficients (MFCCs) complement them. It further explores three approaches to feature extraction, block processing (BP), pitch synchronous analysis (PSA), and glottal closure regions (GCRs), using syllables as the basic units. A cascade convolutional neural network (CNN)-long short-term memory (LSTM) model using syllable-level features has been developed. The National Institute of Technology Silchar language database (NITS-LD) and the OGI-Multilingual Telephone Speech Corpus (OGI-MLTS) have been used for experimental validation. The highest accuracies are reported by the proposed system based on the score combination of cascade CNN-LSTM models of Chroma (extracted with the BP method), the first two formants, and MFCCs (both extracted with the GCR method). In the preclassification stage, the observed accuracies for NITS-LD are 91%, 87.3%, and 85.1% for 30 s, 10 s, and 3 s test data, respectively. For the OGI-MLTS database, the respective accuracies are 86.7%, 83.1%, and 80.6%. These amount to absolute improvements over the baseline system of 11.6%, 12.3%, and 13.9% for NITS-LD, and 12.5%, 11.9%, and 12.6% for OGI-MLTS. The complete preclassification-based LID system shows improvements over the baseline system of 7.3%, 6.4%, and 7.4% for NITS-LD and 6.1%, 6.7%, and 7.2% for OGI-MLTS for the three respective test data conditions.
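The two-stage decision described above, with score combination across feature-specific models, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the fusion rule, so a weighted average of per-class scores is assumed here, and the function names, language labels (`lang_A` to `lang_D`), weights, and score values are all hypothetical placeholders. The three score dictionaries in each list stand in for the outputs of the Chroma, formant, and MFCC models.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]


def combine_scores(score_sets: List[Scores], weights: List[float]) -> Scores:
    """Weighted average of per-class scores from several models.

    Assumption: the abstract only says "score combination"; a weighted
    average is one common choice, not necessarily the one used in the paper.
    """
    if len(score_sets) != len(weights):
        raise ValueError("need exactly one weight per score set")
    total = sum(weights)
    return {
        cls: sum(w * s[cls] for w, s in zip(weights, score_sets)) / total
        for cls in score_sets[0]
    }


def best_class(scores: Scores) -> str:
    """Return the class label with the highest combined score."""
    return max(scores, key=scores.get)


def identify(
    pre_scores: List[Scores],
    lang_scores: Dict[str, List[Scores]],
    weights: List[float],
) -> Tuple[str, str]:
    """Stage 1: pick tonal or nontonal. Stage 2: pick a language in that category."""
    category = best_class(combine_scores(pre_scores, weights))
    language = best_class(combine_scores(lang_scores[category], weights))
    return category, language


# Hypothetical scores from three models (Chroma, formants, MFCCs) for one utterance.
weights = [1.0, 1.0, 1.0]
pre_scores = [
    {"tonal": 0.6, "nontonal": 0.4},
    {"tonal": 0.3, "nontonal": 0.7},
    {"tonal": 0.7, "nontonal": 0.3},
]
lang_scores = {
    "tonal": [
        {"lang_A": 0.2, "lang_B": 0.8},
        {"lang_A": 0.6, "lang_B": 0.4},
        {"lang_A": 0.3, "lang_B": 0.7},
    ],
    "nontonal": [
        {"lang_C": 0.5, "lang_D": 0.5},
        {"lang_C": 0.9, "lang_D": 0.1},
        {"lang_C": 0.4, "lang_D": 0.6},
    ],
}

result = identify(pre_scores, lang_scores, weights)
print(result)  # ('tonal', 'lang_B')
```

In this sketch the second model alone would vote nontonal, but the combined score favours tonal, so only the tonal language models are consulted in the second stage. This is the sense in which preclassification narrows the search before individual languages are identified.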
