Abstract

Dialects of a language depend on both speaker-related and sound-unit (phone)-related information, which places the problem of dialect identification (DID) within the broader domain of language identification (LID). The DID task is more challenging than conventional LID, and it has been established that conventional acoustic features such as perceptual linear prediction (PLP) and mel frequency cepstral coefficients (MFCC), which carry only phone-related information, are not sufficient to address the problem of DID. The authors explore raw log critical band energy (LCBE) information obtained from critical band analysis of speech signals, which effectively carries both speaker- and phone-related information. A nonlinear feature extractor based on a multilayer perceptron (MLP) is designed to model the critical band information. Further, a neuro-fuzzy classifier (NFC) is configured to assign feature vectors to dialectal classes and thereby discriminate the finer variations among dialects. The objective is to investigate the perceptually oriented information obtained from all critical bands for distinguishing dialectal speech, and to assess the applicability of the NFC to such problems. Experimental results are reported in terms of classification accuracy for four dialects of the Assamese language, spoken mostly in Northeast India. Baseline systems are developed using PLP and MFCC features with a Gaussian mixture model (GMM)-based classifier. The results demonstrate the strength of the MLP-based nonlinear mapping of critical band information for dialect discrimination compared with the PLP-based autoregressive approximation and the MFCC-based cepstral-domain representation of critical band energy.
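To make the feature pipeline described above concrete, the following is a minimal sketch, not the authors' implementation: it computes log critical band energies from a single speech frame using a triangular Bark-scale filterbank and then passes them through an MLP-style hidden layer acting as a nonlinear feature extractor. The band count, filter shapes, frame length, and network sizes are illustrative assumptions; the neuro-fuzzy classifier stage is omitted.

```python
# Sketch of LCBE extraction + MLP feature mapping (assumed parameters, not the paper's exact setup).
import numpy as np

def hz_to_bark(f):
    # Zwicker's approximation of the Bark (critical band) scale.
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def lcbe(frame, sr=16000, n_bands=18, n_fft=512):
    """Log critical band energies of one windowed speech frame."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    bark = hz_to_bark(freqs)
    edges = np.linspace(bark[0], bark[-1], n_bands + 2)
    energies = np.empty(n_bands)
    for b in range(n_bands):
        lo, ctr, hi = edges[b], edges[b + 1], edges[b + 2]
        # Triangular weighting of spectral power within each critical band.
        up = np.clip((bark - lo) / (ctr - lo), 0.0, 1.0)
        down = np.clip((hi - bark) / (hi - ctr), 0.0, 1.0)
        energies[b] = np.sum(spec * np.minimum(up, down))
    return np.log(energies + 1e-10)

class MLPFeatureExtractor:
    """Hypothetical MLP hidden layer mapping raw LCBE vectors to a
    discriminative representation handed to a downstream classifier."""
    def __init__(self, w1, b1):
        self.w1, self.b1 = w1, b1            # weights assumed learned in MLP training
    def transform(self, x):
        return np.tanh(x @ self.w1 + self.b1)  # hidden-layer activations

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)         # stand-in for a 25 ms frame at 16 kHz
    feats = lcbe(frame)
    mlp = MLPFeatureExtractor(rng.standard_normal((18, 32)) * 0.1, np.zeros(32))
    print(mlp.transform(feats).shape)        # (32,) feature vector per frame
```

In practice such a hidden-layer representation would be computed frame by frame and the resulting feature vectors fed to the dialect classifier.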

