Abstract

This paper proposes a dialect identification system (DIS) that exploits dialect-specific prosodic features and cepstral coefficients extracted from sentence-level utterances. People from a given region commonly share a distinctive speaking style, known as a dialect. Sentences are chosen as the speech unit for dialect identification because they are observed to follow characteristic intonation and energy patterns. The sentences are drawn from the standard Intonational Variations in English (IViE) speech dataset. Intonation and energy features are derived from the pitch and energy contours, respectively, using a Legendre polynomial fit together with five statistical features. Mel frequency cepstral coefficients (MFCCs) are added to capture dialect-specific spectral information. The Extreme Gradient Boosting (XGB) ensemble method is used to evaluate the system with individual features and with combinations of features. The results indicate that both prosodic and spectral features contribute to dialect recognition, and that the combined feature vectors achieve a better DIS performance of about 89.6%.
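
The contour-to-feature step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `contour_features`, the polynomial degree of 4, and the choice of mean, standard deviation, minimum, maximum, and median as the five statistics are all assumptions, since the abstract does not specify them. The sketch fits a Legendre series to a contour with NumPy and appends the statistics to form one feature vector.

```python
import numpy as np
from numpy.polynomial import legendre


def contour_features(contour, degree=4):
    """Legendre coefficients plus five summary statistics for one contour.

    `degree` and the choice of statistics are illustrative assumptions;
    the abstract does not state which values the paper uses.
    """
    y = np.asarray(contour, dtype=float)
    # Map frame positions onto [-1, 1], the interval on which Legendre
    # polynomials are orthogonal.
    x = np.linspace(-1.0, 1.0, len(y))
    # Least-squares fit; returns degree + 1 coefficients, lowest order first.
    coeffs = legendre.legfit(x, y, degree)
    # Assumed statistics: mean, standard deviation, minimum, maximum, median.
    stats = np.array([y.mean(), y.std(), y.min(), y.max(), np.median(y)])
    return np.concatenate([coeffs, stats])


# Usage on synthetic contours (stand-ins for real pitch and energy tracks).
frames = np.linspace(-1.0, 1.0, 50)
pitch = 100.0 + 20.0 * frames          # a rising pitch contour, in Hz
energy = 1.0 - 0.5 * frames ** 2       # an arch-shaped energy contour
prosodic_vector = np.concatenate(
    [contour_features(pitch), contour_features(energy)]
)
```

Under these assumptions each contour yields 10 values, so the prosodic vector for one sentence has 20 entries; MFCC-based features would be concatenated to it before classification.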
