Fusion of multi-stream speech features for dialect classification

Shweta Sinha, Aruna Jain, S S Agrawal

doi:10.1007/s40012-015-0063-y

Abstract

Current research in voice recognition has entered a new stage. It concentrates not only on correctly evaluating the linguistic information embodied in the speech signal but also on identifying the variations naturally present in speech, with the aim of improving the accuracy and precision of the developed techniques. A speaker's accent, shaped by his or her native dialect, is one of the major sources of this variability. Prior knowledge of the spoken dialect can help in building a multi-model speech recognition system and can improve its recognition performance. This paper focuses on applying some established dialect identification techniques to identify a speaker's dialect among the dialects of Hindi. Fusion of multiple streams, obtained as combinations of phonetic and prosodic features, is implemented to exploit the acoustic information. The work also exploits the ability of an auto-associative neural network (AANN) to capture the distribution of data points in a lower-dimensional representation and then to classify them into groups. System performance for different levels of fusion is reported for Hindi dialect classification. Duration, used as a prosodic feature, is observed to be an important cue for automatic dialect identification systems.
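The abstract does not specify how the streams are fused or how the AANN outputs are scored, so the following is only a minimal sketch under common assumptions, not the authors' method. It assumes (1) feature-level fusion by concatenating the vectors of each stream, (2) one trained AANN per dialect, scored with the per-frame confidence exp(-E), where E is the normalized squared reconstruction error, a scoring rule often used in AANN-based speaker and language identification, and (3) score-level fusion as a weighted sum of per-stream scores. All function names, weights, dialect labels, and the toy "models" are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# Hypothetical stand-in for a trained AANN: maps an input vector to its reconstruction.
Reconstructor = Callable[[Vector], Vector]


def feature_level_fusion(*streams: Sequence[float]) -> Vector:
    """Early fusion (assumed): concatenate the vectors of several streams into one."""
    fused: Vector = []
    for stream in streams:
        fused.extend(stream)
    return fused


def aann_confidence(frames: Sequence[Vector], model: Reconstructor) -> float:
    """Mean per-frame confidence exp(-E), with E = ||x - y||^2 / ||x||^2,
    where y is the model's reconstruction of frame x."""
    total = 0.0
    for x in frames:
        y = model(x)
        err = sum((a - b) ** 2 for a, b in zip(x, y))
        norm = sum(a * a for a in x) or 1.0  # guard against all-zero frames
        total += math.exp(-err / norm)
    return total / len(frames)


def score_level_fusion(stream_scores: Dict[str, Dict[str, float]],
                       weights: Dict[str, float]) -> Dict[str, float]:
    """Late fusion (assumed): weighted sum of each stream's per-dialect scores."""
    fused: Dict[str, float] = {}
    for stream, scores in stream_scores.items():
        w = weights.get(stream, 0.0)
        for dialect, score in scores.items():
            fused[dialect] = fused.get(dialect, 0.0) + w * score
    return fused


def classify(fused_scores: Dict[str, float]) -> str:
    """Pick the dialect with the highest fused score."""
    return max(fused_scores, key=fused_scores.__getitem__)


# Toy usage with hypothetical "models": one reconstructs perfectly, one outputs zeros.
perfect: Reconstructor = lambda x: list(x)
zeros: Reconstructor = lambda x: [0.0] * len(x)
frames = [[1.0, 2.0], [2.0, 1.0]]
scores = {
    "spectral": {"dialect_A": aann_confidence(frames, perfect),
                 "dialect_B": aann_confidence(frames, zeros)},
    "duration": {"dialect_A": aann_confidence(frames, zeros),
                 "dialect_B": aann_confidence(frames, perfect)},
}
fused = score_level_fusion(scores, {"spectral": 0.3, "duration": 0.7})
print(classify(fused))  # dialect_B
```

Here the duration stream is given the larger weight purely for illustration. In practice, stream weights would typically be tuned on held-out data, and the choice between early and late fusion is exactly the kind of comparison the abstract reports across fusion levels.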
