Abstract

Systematic pronunciation differences among speakers of regional varieties of U.S. English are recognizable to other native speakers to varying degrees. This has often been demonstrated through experiments wherein listeners were asked to match a talker to his or her dialect region. Machines have also been able to identify the regional origin of a speaker to some degree, although attempts to this end have typically not been as successful as efforts to automatically identify the language of a speaker. In order to refine the dialect discrimination ability of a machine, this paper draws methodological inspiration from the area of musical artist classification, and from linguistic notions that vowels contribute more heavily than consonants to regional differences. Using this insight, an automatic dialect identification system is developed that first recognizes the more vowel-like slices of the signal, and then updates a vocalic segment memory component with Mel-frequency cepstral coefficient, formant, and pitch information from the current frame. Besides providing a means to analyze MFCC and formant trajectories, the segment memory enriches the representation of vocalic events by allowing the system to explicitly model prosodic aspects such as duration and tilt.
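The segment memory described above could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the feature values, the frame shift, and the derived segment features (duration, F1 slope, mean pitch) are hypothetical stand-ins, and the front end that detects vowel-like frames and extracts MFCC, formant, and pitch values is assumed to exist elsewhere.

```python
class VocalicSegmentMemory:
    """Accumulates per-frame features over a vowel-like segment so that
    segment-level properties (duration, formant trajectory, pitch) can
    be modeled explicitly, as the abstract describes."""

    def __init__(self):
        self.frames = []    # per-frame feature dicts for the current segment
        self.segments = []  # completed vocalic segments

    def update(self, frame):
        """Append one vowel-like frame: a dict with 'mfcc', 'formants', 'pitch'."""
        self.frames.append(frame)

    def close_segment(self, frame_shift_ms=10.0):
        """End the current segment and derive illustrative segment features."""
        if not self.frames:
            return None
        f1_track = [f["formants"][0] for f in self.frames]
        segment = {
            "duration_ms": len(self.frames) * frame_shift_ms,
            "f1_slope": (f1_track[-1] - f1_track[0]) / max(len(f1_track) - 1, 1),
            "mean_pitch": sum(f["pitch"] for f in self.frames) / len(self.frames),
        }
        self.segments.append(segment)
        self.frames = []
        return segment

# Toy usage: three consecutive vowel-like frames with made-up feature values.
mem = VocalicSegmentMemory()
for f1 in (500.0, 520.0, 540.0):
    mem.update({"mfcc": [0.0] * 13, "formants": [f1, 1500.0], "pitch": 120.0})
seg = mem.close_segment()
```

With a 10 ms frame shift, the toy segment above has a duration of 30 ms and a rising F1 trajectory, the kind of prosodic and spectral-trajectory information the segment memory is meant to expose to the classifier.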
