Automatic Speech Recognition (ASR) technology is designed to provide more options for human-to-machine communication. Emerging research has observed variable ASR performance for African American English (AAE) and other minority dialects. Word prediction was observed to be better for AAE than for majority dialects, whereas the opposite was true for word identification accuracy. The researchers hypothesized that the higher word error rate for AAE is related to the phonetic factors of vowel duration, consonant production, rhythm, pitch, and syllable accent. This work evaluates that hypothesis. Recordings of two AAE-speaking and two White AE-speaking women from North Carolina reading "Comma Gets a Cure" were submitted to the Microsoft ASR program for transcription. Several common transcription errors were noted across speaker groups, with a greater variety of errors noted for the AAE-speaking women. Acoustic analyses of vowel duration and of the spectral acoustics of vowel and consonant production, by talker and by group, will be completed. The results of this analysis will be useful for describing the specific aspects of AAE speech that may perturb the tested ASR. These data provide insight into the speech tasks and the sub- and suprasegmental acoustic data that should be included in future iterations of ASR programming.