Modern speech recognition systems are powerful, but it is not easy to understand the mechanisms by which they model speech, or precisely how they make use of acoustic cues. Moreover, a system that explicitly detects individual acoustic cues may be useful not only for speech recognition but also for understanding the principles that govern sub-phonemic patterns of surface-phonetic variation, and how specific cue patterns may reflect speech disorders. This report describes a transparent, acoustic cue-based nasalization detection module that can be used to find not only nasal consonants but also nasalization wherever it appears, e.g., within a vocalic region. We lay out a framework for extracting the measurements that are key to detecting nasalization, in particular the nasality-related spectral peaks P0 and P1, and confirm the utility of these measurements for nasality detection in spoken words. Specifically, we present a pre-processing and cue-value extraction framework, and propose a Gaussian Mixture Model-based approach to detecting the regions of nasalization in a speech signal. This work on a detection module for nasalization is part of a larger effort to develop a speech recognition system that is based on landmarks and other acoustic cues.