Abstract
This paper addresses the recognition of voiced speech phonemes using features extracted from the bispectrum of the speech signal. Voiced speech is modeled as a superposition of coupled harmonics, located at integer multiples of the pitch frequency and modulated by the vocal tract. For this type of signal, the estimation procedure employed is shown to guarantee nonzero bispectral values. The vocal tract frequency response is reconstructed from the bispectrum on a set of frequency points at multiples of the pitch. An autoregressive (AR) model is then fitted to this transfer function, and the AR coefficients serve as the feature vector for the subsequent classification step. Any classifier that operates on finite-dimensional vectors can be employed at this point. Experiments using the learning vector quantization (LVQ) neural classifier give satisfactory classification scores on real speech data extracted from the DARPA/TIMIT speech corpus.
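The role of harmonic coupling can be illustrated with a small numerical sketch. It is not the estimation procedure of the paper: it uses a generic direct (FFT-based) bispectrum estimate averaged over segments, with equal-amplitude harmonics and no vocal tract modulation. The function names `bispectrum_at` and `harmonic_segments`, and all parameter values, are hypothetical choices made for this illustration.

```python
import numpy as np

def bispectrum_at(segments, k1, k2):
    """Direct bispectrum estimate at DFT bins (k1, k2).

    Averages X(k1) * X(k2) * conj(X(k1 + k2)) over the rows of `segments`.
    This is a generic textbook estimator, assumed here for illustration.
    """
    X = np.fft.fft(segments, axis=1)
    return np.mean(X[:, k1] * X[:, k2] * np.conj(X[:, k1 + k2]))

def harmonic_segments(n_seg, n, pitch_bin, n_harm, coupled, rng):
    """Build n_seg segments of n samples containing n_harm harmonics.

    Harmonic k sits exactly on DFT bin k * pitch_bin. If `coupled` is True,
    its phase is k * theta (theta random per segment), so the phase at
    f1 + f2 equals the sum of the phases at f1 and f2. Otherwise every
    harmonic receives an independent random phase.
    """
    t = np.arange(n)
    segs = np.zeros((n_seg, n))
    for s in range(n_seg):
        theta = rng.uniform(0.0, 2.0 * np.pi)
        for k in range(1, n_harm + 1):
            phase = k * theta if coupled else rng.uniform(0.0, 2.0 * np.pi)
            segs[s] += np.cos(2.0 * np.pi * k * pitch_bin * t / n + phase)
    return segs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, pitch_bin = 256, 8
    scale = (n / 2) ** 3  # magnitude of one triple product for unit-amplitude harmonics
    for coupled in (True, False):
        segs = harmonic_segments(200, n, pitch_bin, 4, coupled, rng)
        b = bispectrum_at(segs, pitch_bin, pitch_bin)
        print("coupled" if coupled else "uncoupled", abs(b) / scale)
```

With coupled phases, the triple products add coherently across segments and the normalized magnitude stays close to 1. With independent phases they average toward zero as the number of segments grows, which is why the coupling assumption matters for recovering information from the bispectrum.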