Abstract
A method is presented for combining the feature extraction power of neural networks with model-based dimensionality reduction to produce linguistically motivated low-dimensional measurements of sounds. The method works by first training a convolutional neural network (CNN) to predict linguistically relevant category labels from the spectrograms of sounds. Idealized models of these categories are then defined as probability distributions in a low-dimensional measurement space, with locations chosen to reproduce, as far as possible, the perceptual characteristics of the CNN. To measure a sound, the point in the measurement space is found at which the posterior probability distribution over categories in the idealized model most closely matches the category probabilities output by the CNN for that sound. In this way, the feature learning power of the CNN is used to produce low-dimensional measurements. The method is demonstrated using monophthongal vowel categories to train a CNN and produce measurements in two dimensions. It is also shown that the perceptual characteristics of this CNN are similar to those of human listeners.
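The measurement step can be illustrated with a minimal sketch. The abstract does not specify the form of the idealized category models, the mismatch criterion, or the search procedure, so the following are assumptions made only for illustration: isotropic Gaussian categories with equal priors, Kullback-Leibler divergence as the measure of mismatch, and a plain grid search. The names posterior and measure, the category locations in means, and the stand-in CNN output are all hypothetical.

```python
import numpy as np

def posterior(points, means, sigma=1.0):
    """Posterior over categories at each point.

    Assumes isotropic Gaussian category models with equal priors
    (an illustrative assumption, not taken from the source).
    """
    # Squared distance from every point to every category mean: shape (P, K)
    d2 = ((points[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    logits = -0.5 * d2 / sigma**2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def measure(cnn_probs, means, sigma=1.0, lo=-3.0, hi=3.0, steps=121):
    """Grid search for the 2-D point whose model posterior is closest,
    in KL divergence, to the category probabilities cnn_probs."""
    axis = np.linspace(lo, hi, steps)
    grid = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)
    post = np.clip(posterior(grid, means, sigma), 1e-12, 1.0)
    q = np.clip(cnn_probs, 1e-12, 1.0)
    # KL(q || post) evaluated at every grid point
    kl = (q * (np.log(q) - np.log(post))).sum(axis=1)
    return grid[np.argmin(kl)]

# Hypothetical category locations in a two-dimensional measurement space
means = np.array([[-1.0, 1.0], [1.0, 1.0], [0.0, -1.0]])

# Stand-in for a CNN output: the model posterior at a known point
true_point = np.array([[0.5, 0.25]])
cnn_probs = posterior(true_point, means)[0]

estimate = measure(cnn_probs, means)
print(estimate)
```

Because the stand-in probabilities are generated from the idealized model itself, the search should land on the grid point nearest the known location. A real CNN output would generally not be reproduced exactly by any point, in which case the result is the closest match under the chosen criterion.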