Abstract
In this paper, we analyse segmented speech phonemes with convolutional filters after embedding them in Reconstructed Phase Space (RPS). These feature-extracting convolutional filters are either trained from scratch on the embedded speech data or fine-tuned from networks trained on other data. Phase space reconstruction portrays the dynamics of an observed system as a geometric representation. We present a study highlighting the discriminative capacity of the features that a Convolutional Neural Network (CNN) extracts from the textural pattern and shape of this geometric representation. CNNs are heavily used in image-related tasks but have not been applied to phase space portraits, possibly because of the higher dimensionality of the embedding. However, we find that applying a CNN to an RPS restricted to two dimensions characterizes the space better than prior methods that operate on high-dimensional embeddings. We show experimental results supporting the use of RPS with CNN (RPS-CNN) for phoneme classification. The results affirm that essential signal characteristics are automatically quantified from the phase portraits of speech and can be used in place of conventional techniques involving frequency-domain transformations.
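As an illustration of the two-dimensional embedding described above, the following minimal sketch builds a phase portrait image from a signal frame using a standard time-delay embedding. It is not taken from the paper: the function names `delay_embed` and `phase_portrait_image`, the lag of 6 samples, the 32x32 grid, and the peak normalization are all assumptions chosen for illustration, and a synthetic sinusoid stands in for a segmented phoneme.

```python
import numpy as np

def delay_embed(signal, lag=1, dim=2):
    """Time-delay embedding: row n is (x[n], x[n+lag], ..., x[n+(dim-1)*lag])."""
    x = np.asarray(signal, dtype=float)
    n = x.size - (dim - 1) * lag
    if n <= 0:
        raise ValueError("signal too short for this lag and dimension")
    return np.stack([x[i * lag : i * lag + n] for i in range(dim)], axis=1)

def phase_portrait_image(signal, lag=1, bins=32):
    """Rasterize the 2-D embedding into a bins x bins occupancy image.

    Assumption: the frame is scaled to [-1, 1] by its peak amplitude and
    the image is normalized to sum to 1. The paper may use a different
    rendering of the portrait.
    """
    x = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak
    pts = delay_embed(x, lag=lag, dim=2)
    img, _, _ = np.histogram2d(
        pts[:, 0], pts[:, 1], bins=bins, range=[[-1, 1], [-1, 1]]
    )
    total = img.sum()
    return img / total if total > 0 else img

# Synthetic stand-in for a phoneme frame (hypothetical data).
t = np.arange(400)
frame = np.sin(2 * np.pi * t / 50)
image = phase_portrait_image(frame, lag=6, bins=32)
```

The resulting `image` array is a single-channel picture of the trajectory and could be passed to any 2-D convolutional network as input. The lag and grid resolution would need to be tuned for real speech.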