Abstract
In this study, we attempt simultaneous recognition of phones and speakers using a single energy-based model, a three-way restricted Boltzmann machine (3WRBM). The proposed model is a probabilistic model over three variables: acoustic features, latent phonetic features, and speaker-identity features. The model is trained so that it automatically captures the strength of the relationships among the three variables. Once trained, the model can be applied to many speech signal processing tasks, because it can separate phoneme-related and speaker-related information from the observed speech and, conversely, generate a speech signal from that phoneme-related and speaker-related information. Simultaneous phone and speaker recognition is achieved by estimating the latent phonetic features and the speaker-identity features given the input signal. In our experiments, we discuss the effectiveness of the model in a speaker recognition task and a speech (continuous phone) recognition task.
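The abstract does not state the energy function, so the following is only a minimal sketch of how inference in a generic three-way RBM could look, not the authors' formulation. It assumes real-valued acoustic features v with unit variance, binary latent phonetic units h, a one-hot speaker variable s, and an energy E(v, h, s) = 0.5*||v - b||^2 - c.h - d.s - sum_ijk W[i,j,k] v_i h_j s_k. All names (`W`, `c`, `d`, `speaker_posterior`, `phonetic_posterior`) are hypothetical. Under these assumptions the latent units can be summed out in closed form, which gives both posteriors from a single input frame.

```python
import numpy as np

def _activations(v, W, c):
    # a[j, k] = c[j] + sum_i W[i, j, k] * v[i]: input to latent unit j
    # when speaker k is assumed active (assumed three-way coupling).
    return c[:, None] + np.einsum("i,ijk->jk", v, W)

def speaker_posterior(v, W, c, d):
    """p(s = k | v) with the binary latent units h summed out analytically."""
    a = _activations(v, W, c)
    # Summing over h_j in {0, 1} gives prod_j (1 + exp(a[j, k])), so the
    # log-weight of speaker k is d[k] + sum_j softplus(a[j, k]).
    # The term 0.5*||v - b||^2 does not depend on k and cancels.
    logits = d + np.logaddexp(0.0, a).sum(axis=0)
    logits = logits - logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def phonetic_posterior(v, W, c, d):
    """p(h_j = 1 | v), averaged over the speaker posterior."""
    a = _activations(v, W, c)
    sigm = 0.5 * (1.0 + np.tanh(0.5 * a))  # stable sigmoid: p(h_j = 1 | v, s = k)
    return sigm @ speaker_posterior(v, W, c, d)

# Toy usage with random (untrained) parameters; sizes are arbitrary.
rng = np.random.default_rng(0)
D, H, K = 5, 4, 3  # acoustic dims, latent phonetic units, speakers
W = 0.1 * rng.standard_normal((D, H, K))
c = 0.1 * rng.standard_normal(H)
d = 0.1 * rng.standard_normal(K)
v = rng.standard_normal(D)

p_s = speaker_posterior(v, W, c, d)   # vector of length K, sums to 1
p_h = phonetic_posterior(v, W, c, d)  # vector of length H, entries in [0, 1]
```

In this sketch, speaker recognition would pick the speaker with the largest entry of `p_s`, and `p_h` would serve as the per-frame phonetic representation passed to a phone recognizer. How the actual model is trained and decoded is not described in the abstract.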