Abstract

The identification of speakers from continuous speech requires manifold algorithms and mathematical formulations at each integrated work stage. In this paper, a multi-feature adaptive hybrid model is proposed to process real-time uninterrupted speech, achieving a 98.8% accuracy rate. To remove speech impurities, wavelet decomposition and a Gaussian mixture model (GMM) are applied in an integrated form. The filtered speech is broken down into smaller segments by a multi-phase clipper. Frequency and time-gap analysis are then applied to distinguish silence sections from speech sections. Speech localization without content loss is obtained by combining mutual tracing, dictionary tracing, and channel compensation. The extracted independent speech segments are processed to generate the feature vector. Acoustic, statistical, reference, and block-adaptive features are combined to generate a broad adaptive feature set. I-vector scoring, acoustic scoring, likelihood measures, and MFCCs are applied sequentially to generate the feature pool. The extracted feature pool is then processed by a probabilistic deep neural network to classify and validate the speakers. The analysis is applied to the CSTR VCTK Corpus speech dataset, and a comparative evaluation against various methods is performed. The evaluation results indicate that the proposed model significantly improves the accuracy of speaker classification.
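The silence/speech separation step above can be illustrated with a minimal short-time-energy sketch. This is only an assumed, simplified stand-in for the paper's multi-phase clipper and frequency/time-gap analysis (neither of which is specified in the abstract); the function name, frame length, and relative threshold are illustrative choices.

```python
import numpy as np

def segment_speech(signal, sample_rate, frame_ms=25, energy_ratio=0.1):
    """Label each fixed-length frame as speech (True) or silence (False).

    Illustrative short-time-energy threshold only; not the paper's
    multi-phase clipper or frequency/time-gap analysis.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    # Split the signal into non-overlapping frames.
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)        # short-time energy per frame
    threshold = energy_ratio * energy.max()     # relative threshold (assumption)
    return energy > threshold

# Usage: 1 s of synthetic audio with a 300 Hz "speech" tone in the middle.
sr = 16000
t = np.arange(sr) / sr
sig = np.zeros(sr)
sig[4000:12000] = np.sin(2 * np.pi * 300 * t[4000:12000])
labels = segment_speech(sig, sr)  # 40 frames of 25 ms each
```

With these parameters the 8000 tone samples span frames 10-29, so exactly those 20 frames are marked as speech and the surrounding silent frames are rejected.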
