Abstract

Phoneme classification is a sub-task of automatic speech recognition (ASR) that is essential for achieving good recognition accuracy. Unlike most classification tasks, however, it requires not only finding the correct class but also producing good posterior scores. Partly for this reason, the task was formerly handled with Gaussian Mixture Models and is now handled mostly with Artificial Neural Networks (ANNs), while other common machine learning methods such as Support Vector Machines and AdaBoost.MH are applied only rarely. In a previous study, we showed that AdaBoost.MH can match ANNs in classification accuracy, but lags behind them when its output is used in the speech recognition process. This is due in part to the imprecise posterior scores that AdaBoost.MH produces, a well-known weakness of the method. A common way to improve the quality of posterior scores is to apply some form of posterior calibration. In this study, we test several posterior calibration techniques to improve the overall performance of AdaBoost.MH. We found that posterior calibration is a good way to improve ASR accuracy, especially when the speech recognition process is integrated into the calibration workflow.

