Abstract
recognition of disorder people is a difficult task due to the lack of motor-control of the speech articulators. Multimodal speech recognition can be used to enhance the robustness of disordered speech. This paper introduces an automatic speech recognition system for people with dysarthria speech disorder based on both speech and visual components. The Mel-Frequency Cepestral Coefficients (MFCC) is used as features representing the acoustic speech signal. For the visual counterpart, the Discrete Cosine Transform (DCT) Coefficients are extracted from the speaker's mouth region. Face and mouth regions are detected using the Viola-Jones algorithm. The acoustic and visual input features are then concatenated on one feature vector. Then, the Hidden Markov Model (HMM) classifier is applied on the combined feature vector of acoustic and visual components. The system is tested on isolated English words spoken by disorder speakers from UA-Speech data. Results of the proposed system indicate that visual features are highly effective and can improve the accuracy to reach 7.91% for speaker dependent experiments and 3% for speaker independent experiments.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.