Abstract
Voice disability is a barrier to effective communication. Around 1.2% of the world's population has some form of voice disability. Surgical procedures, namely laryngoscopy, laryngeal electromyography, and stroboscopy, are used to diagnose voice disability. Researchers and practitioners have been working to find alternatives to these surgical procedures, and voice-sample-based diagnosis is one of them. The major steps followed by these works are (a) extracting voice features from voice samples and (b) discriminating pathological voices from normal voices with a classifier algorithm. However, there is no consensus on which voice feature and which classifier algorithm provide the best accuracy in screening for voice disability. Moreover, some of the works use multiple voice features and multiple classifiers to ensure high reliability. In this paper, we address these issues. The motivation of the work is the need for non-invasive signal processing techniques to detect voice disability in the general population. This paper surveys voice disability detection methods and contains two main parts. In the first part, we present background information, including the causes of voice disability, current procedures and practices, voice features, and classifiers. In the second part, we present a comprehensive survey of voice disability detection algorithms. The issues and challenges related to the selection of voice features and classifier algorithms are addressed at the end of the paper.
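The two steps named in the abstract, (a) feature extraction and (b) classification, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any surveyed work: it assumes the pitch periods and peak amplitudes of each voice sample have already been measured, uses local jitter and local shimmer as the example voice features, and uses a hand-written nearest-centroid rule as the example classifier. All function names and the toy training data are hypothetical.

```python
import math


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by their mean.

    Applied to pitch periods this gives local jitter; applied to peak
    amplitudes it gives local shimmer.
    """
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def extract_features(periods_ms, amplitudes):
    """Step (a): map one voice sample to a feature vector [jitter, shimmer]."""
    return [local_perturbation(periods_ms), local_perturbation(amplitudes)]


class NearestCentroid:
    """Step (b): assign the label whose class-mean feature vector is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            # Column-wise mean of this class's feature vectors.
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        # Euclidean distance to each class centroid; the smallest one wins.
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))


# Toy, made-up training samples (hypothetical): steady vs. irregular voicing.
X_train = [
    extract_features([8.0, 8.02, 7.98, 8.01, 7.99], [1.0, 0.99, 1.01, 1.0, 0.98]),
    extract_features([5.0, 5.01, 4.99, 5.0, 5.02], [0.8, 0.81, 0.8, 0.79, 0.8]),
    extract_features([8.0, 8.6, 7.5, 8.4, 7.7], [1.0, 0.8, 1.15, 0.85, 1.1]),
    extract_features([5.0, 5.5, 4.6, 5.4, 4.7], [0.8, 0.6, 0.9, 0.65, 0.85]),
]
y_train = ["normal", "normal", "pathological", "pathological"]
model = NearestCentroid().fit(X_train, y_train)
```

A new sample is screened with `model.predict(extract_features(periods, amplitudes))`. The features are left unscaled here for brevity; when features have very different ranges, standardizing them before computing distances is usually needed, and the surveyed works use other feature sets and classifiers in place of these two.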
Highlights
Voice is a primitive, natural tool of communication used by humans
RASTA-perceptual linear prediction (RASTA-PLP) features are computed in the following steps: (a) compute the critical-band power spectrum; (b) transform the spectral amplitude through a compressing static nonlinear transformation; (c) filter the time trajectory of each transformed spectral component; (d) transform the filtered speech representation through an expanding static nonlinear transformation; (e) multiply by the equal-loudness curve and raise to the power 0.33 to simulate the power law of hearing; (f) compute an all-pole model of the resulting spectrum, following the conventional perceptual linear prediction (PLP) technique
The results show that the Dense Net Recurrent Neural Network (DNRNN) algorithm achieves an accuracy of 71%, a Recurrent Neural Network (RNN) achieves 30%, and a random forest approach achieves 68%
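The RASTA-PLP steps (a) to (f) listed in the highlights can be sketched in plain Python as follows. This is a minimal, non-authoritative illustration under several assumptions: step (a) is taken as already done (the input is a per-frame critical-band power spectrum); the compressing and expanding nonlinearities are the natural log and exp; the RASTA filter uses the commonly cited transfer function 0.1 * (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98z^-1), applied causally; step (e) uses Hermansky's equal-loudness approximation; and step (f) treats the band samples as uniformly spaced from 0 to the Nyquist frequency on the Bark axis, with a default model order of 5. All function names are hypothetical.

```python
import math


def rasta_filter(traj, pole=0.98):
    """Step (c): band-pass filter one band's log-energy trajectory over time.

    Causal form of H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1).
    The numerator taps sum to zero, so a constant (DC) offset is suppressed.
    The causal form delays the output by 4 frames relative to the original filter.
    """
    taps = [0.2, 0.1, 0.0, -0.1, -0.2]
    out, y = [], 0.0
    for t in range(len(traj)):
        fir = sum(c * traj[t - k] for k, c in enumerate(taps) if t - k >= 0)
        y = fir + pole * y
        out.append(y)
    return out


def equal_loudness(freq_hz):
    """Step (e): Hermansky's equal-loudness approximation E(w), w in rad/s."""
    w2 = (2.0 * math.pi * freq_hz) ** 2
    return ((w2 + 56.8e6) * w2 * w2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p from autocorrelation r[0..p]."""
    a, err = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        a = [a[j] + k * a[i - j] if 0 < j < i else a[j] for j in range(order + 1)]
        a[i] = k
        err *= 1.0 - k * k  # prediction error shrinks at every order
    return a, err


def all_pole(spectrum, order):
    """Step (f): autocorrelation as the inverse DFT of the even-extended spectrum
    (samples assumed uniformly spaced from 0 to Nyquist), then Levinson-Durbin."""
    m = len(spectrum)
    r = []
    for k in range(order + 1):
        acc = 0.0
        for j, p in enumerate(spectrum):
            w = 0.5 if j in (0, m - 1) else 1.0  # end samples are not mirrored
            acc += w * p * math.cos(math.pi * k * j / (m - 1))
        r.append(acc)
    return levinson_durbin(r, order)


def rasta_plp(band_power, band_centers_hz, order=5):
    """band_power[t][b]: critical-band power of frame t, band b (output of step a)."""
    n_bands = len(band_centers_hz)
    # (b) Compressing static nonlinearity: natural log, floored to avoid log(0).
    log_spec = [[math.log(max(p, 1e-10)) for p in frame] for frame in band_power]
    # (c) RASTA-filter the time trajectory of every band.
    filt = [rasta_filter([frame[b] for frame in log_spec]) for b in range(n_bands)]
    features = []
    for t in range(len(band_power)):
        # (d) Expanding static nonlinearity: exp, the inverse of step (b).
        spec = [math.exp(filt[b][t]) for b in range(n_bands)]
        # (e) Equal-loudness weighting, then the 0.33 power (intensity-loudness law).
        spec = [(equal_loudness(f) * s) ** 0.33 for f, s in zip(band_centers_hz, spec)]
        # (f) All-pole model of the resulting auditory spectrum: (coefficients, error).
        features.append(all_pole(spec, order))
    return features
```

Because the RASTA numerator has zero DC gain, a band whose log energy stays constant (for example, a fixed channel gain) decays toward zero after filtering, which is the property that makes RASTA-PLP less sensitive to slowly varying channel effects.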
Summary
Voice is a primitive, natural tool of communication used by humans. Voice communication is an integral part of our personal and professional lives. We maintain a steady flow of air by controlling the muscles around the rib cage, depending on the length of a sentence or phrase. This action causes air to rush through the trachea to the epiglottis. The vocal folds can be in two states, namely unvoiced and voiced. In the voiced state (e.g., during the production of a vowel), the vocal folds come closer, become more tense, and partially close the glottis. There are many issues and challenges related to voice-signal-based pathology detection techniques; these are addressed in Section VII, and the paper is concluded in Section VIII. A list of acronyms used throughout this paper is provided in the Appendix.