Abstract
Non-invasive identification of abnormal voice using feature descriptors and machine learning classifiers has been the preferred approach in much of the literature. Combining feature descriptors and time-frequency images with deep learning is a better alternative. Most deep learning frameworks for voice pathology are based on a binary classification model, whereas a hardware implementation requires a network that can recognise the exact pathology. This calls for a thorough investigation of time-frequency analysis using a sophisticated deep learning system. The current work explores a non-invasive, robust and computationally lightweight architecture for detecting multiclass laryngeal pathology. To make the approach applicable in a realistic environment, the capabilities of a fully connected network and a fully convolutional deep-learning voice denoiser network are first investigated. Denoised training samples are used to create three different time-frequency image corpora. These multivariate image datasets are used to train three improved variants of state-of-the-art convolutional neural network models that use 3D convolution kernels. A “group decision analogy” technique is employed for training and for attaining the global maximum of the current classification problem. The concept of group decision analogy stems from well-known clustering and swarm optimization algorithms, and it uses three stages to optimise the predicted score. In the first stage, the enhanced deep-learning models are trained on the three datasets to recognise four classes: healthy, hyperkinetic dysphonia, hypokinetic dysphonia, and reflux laryngitis. The predictions then pass through the second and third stages. A score of 80.59% is obtained before applying “group decision analogy”, and this rises to 97.7% after it is applied. The hypokinetic dysphonia and reflux laryngitis classes each achieve 100% classification.
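The abstract does not spell out how the second and third stages of group decision analogy combine the model predictions. As a rough point of comparison only, the sketch below shows plain soft voting, a much simpler and different technique in which per-class probabilities from several models are averaged before the final label is chosen. The function name `soft_vote` and the example scores are hypothetical and are not taken from the paper; only the four class names come from the abstract.

```python
# Illustrative only: plain soft voting, NOT the paper's "group decision analogy".
# The four class names come from the abstract; everything else is assumed.
CLASSES = [
    "healthy",
    "hyperkinetic dysphonia",
    "hypokinetic dysphonia",
    "reflux laryngitis",
]


def soft_vote(model_scores):
    """Average per-class probabilities across models.

    model_scores: list of lists, one inner list of class probabilities per model.
    Returns (predicted_label, averaged_scores).
    """
    n_models = len(model_scores)
    n_classes = len(CLASSES)
    if n_models == 0:
        raise ValueError("need scores from at least one model")
    for scores in model_scores:
        if len(scores) != n_classes:
            raise ValueError("each model must give one score per class")
    # Mean probability for each class over all models.
    averaged = [
        sum(scores[i] for scores in model_scores) / n_models
        for i in range(n_classes)
    ]
    # Index of the class with the highest averaged probability.
    best = max(range(n_classes), key=lambda i: averaged[i])
    return CLASSES[best], averaged


# Hypothetical softmax outputs from three models for a single voice sample.
example_scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.40, 0.45, 0.10],
    [0.10, 0.55, 0.25, 0.10],
]
label, averaged = soft_vote(example_scores)
print(label)
```

In this made-up example the second model on its own would pick hypokinetic dysphonia, but the averaged scores favour hyperkinetic dysphonia, which illustrates why pooling several models can change, and potentially improve, a single model's decision.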