Abstract
The objective of this research was to develop deep learning classifiers and identify parameters that provide an accurate and objective system for classifying elderly and young voice signals. This work focused on deep learning methods, such as the feedforward neural network (FNN) and the convolutional neural network (CNN), for the detection of elderly voice signals using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), skewness, and kurtosis parameters. In total, 126 subjects (63 elderly and 63 young) were obtained from the Saarbruecken voice database. The highest accuracy, 93.75%, was obtained when skewness was added to the MFCC and MFCC delta parameters; the fusion of the skewness and kurtosis parameters also had a positive effect on the overall classification accuracy. The results of this study also revealed that the performance of the FNN was higher than that of the CNN. With respect to gender, most parameters estimated from male data samples performed well. Rather than using mixed female and male data, this work recommends developing separate systems for male and female samples, each using the parameters optimized for best performance on that group.
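To make the feature fusion described above concrete, the following is a minimal sketch of how skewness and kurtosis could be computed for one frame of samples and appended to a cepstral feature vector. It is an illustration under stated assumptions, not the authors' implementation: the function names (`skewness`, `kurtosis`, `fuse_features`) are hypothetical, the statistics are computed over the raw frame using population moments, and the cepstral coefficients are assumed to have been extracted elsewhere. The source does not specify any of these details.

```python
def _central_moment(values, order):
    """Population central moment of the given order."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n


def skewness(values):
    """Third standardized moment; 0.0 for a symmetric distribution."""
    m2 = _central_moment(values, 2)
    if m2 == 0:
        return 0.0  # constant frame: define skewness as 0 (assumption)
    return _central_moment(values, 3) / m2 ** 1.5


def kurtosis(values):
    """Fourth standardized moment (non-excess form; a Gaussian gives 3)."""
    m2 = _central_moment(values, 2)
    if m2 == 0:
        return 0.0  # constant frame: define kurtosis as 0 (assumption)
    return _central_moment(values, 4) / m2 ** 2


def fuse_features(cepstral, frame, use_skewness=True, use_kurtosis=False):
    """Append the selected statistics of `frame` to a cepstral vector.

    `cepstral` stands in for MFCC, MFCC delta, or LPCC values computed
    elsewhere; this function only performs the concatenation.
    """
    fused = list(cepstral)
    if use_skewness:
        fused.append(skewness(frame))
    if use_kurtosis:
        fused.append(kurtosis(frame))
    return fused


if __name__ == "__main__":
    frame = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy samples, not real voice data
    mfcc_like = [0.1, 0.2]              # placeholder coefficients
    print(fuse_features(mfcc_like, frame, use_skewness=True, use_kurtosis=True))
```

The fused vector would then serve as the classifier input. Toggling `use_skewness` and `use_kurtosis` mirrors the parameter combinations compared in the abstract.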
Highlights
The human voice represents a complex biological signal resulting from the dynamic interaction between adduction/vibration of the vocal folds and pulmonary air emission and flow through the resonant structures [1]
In order to create a system for recognizing the voice of the elderly, it is necessary to understand how vocal cord tissue changes with anatomical or physiological aging [10], and various welfare systems that use only a voice database of the elderly should be implemented
This work focused on deep learning methods, such as the feedforward neural network (FNN) and the convolutional neural network (CNN), for the detection of elderly voice signals using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), skewness, and kurtosis parameters
Summary
The human voice represents a complex biological signal resulting from the dynamic interaction between adduction/vibration of the vocal folds and pulmonary air emission and flow through the resonant structures [1]. Physiologic aging leads to specific changes in the anatomy and physiology of all structures involved in the production and modulation of the human voice [2,3,4]. The aging of laryngeal tissue changes the movement of the vocal cords, their vibration, and their opening and closing processes [5]. Voice characteristics are measured by parameters such as the fundamental frequency (F0), that is, the number of vocal cord oscillations per second, as well as jitter, shimmer, and the excitation source component. In order to create a system for recognizing the voice of the elderly, it is necessary to understand how vocal cord tissue changes with anatomical or physiological aging [10], and various welfare systems that use only a voice database of the elderly should be implemented.
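As a rough illustration of the voice measures named above, the sketch below computes mean F0, local jitter, and local shimmer from a sequence of glottal cycle periods and peak amplitudes. It assumes the commonly used "local" definitions (mean absolute difference between consecutive cycles divided by the mean value); the source does not state which variants were used, and the function names and toy values are hypothetical.

```python
def _relative_perturbation(values):
    """Mean absolute difference of consecutive values over the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def mean_f0(periods):
    """Mean fundamental frequency in Hz from cycle periods in seconds."""
    return 1.0 / (sum(periods) / len(periods))


def local_jitter(periods):
    """Cycle-to-cycle variation of the period (dimensionless ratio)."""
    return _relative_perturbation(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (dimensionless ratio)."""
    return _relative_perturbation(amplitudes)


if __name__ == "__main__":
    # Toy values for illustration only, not measurements from any database.
    periods = [0.009, 0.011, 0.009, 0.011]   # seconds per glottal cycle
    amplitudes = [1.0, 0.9, 1.0, 0.9]        # arbitrary units
    print(mean_f0(periods), local_jitter(periods), local_shimmer(amplitudes))
```

Extracting the cycle periods and amplitudes from a recording requires a pitch-marking step that is outside the scope of this sketch.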