Abstract
Interpreting a speech signal is challenging because it consists of different frequencies and features that vary according to emotion. Although different algorithms are being developed in the speech emotion recognition (SER) domain, success rates vary according to the spoken language, the emotions, and the database. In this study, a new lightweight and effective SER method with low computational complexity has been developed. This method, called 1BTPDN, is applied to the RAVDESS, EMO-DB, SAVEE, and EMOVO databases. First, low-pass filter coefficients are obtained by applying a one-dimensional discrete wavelet transform to the raw audio data. Features are then extracted by applying two textural analysis methods, a one-dimensional local binary pattern and a one-dimensional local ternary pattern, to each filter output. Using neighborhood component analysis, the 1024 most dominant features are selected from 7680 features and the remaining features are discarded. These 1024 features are used as the input of the classifier, which is a support vector machine with a third-degree polynomial kernel. The success rates of 1BTPDN reached 95.16%, 89.16%, 76.67%, and 74.31% on the RAVDESS, EMO-DB, SAVEE, and EMOVO databases, respectively. These recognition rates are higher than those of many state-of-the-art textural, acoustic, and deep learning SER methods.
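The feature-extraction stage described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the wavelet family, the number of decomposition levels, the neighborhood size, or the ternary threshold, so the sketch assumes a Haar low-pass filter, 9 levels plus the raw signal, 8 neighbors per sample, and a threshold of half the signal's standard deviation. With these choices each signal yields 256 binary-pattern bins and 512 ternary-pattern bins, and 10 signals × 768 bins matches the 7680 features quoted above, although the source does not confirm this breakdown. All function names are hypothetical. Feature selection and the polynomial-kernel SVM are omitted.

```python
import numpy as np


def haar_lowpass(x):
    """One level of a Haar DWT, keeping only the low-pass (approximation) band."""
    x = np.asarray(x, dtype=float)
    x = x[: len(x) - len(x) % 2]  # drop a trailing sample if the length is odd
    return (x[0::2] + x[1::2]) / np.sqrt(2.0)


def _neighbor_diffs(x, half=4):
    """Differences between each center sample and its `half` left/right neighbors."""
    x = np.asarray(x, dtype=float)
    win = np.lib.stride_tricks.sliding_window_view(x, 2 * half + 1)
    center = win[:, half:half + 1]
    neighbors = np.concatenate([win[:, :half], win[:, half + 1:]], axis=1)
    return neighbors - center


def lbp_1d_hist(x, half=4):
    """Histogram of 1D local binary pattern codes (256 bins when half=4)."""
    d = _neighbor_diffs(x, half)
    weights = 1 << np.arange(2 * half)
    codes = (d >= 0).astype(np.int64) @ weights
    return np.bincount(codes, minlength=2 ** (2 * half))


def ltp_1d_hist(x, threshold, half=4):
    """Histograms of the upper and lower 1D local ternary pattern codes."""
    d = _neighbor_diffs(x, half)
    weights = 1 << np.arange(2 * half)
    upper = (d > threshold).astype(np.int64) @ weights
    lower = (d < -threshold).astype(np.int64) @ weights
    n_bins = 2 ** (2 * half)
    return np.concatenate([
        np.bincount(upper, minlength=n_bins),
        np.bincount(lower, minlength=n_bins),
    ])


def extract_features(signal, levels=9, half=4):
    """Concatenate LBP and LTP histograms of the raw signal and each low-pass level."""
    x = np.asarray(signal, dtype=float)
    feats = []
    for _ in range(levels + 1):
        threshold = 0.5 * np.std(x)  # assumed threshold, not taken from the source
        feats.append(lbp_1d_hist(x, half))
        feats.append(ltp_1d_hist(x, threshold, half))
        x = haar_lowpass(x)
    return np.concatenate(feats)
```

Calling `extract_features` on a one-dimensional array returns a vector of length 7680 under the default assumptions. The signal must be long enough that the coarsest level still holds at least 9 samples (at least 4608 samples for 9 levels); otherwise the sliding window raises an error.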
Highlights
Speech processing methods are used in human-computer interaction (HCI) applications such as security applications, computer education applications, vehicle card systems, automatic translation systems, call center applications, psychosis monitoring and diagnosis of neuropsychological disorders, voice message sorting, telecommunication, assistive technologies, and audio mining [1].
We propose a novel SER method, abbreviated as 1BTPDN, which is a speech emotion recognition model based on a multi-level local binary pattern and local ternary pattern.
A novel text-independent and speaker-independent SER method, called 1BTPDN, has been developed as a lightweight method that solves a nonpolynomial problem by extracting handcrafted features.
Summary
Speech processing methods are used in human-computer interaction (HCI) applications such as security applications, computer education applications, vehicle card systems, automatic translation systems, call center applications, psychosis monitoring and diagnosis of neuropsychological disorders, voice message sorting, telecommunication, assistive technologies, and audio mining [1]. They are also used in digital forensics, games, robots, and the legal evaluation of an individual's psychological integrity [2]. LBP and LTP are used on two-dimensional (2D) images for texture segmentation and feature detection in image processing. Their computational and programming simplicity makes them usable in real-time applications [38].