Abstract

Recognizing a speaker's emotions is an important but challenging component of Human-Computer Interaction (HCI). The need for speaker emotion recognition is also growing alongside the digitization of companies' operational processes driven by the implementation of Industry 4.0. The use of Deep Learning methods is currently increasing, especially for processing unstructured data such as voice signals. This study applies a Deep Learning method to classify the speaker's emotions using the open SAVEE dataset, which contains seven classes of vocal emotion in English. The dataset was trained using a CNN model. The final accuracy of the model is 88% on the training data and 52% on the test data, which indicates the model is overfitting. This is due to the imbalance of emotion classes in the dataset, which makes the model tend to predict the classes with more labels. In addition, the dataset lacks heterogeneity; a more heterogeneous dataset would make the character of each emotion class more distinct from the others, which could reduce bias in the model and help prevent overfitting. Further development of this research could include over-sampling the existing dataset by adding other data sources, performing data augmentation to capture the data characteristics of each emotion class, and tuning hyperparameter values to obtain better accuracy.
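The over-sampling and augmentation direction suggested above could be sketched as follows. This is an illustration only, not the authors' actual pipeline: the function names, the noise level, and the shift range are assumptions, and real work would typically use an audio library rather than raw NumPy.

```python
import numpy as np

def augment_waveform(wave, rng, noise_level=0.005, max_shift=1600):
    """Create an augmented copy of a 1-D audio waveform by adding
    Gaussian noise and applying a random circular time shift.
    (Illustrative parameters; not taken from the paper.)"""
    noisy = wave + noise_level * rng.standard_normal(wave.shape)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(noisy, shift)

def oversample_minority(waves, labels, rng):
    """Balance emotion classes by appending augmented copies of
    minority-class samples until every class matches the largest one."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    out_waves, out_labels = list(waves), list(labels)
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(labels == cls)
        for _ in range(target - count):
            src = waves[rng.choice(idx)]
            out_waves.append(augment_waveform(src, rng))
            out_labels.append(cls)
    return out_waves, np.array(out_labels)
```

Because each added sample is a perturbed copy rather than an exact duplicate, the model sees more varied examples of the minority emotions instead of memorizing repeats.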

Highlights

  • Recognition of the speaker's emotions is an important but challenging component of Human-Computer Interaction (HCI)

  • The need for speaker emotion recognition is also growing with the digitization of companies' operational processes driven by the implementation of Industry 4.0

  • Applies a Deep Learning method to classify the speaker's emotions using the open SAVEE dataset


Summary

Introduction

Speaker emotion recognition is the act of attempting to recognize human emotions and affective states from speech. It is also a phenomenon that animals such as dogs and horses use to understand human emotions. Determining a human's emotional state is distinctive and can be used as a benchmark for any emotion recognition model [11]. A digital speech-based emotion recognition system consists of three fundamental components: signal preprocessing, feature extraction, and classification [19]. Feature extraction is used to identify the relevant features available in the signal. Figure 1 illustrates a simple system used for speech-based emotion recognition. A CNN combines learned features with the input data and, by using 2D convolutional layers, this architecture is well suited to processing 2D data such as images. We attempt to use a Deep Learning method with the CNN (Convolutional Neural Network) algorithm to learn from voice signal data so that it can recognize the types of emotion contained in the human voice.
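To make concrete why 2D convolutional layers suit this task, a time-frequency representation of speech (e.g. a spectrogram) can be treated like an image. The following minimal NumPy sketch shows the core operation of a single convolutional layer; it is an illustration of the technique only, not the model trained in this study, and the shapes and kernel are assumed.

```python
import numpy as np

def conv2d(feature_map, kernel):
    """'Valid' 2-D cross-correlation, the core operation of a CNN
    convolutional layer, applied to a spectrogram-like 2-D array."""
    H, W = feature_map.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output value summarizes one local time-frequency patch.
            out[i, j] = np.sum(feature_map[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Non-linearity typically applied after each convolution."""
    return np.maximum(x, 0.0)

# Illustrative input: a (frequency x time) spectrogram patch,
# e.g. 40 mel bands by 100 frames (assumed sizes).
spectrogram = np.random.default_rng(1).standard_normal((40, 100))
kernel = np.ones((3, 3)) / 9.0  # a hand-picked smoothing kernel for illustration
activation = relu(conv2d(spectrogram, kernel))
```

In a trained CNN the kernels are learned rather than hand-picked, and many such layers are stacked with pooling before a final classifier over the seven emotion classes.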

Dataset Collection and Data Preprocessing
Data Augmentation
Model Training and Evaluation
Results and Discussion
Findings
Conclusion