Utterance Classification Research Articles

The subject of this study is the data and methods used for automatic recognition of emotions in speech. This task has recently gained wide popularity, primarily due to the emergence of large labeled datasets and advances in machine learning models. Speech utterances are usually classified into six archetypal emotions: anger, fear, surprise, joy, disgust, and sadness. Most modern classification methods are based on machine learning, in particular on transformer models trained with a self-supervised approach, such as Wav2vec 2.0, HuBERT, and WavLM, which are considered in this paper. English and Russian emotional speech datasets are analyzed, in particular the Russian datasets Dusha and RESD. The experiment compares the results of the Wav2vec 2.0, HuBERT, and WavLM models on Dusha and RESD, both of which were collected relatively recently. The main purpose of the work is to assess the availability and applicability of existing data and approaches for recognizing emotions in Russian speech, a language for which relatively little research has been conducted so far. The best result was achieved by the WavLM model on the Dusha dataset, with an accuracy of 0.8782. WavLM also achieved the best result on the RESD dataset, an accuracy of 0.81, after preliminary training on Dusha. These high classification results, due primarily to the quality and size of the Dusha dataset, indicate good prospects for further development of this area for the Russian language.
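The pipeline described above (a self-supervised encoder such as WavLM producing frame-level features, followed by an utterance-level classifier scored with Accuracy) can be sketched as follows. This is not the authors' code: mean pooling followed by a linear layer is one common way to turn frame features into a single utterance label, and the two-dimensional features, weights, and bias are invented toy values standing in for real encoder outputs and a trained head. The label set reuses the six archetypal emotions for illustration; actual dataset label sets may differ.

```python
# Minimal sketch (assumptions labeled): utterance-level emotion
# classification on top of frame-level encoder features, plus Accuracy.
# The toy features, weights and bias below are hypothetical; a real system
# would take frame embeddings from a pre-trained encoder and learn the head.
from typing import List, Sequence

# Illustrative label set: the six archetypal emotions.
EMOTIONS = ["anger", "fear", "surprise", "joy", "disgust", "sadness"]


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average a variable-length sequence of frame vectors into one vector."""
    if not frames:
        raise ValueError("an utterance needs at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def classify(frames: Sequence[Sequence[float]],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> str:
    """Mean-pool the frames, apply a linear layer, and return the top label."""
    pooled = mean_pool(frames)
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    best = max(range(len(logits)), key=lambda i: logits[i])
    return EMOTIONS[best]


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Share of utterances whose predicted label equals the gold label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical 2-dimensional "encoder output" and a hand-set linear head
# with one weight row per emotion.
WEIGHTS = [[1, 0], [0, 1], [-1, 0], [0, -1], [0.5, 0.5], [-0.5, -0.5]]
BIAS = [0.0] * len(EMOTIONS)

utterances = [[[2, 0], [4, 0]], [[0, 1], [0, 3]], [[-2, 0], [-4, 0]]]
gold = ["anger", "fear", "sadness"]
predicted = [classify(u, WEIGHTS, BIAS) for u in utterances]
print(predicted, accuracy(predicted, gold))
```

Here the third toy utterance is predicted as "surprise" while its gold label is "sadness", so two of three predictions match and the accuracy is 2/3. Pooling makes the head independent of utterance length, which is why it is a common default for variable-length audio.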


Current automatic writing feedback systems cannot distinguish between the different discourse elements in students' writing. This is a problem because, without this ability, the guidance these systems provide is too general for what students are trying to achieve. It matters because automated writing feedback systems are a valuable tool for countering the decline in students' writing skills. According to the National Assessment of Educational Progress, fewer than 30 percent of high school graduates are proficient writers. If we can improve automatic writing feedback systems, we can improve the quality of student writing and halt the decline in skilled writers among students. Solutions to this problem have been proposed, the most popular being fine-tuning Bidirectional Encoder Representations from Transformers (BERT) models to recognize the various discourse elements in students' written assignments. However, these methods have drawbacks. For example, they do not compare the strengths and weaknesses of different models, and they train models on sequences (sentences) rather than on entire essays. In this article, I restructure the Persuasive Essays for Rating, Selecting, and Understanding Argumentative and Discourse Elements (PERSUADE) corpus so that models can be trained on entire essays, and I evaluate BERT, the Long-Document Transformer (Longformer), and Generative Pre-trained Transformer 2 (GPT-2) on utterance classification framed as a named entity recognition (NER) token classification problem. Overall, the BERT model trained with my sequence-merging preprocessing method outperforms the standard model by 17% and 41% in overall accuracy. I also found that the Longformer model performed best in utterance classification, with an overall F1 score of 54%.
However, the increase in validation loss from 0.54 to 0.79 indicates that the model is overfitting. Improvements to counter this overfitting are still possible, for example implementing early stopping and providing more examples of rare discourse elements during training.
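Framing discourse-element recognition as NER-style token classification means giving every word a tag that marks where an element begins and continues. The sketch below uses the common BIO scheme (B- for the first word of an element, I- for following words, O for words outside any element); it is an illustration, not the article's preprocessing code, and the element names "Position" and "Evidence" and the example sentence are hypothetical.

```python
# Minimal sketch (not the article's code): discourse-element labelling as
# NER-style token classification with BIO tags.
# The element names and the example essay text are hypothetical.
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (first word index, end index exclusive, label)


def spans_to_bio(num_words: int, spans: List[Span]) -> List[str]:
    """Tag words B-<label> at a span start, I-<label> inside it, O elsewhere."""
    tags = ["O"] * num_words
    for start, end, label in spans:
        if not 0 <= start < end <= num_words:
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence back into (start, end, label) spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if start is None and tag[:2] in ("B-", "I-"):
            start, label = i, tag[2:]  # a stray I- tag is treated as a start
    return spans


words = "Schools should allow phones because surveys show higher engagement".split()
gold_spans = [(0, 4, "Position"), (4, 9, "Evidence")]
tags = spans_to_bio(len(words), gold_spans)
for word, tag in zip(words, tags):
    print(f"{word:12} {tag}")
```

Decoding the predicted tags back into spans is what allows span-level scores such as F1 to be computed. Models like BERT, Longformer, and GPT-2 use subword tokenizers, so in practice these word-level tags must also be aligned to subword tokens before training; that step is omitted here.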

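The abstract suggests early stopping as a remedy for the rising validation loss. A generic, patience-based version is sketched below: training halts once validation loss fails to improve for a set number of consecutive evaluations, and the best epoch is remembered so its checkpoint can be restored. This is a sketch of the standard technique, not the author's implementation; the class and function names and the loss values in the usage lines are invented for illustration.

```python
# Minimal sketch of patience-based early stopping on validation loss.
# Names and the loss values in the usage lines are hypothetical.
from typing import Optional, Sequence, Tuple


class EarlyStopping:
    """Stop once validation loss has not improved by more than `min_delta`
    for `patience` consecutive evaluations."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss: Optional[float] = None
        self.best_epoch: Optional[int] = None
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one evaluation and return True if training should stop."""
        if self.best_loss is None or val_loss < self.best_loss - self.min_delta:
            self.best_loss, self.best_epoch = val_loss, epoch
            self.bad_epochs = 0  # improvement: reset the patience counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def simulate(val_losses: Sequence[float], patience: int = 2) -> Tuple[int, Optional[int]]:
    """Return (epoch at which training stops, epoch of the best checkpoint)."""
    if not val_losses:
        raise ValueError("need at least one validation loss")
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(epoch, loss):
            return epoch, stopper.best_epoch
    return len(val_losses) - 1, stopper.best_epoch


hypothetical_losses = [0.90, 0.70, 0.60, 0.62, 0.66, 0.75]
print(simulate(hypothetical_losses))
```

With a patience of 2, training on these invented losses halts at epoch 4, and the checkpoint from epoch 2 (the lowest loss) would be the one to keep. A small positive `min_delta` makes the stopper ignore negligible improvements.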

Articles published on Utterance Classification

Automatic classification of emotions in speech: methods and data

Illocutionary Act Analysis of Melati and Isabel Wijsen Speech at United Nations

A deep learning approach to dysarthric utterance classification with BiLSTM-GRU, speech cue filtering, and log mel spectrograms

Taxonomy of Paremias in Chinese Linguistics: Structural-Semantic and Linguopragmatic Aspects

Pragmatic Semantics in Football Fans’ «Banner» Communication in the German Language

Multi-Modal Sarcasm Detection and Humor Classification in Code-Mixed Conversations

Identifying Discourse Elements in Writing by Longformer for NER Token Classification

Hate Speech in Internet Communication

Spoken Utterance Classification Task of Arabic Numerals and Selected Isolated Words

Classification of Utterances that Lead to Dialogue Breakdowns in Chat-oriented Dialogue Systems

Types of Syntactically Non-Segmentable Utterances of Disagreement in Slavic Languages: The Semantic Aspect

Prosody-Based Measures for Automatic Severity Assessment of Dysarthric Speech

Recognition of speech emotion using custom 2D-convolution neural network deep learning algorithm

Sequential neural networks for noetic end-to-end response selection

Classification of humorous interactions with intelligent personal assistants

Observability of Inter-Organizational Crisis Management Capability

Native Language Identification in Very Short Utterances Using Bidirectional Long Short-Term Memory Network

Constructing a Scale on Conceptions of Disability: Methodological Procedures

In-Vehicle Voice Interface with Improved Utterance Classification Accuracy Using Off-the-Shelf Cloud Speech Recognizer

Robust Visual Lips Feature Extraction Method for Improved Visual Speech Recognition System