Multi-speaker Conversations Research Articles

Daily conversations carry rich emotional information, and recognizing it has become an active task in natural language processing. Traditional dialogue sentiment analysis methods study one-to-one dialogues and do not transfer effectively to multi-speaker dialogues. This paper focuses on the relationships between participants in a multi-speaker conversation and analyzes the influence of each speaker on the emotion of the whole conversation. We summarize the challenges of emotion recognition in multi-speaker dialogue, focusing on the context-topic switching problem that arises because topics flow freely among multiple speakers. To address this challenge, we propose a graph network that combines syntactic structure and topic information. A syntax module converts sentences into graphs whose edges represent dependencies between words, which addresses the colloquial nature of daily conversation, and graph convolutional networks extract the implicit meaning of the discourse. Because topic information also affects sentiment, we design a topic module that uses a variational autoencoder (VAE) to improve topic extraction and classification of sentences. We then combine an attention mechanism with the syntactic structure to strengthen the model’s ability to analyze sentences. In addition, topic segmentation is adopted to address the long-term dependency problem, and a heterogeneous graph is used to model the dialogue. The nodes of the graph combine speaker information and utterance information. To capture the interaction between the subject and the object of the dialogue, different edge types represent different interaction relationships and are assigned different weights. Experimental results on multiple public datasets show that the new model outperforms several alternative methods in sentiment label classification. On the multi-person dialogue dataset, classification accuracy increases by more than 4%, which verifies the effectiveness of constructing heterogeneous dialogue graphs.
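As a rough illustration of the syntax-module idea in this abstract, the sketch below turns a sentence into a graph with dependency edges and applies one graph convolution step. It is a minimal sketch and not the authors' implementation: the dependency edges, the one-hot token features, and the weight matrix are hand-written assumptions, and the symmetric normalization with self-loops is the commonly used GCN formulation rather than anything the abstract specifies.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph, as a list of rows."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0  # self-loop so each word keeps its own features
    for head, dependent in edges:
        adj[head][dependent] = 1.0
        adj[dependent][head] = 1.0  # treat dependency arcs as undirected
    degree = [sum(row) for row in adj]
    return [
        [adj[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(a_hat, features, weights):
    """One graph convolution step: ReLU(a_hat @ features @ weights)."""
    pre_activation = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, value) for value in row] for row in pre_activation]


# Hypothetical dependency parse of "I really love it" with "love" as the root.
tokens = ["I", "really", "love", "it"]
edges = [(2, 0), (2, 1), (2, 3)]  # (head index, dependent index), assumed by hand

# Assumed toy inputs: one-hot token features and an arbitrary 4x2 weight matrix.
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

a_hat = normalized_adjacency(len(tokens), edges)
hidden = gcn_layer(a_hat, features, weights)

for token, row in zip(tokens, hidden):
    print(token, [round(value, 3) for value in row])
```

Each output row mixes a word's own features with those of its syntactic neighbours, which is the sense in which graph convolution can pick up meaning that is implicit in the sentence structure. A real system would obtain the edges from a dependency parser and learn the weights during training.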


Speaker diarization is one of the more complex problems in speaker recognition. A reliable diarization system should accurately determine the variable-length utterances that each speaker contributes to a multi-speaker conversation. This is difficult because text-independent speaker identification and verification are not yet reliable enough to be applied to it directly. While efficient speaker modelling is important for diarization, the acoustical representation of speech is the basic entity that characterizes a speaker. This representation should be distinctive enough to prevent a speaker’s utterances from being lost in the acoustical congestion imposed by the other talkers. For this purpose, it is proposed here that, in multiple-microphone diarization, the multiple speech signals be used in acoustic feature extraction instead of being combined beforehand. The aim is to make optimal use of those signals to enrich the quality of the speaker’s acoustical representation. Since not all microphone signals (channels) may be desirable, two selection approaches are proposed in this work: a best-quality channel selection method and a novel approach for diverse channel selection. Furthermore, a novel method is proposed which retains the speech spectrum from selected least-reverberated subbands of the available channels’ spectra. A new model, referred to here as Averaged Joint Gradient (AJG), is introduced for this purpose. The proposed approach reduces the Diarization Error Rate (DER) in both of the diarization systems used in the evaluations. The first system is based on binary keys and achieves a maximum relative reduction in DER of 14%. The second is a Gaussian Mixture Model-Bayesian Information Criterion (GMM-BIC) based system, which achieves a maximum relative reduction in DER of 20%.
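The relative reductions quoted in this abstract are ratios against a baseline DER, not absolute differences. The sketch below shows that arithmetic under the standard definition of DER: missed speech, false alarm, and speaker confusion time divided by total reference speech time. The function names and all durations are assumptions made up for illustration; none of the numbers are results from the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER as the fraction of reference speech time that is scored as an error."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


def relative_reduction(baseline_der, new_der):
    """Relative DER reduction: the share of the baseline error that was removed."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - new_der) / baseline_der


# Invented durations in seconds, for illustration only.
baseline = diarization_error_rate(
    missed=30.0, false_alarm=20.0, confusion=50.0, total_speech=500.0
)
improved = diarization_error_rate(
    missed=30.0, false_alarm=20.0, confusion=30.0, total_speech=500.0
)

print(f"baseline DER: {baseline:.1%}")
print(f"improved DER: {improved:.1%}")
print(f"relative reduction: {relative_reduction(baseline, improved):.1%}")
```

In this toy case the absolute DER drops by four percentage points while the relative reduction is about one fifth of the baseline error, which is why the two kinds of figure should not be compared directly.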

Articles published on Multi-speaker Conversations

Integration of audio-visual information for multi-speaker multimedia speaker recognition

Joint Syntax-Enhanced and Topic-Driven Graph Networks for Emotion Recognition in Multi-Speaker Conversations

Multi-Target Extractor and Detector for Unknown-Number Speaker Diarization

Channel and channel subband selection for speaker diarization

Fearless steps, NASA’s first heroes: Conversational speech analysis of the Apollo-11 mission control personnel

Airborne vs. radio-transmitted vocalizations in two primates: a technical report
