Multiparty Conversations Research Articles

Lexical information extracted from speech transcripts can be used for speaker identification (SID), either on its own or fused with standard cepstral-based SID systems to improve their performance. Previous work established this mainly on isolated speech from single speakers (NIST SRE corpora, parliamentary speeches). In contrast, this work applies lexical SID approaches to a different type of data: the REPERE corpus, which consists of unsegmented multiparty conversations, mostly debates, discussions and Q&A sessions from TV shows. The hypothesis is that people give out clues to their identity when speaking in such settings, and this work aims to exploit those clues. The impact on SID performance of the diarization front-end required to pre-process the unsegmented data is also measured. Four lexical SID approaches are studied, including TF-IDF, BM25 and LDA-based topic modeling. Results are analysed by TV show and by speaker role. Lexical approaches achieve low error rates for certain speaker roles such as anchors and journalists, sometimes lower than a standard cepstral-based Gaussian Supervector - Support Vector Machine (GSV-SVM) system. In certain cases, score-level sum fusion of the lexical and cepstral-based systems yields a modest improvement over the cepstral-based system alone. To highlight the potential of lexical information not just as a complement to cepstral-based SID systems but as an independent approach in its own right, initial studies on crossmedia SID are briefly reported. Instead of the speech data that all cepstral systems require, this approach uses Wikipedia texts to train lexical speaker models, which are then tested on speech transcripts to identify speakers.
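As a rough illustration of the TF-IDF variant and the score-level sum fusion mentioned in the abstract, the sketch below builds one TF-IDF vector per speaker from training text and scores a test transcript by cosine similarity. It is a minimal sketch under stated assumptions, not the authors' system: the function names, the smoothed IDF formula, the whitespace tokenizer and the toy transcripts are all hypothetical, and the fusion step assumes both systems' scores are already on comparable scales.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


def build_idf(docs):
    # Smoothed inverse document frequency over the per-speaker documents.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector; words unseen in training are dropped.
    tf = Counter(tokens)
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lexical_scores(train, test_text):
    # train maps a speaker name to that speaker's training text.
    docs = {spk: tokenize(text) for spk, text in train.items()}
    idf = build_idf(list(docs.values()))
    models = {spk: tfidf_vector(toks, idf) for spk, toks in docs.items()}
    query = tfidf_vector(tokenize(test_text), idf)
    return {spk: cosine(query, vec) for spk, vec in models.items()}


def sum_fusion(scores_a, scores_b):
    # Score-level sum fusion: add the two systems' scores per speaker.
    return {spk: scores_a[spk] + scores_b[spk] for spk in scores_a}


# Hypothetical toy data, for illustration only.
train = {
    "anchor": "good evening welcome to the show tonight our guests",
    "politician": "my party will reform taxes and defend workers",
}
scores = lexical_scores(train, "welcome back to the show")
best = max(scores, key=scores.get)
```

Because the training text is just text, the same scoring would apply unchanged if the speaker documents came from a written source rather than transcripts, which is the idea behind the crossmedia experiment described above.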

To advance human-centered multimodal interfaces, a system must be able to infer the user's current state or the state of the communication. This requires recognizing and interpreting multimodal social signals (i.e., paralinguistic and nonverbal behavior) in real-time applications. Since we believe that laughs are one of the most important and widely understood social nonverbal signals indicating affect and discourse quality, we focus in this work on the detection of laughter in natural multiparty discourses. The conversations are recorded in a natural environment with unobtrusive recording devices and without any specific constraint on the discourses. This setup ensures natural and unbiased behavior, which is one of the main foci of this work. Three methods are compared in online and offline detection experiments: Gaussian Mixture Model (GMM) supervectors as input to a Support Vector Machine (SVM), Echo State Networks (ESN), and a Hidden Markov Model (HMM) approach. The SVM approach proves very accurate in the offline classification task, but is outperformed by the ESN and HMM approaches in the online detection (F1 scores: GMM SVM 0.45, ESN 0.63, HMM 0.72). Further, we were able to utilize the proposed HMM approach in a cross-corpus experiment without any retraining, with respectable generalization capability (F1 score: 0.49). The results and possible reasons for these outcomes are shown and discussed in the article. The proposed methods may be directly utilized in practical tasks such as the labeling or the online detection of laughter in conversational data and in affect-aware applications.
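For readers unfamiliar with the F1 score reported above, the following sketch computes it from binary laughter labels. It assumes frame-level labels (1 for laughter, 0 otherwise); the abstract does not say whether the reported scores are frame-based or event-based, and the function name and toy label sequences are hypothetical.

```python
def f1_score(reference, predicted):
    # Count frame-level true positives, false positives and false negatives.
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    fp = sum(1 for r, p in pairs if not r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 marks a laughter frame.
reference = [0, 1, 1, 1, 0, 0, 1, 0]
predicted = [0, 1, 1, 0, 0, 1, 1, 0]
score = f1_score(reference, predicted)  # 3 hits, 1 false alarm, 1 miss
```

With three correct detections, one false alarm and one miss, precision and recall are both 0.75, so the F1 score is 0.75 as well.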

Articles published on Multiparty Conversations

Lexical speaker identification in TV shows

Deficient gaze pattern during virtual multiparty conversation in patients with schizophrenia

Sociable Spotlight: The Impact of Interactive Artifacts to Ground the Social Interactions

Adjusting conceptual pacts in three-party conversation.

A Multiparty Conversation System with an Addressee Identification Mechanism based on Nonverbal Information

Modeling topic control to detect influence in conversations using nonparametric topic models

Gaze behavior of pre-adolescent children afflicted with Asperger Syndrome

Towards a Semantic-Based Approach for Affect and Metaphor Detection

Wordless Sounds: Robust Speaker Diarization Using Privacy-Preserving Audio Representations

Identifying the Addressee using Head Orientation and Speech Information in Multiparty Human-Agent Conversations

Resources for turn competition in overlapping talk

MM-Space: Recreating Multiparty Conversation Space by Using Dynamic Displays

Production, circulation and deconstruction of gender norms in LGBTQ speech practices

Conversational learning integration in technology enhanced classrooms

Automatic Role Recognition in Multiparty Conversations: An Approach Based on Turn Organization, Prosody, and Conditional Random Fields

Spotting laughter in natural multiparty conversations

Across languages and cultures: Brokering problems of understanding in conversational repair

Privacy-Sensitive Audio Features for Speech/Nonspeech Detection

Managing multiple actions through multimodality: Doctors' involvement in interpreter-mediated interactions

A-STAR: Toward translating Asian spoken languages
