Monolingual Speech Research Articles

Emotion recognition plays an important role in human-computer interaction. Many studies, past and present, have addressed speech emotion recognition using a variety of classifiers and feature extraction methods. Most of them, however, consider emotions solely from the perspective of a single language. In contrast, the current study extends monolingual speech emotion recognition to cover emotions expressed in several languages that are recognized simultaneously by a single complete system. To this end, an effective method for bilingual speech emotion recognition is proposed and evaluated. The proposed method is based on a two-pass classification scheme consisting of spoken language identification and speech emotion recognition. In the first pass, the spoken language is identified; in the second pass, emotion recognition is conducted using the emotion models of the identified language. Based on deep learning and the i-vector paradigm, bilingual emotion recognition experiments were conducted using the state-of-the-art English IEMOCAP (four emotions) and German FAU Aibo (five emotions) corpora. Two classifiers were compared, both using i-vector features: fully connected deep neural networks (DNN) and convolutional neural networks (CNN). With DNN, unweighted average recalls (UARs) of 64.0% and 61.14% were obtained on the IEMOCAP and FAU Aibo corpora, respectively. With CNN, UARs of 62.0% and 59.8% were achieved on the IEMOCAP and FAU Aibo corpora, respectively. These results are very promising and superior to those obtained in similar studies on multilingual or even monolingual speech emotion recognition.

Furthermore, an additional baseline approach for bilingual speech emotion recognition was implemented and evaluated. In the baseline approach, six common emotions were considered, and bilingual emotion models were trained on data from the two languages. In this case, UARs of 51.2% and 51.5% for six emotions were obtained using DNN and CNN, respectively. The baseline results were reasonable and promising, showing the effectiveness of i-vectors and deep learning in bilingual speech emotion recognition. The proposed two-pass method based on language identification, however, performed significantly better. The study was also extended to multilingual speech emotion recognition using corpora collected under similar conditions. Specifically, the English IEMOCAP, the German Emo-DB, and a Japanese corpus were used to recognize four emotions with the proposed two-pass method. The results were very promising, and the differences in UAR relative to the monolingual classifiers were not statistically significant.
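The two-pass scheme and the UAR metric described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the classifiers here are hypothetical threshold rules standing in for the DNN/CNN models trained on i-vectors, and the function names, language codes, emotion labels, and feature values are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Sequence, Tuple

# Any callable that maps a feature vector to a label can act as a classifier.
Classifier = Callable[[Sequence[float]], str]


def two_pass_predict(
    features: Sequence[float],
    language_id: Classifier,
    emotion_models: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Pass 1 identifies the language; pass 2 applies that language's emotion model."""
    language = language_id(features)
    emotion = emotion_models[language](features)
    return language, emotion


def unweighted_average_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of the per-class recalls, so each class counts equally regardless of size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical stand-ins for trained models (assumption: 2-dimensional features).
def toy_language_id(x: Sequence[float]) -> str:
    return "en" if x[0] >= 0 else "de"


toy_emotion_models: Dict[str, Classifier] = {
    "en": lambda x: "angry" if x[1] > 0.0 else "neutral",
    "de": lambda x: "angry" if x[1] > 0.5 else "neutral",
}

if __name__ == "__main__":
    samples = [(1.0, 0.9), (1.0, -0.2), (-1.0, 0.7), (-1.0, 0.2)]
    truth = ["angry", "neutral", "angry", "angry"]
    predicted = [
        two_pass_predict(s, toy_language_id, toy_emotion_models)[1] for s in samples
    ]
    print(predicted)
    print(unweighted_average_recall(truth, predicted))
```

Because the second pass looks up the emotion model by the language found in the first pass, each language can keep its own emotion inventory, which matches the four-emotion and five-emotion corpora mentioned above. A language identification error in the first pass routes the utterance to the wrong emotion model.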


Code-switching is the phenomenon whereby multilingual speakers spontaneously alternate between two or more languages during discourse, and it is widespread in multilingual societies. Current state-of-the-art automatic speech recognition (ASR) systems are optimised for monolingual speech, and their performance degrades severely when presented with multiple languages. We address ASR of speech containing switches between English and four South African Bantu languages. No comparable study on code-switched speech for these languages has been conducted before, and consequently no directly applicable benchmarks exist. A new and unique corpus containing 14.3 hours of spontaneous speech extracted from South African soap operas was used to perform our study. The varied nature of the code-switching in this data presents many challenges to ASR. We focus specifically on how the language model can be improved to better model bilingual language switches for English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho. Code-switching examples in the corpus transcriptions were extremely sparse, with the majority of code-switched bigrams occurring only once. Differences in language typology between English and the Bantu languages, and among the Bantu languages themselves, pose further challenges. We propose a new method that uses word embeddings trained on text data that is both out-of-domain and monolingual to synthesise artificial bilingual code-switched bigrams, which augment the sparse language modelling training data. This technique has the particular advantage of not requiring any additional training data that includes code-switching. We show that the proposed approach is able to synthesise valid code-switched bigrams not seen in the training set.

We also show that, by augmenting the training set with these bigrams, we achieve notable reductions in overall perplexity for all language pairs, and particularly substantial reductions (between 5 and 31%) in the perplexity calculated across a language switch boundary. We demonstrate that the proposed approach reduces the number of unseen code-switched bigram types in the test sets by up to 20.5%. Finally, we show that the augmented language models achieve reductions in the word error rate for three of the four language pairs considered. The gains were larger for language pairs with disjunctive orthography than for those with conjunctive orthography. We conclude that the augmentation of language model training data with code-switched bigrams synthesised using word embeddings trained on out-of-domain monolingual text is a viable means of improving the performance of ASR for code-switched speech.
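The abstract does not spell out how the embeddings are used to synthesise bigrams. The sketch below shows one plausible reading, offered purely as an assumption: each word of an observed code-switched bigram is swapped for its nearest neighbours in a monolingual embedding space of its own language, and the combinations not already seen are kept. The embedding vectors, the `zu_*` tokens (placeholders, not real isiZulu words), and the function names are all hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

Embeddings = Dict[str, Sequence[float]]
Bigram = Tuple[str, str]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word: str, embeddings: Embeddings, k: int) -> List[str]:
    """Return the k words most similar to `word`, excluding the word itself."""
    target = embeddings[word]
    others = [w for w in embeddings if w != word]
    others.sort(key=lambda w: cosine(target, embeddings[w]), reverse=True)
    return others[:k]


def synthesise_bigrams(
    seen: Set[Bigram], emb_first: Embeddings, emb_second: Embeddings, k: int = 1
) -> Set[Bigram]:
    """Propose unseen bigrams by substituting each side with monolingual neighbours."""
    synthetic: Set[Bigram] = set()
    for first, second in seen:
        if first not in emb_first or second not in emb_second:
            continue  # no embedding available, so no neighbours to substitute
        firsts = [first] + nearest(first, emb_first, k)
        seconds = [second] + nearest(second, emb_second, k)
        for candidate in product(firsts, seconds):
            if candidate not in seen:
                synthetic.add(candidate)
    return synthetic


# Toy monolingual embeddings (assumption: 2-dimensional, hand-written values).
emb_en: Embeddings = {"go": (1.0, 0.0), "walk": (0.9, 0.1), "eat": (0.0, 1.0)}
emb_zu: Embeddings = {
    "zu_home": (1.0, 0.0),
    "zu_house": (0.95, 0.05),
    "zu_food": (0.0, 1.0),
}

if __name__ == "__main__":
    observed = {("go", "zu_home")}
    print(sorted(synthesise_bigrams(observed, emb_en, emb_zu, k=1)))
```

In this reading, the two embedding spaces never need to be aligned with each other, since neighbours are looked up within one language at a time. That is consistent with the abstract's point that only monolingual, out-of-domain text is required, but the actual selection criteria in the study may differ.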


Articles published on Monolingual Speech

Exploration on the Influence of Bilingualism on Language Production

Mastering the ways of indicating transport when mastering Russian as a native and as a foreign language

The role of INFL in code-switching: a study of a Papiamento heritage community in the Netherlands

A writing system project for the endangered unwritten Alyutor language

Ways and Means of Preserving the Buryat Language among Children

A Portrait of Lexical Knowledge among Adult Hebrew Heritage Speakers Dominant in American English: Evidence from Naming and Narrative Tasks

The Effect of Language Contact on /tʃ/ Deaffrication in Spanish from the US–Mexico Borderland

Overcoming Aggressive Monolingualism: Prejudices and Linguistic Diversity in Russian Megalopolises

A comprehensive study on bilingual and multilingual speech emotion recognition using a two-pass classification scheme

Synthesised bigrams using word embeddings for code-switched ASR of four South African language pairs

Information-theoretic variables in Spanish-English bilingual speech

The cognitive load of interpreters in the European Parliament

The influence of language background and exposure on phonetic accommodation

Introduction to the special issue: Monolingual and bilingual speech acquisition across languages

Translating with an Injured Brain: Neurolinguistic Aspects of Translation as Revealed by Bilinguals with Cerebral Lesions

The Plurilingual Novel, or How Language Incorporates the Culture of the Other

A bisserl (‘little’) English, a bisserl Austrian, a bisserl Jewish, a bisserl female: Minority identity construction on a bilingual collaborative floor

The influence of cognitive task complexity on self-corrections

Framing Foreign Language Education in the United States: The Case of German

From the Editor
