Multilingual Speech Research Articles

In this paper, a multilingual end-to-end framework, called ATCSpeechNet, is proposed to tackle the issue of translating communication speech into human-readable text in air traffic control (ATC) systems. In the proposed framework, we focus on integrating multilingual automatic speech recognition (ASR) into one model, in which an end-to-end paradigm is developed to convert speech waveforms into text directly, without any feature engineering or lexicon. To compensate for the deficiency of handcrafted feature engineering caused by ATC challenges, including multilingual, multispeaker dialog and unstable speech rates, a speech representation learning (SRL) network is proposed to capture robust and discriminative speech representations from raw waves. A self-supervised training strategy is adopted to optimize the SRL network on unlabeled data and to further predict the speech features, i.e., wave-to-feature. An end-to-end architecture is improved to complete the ASR task, in which a grapheme-based modeling unit is applied to address the multilingual ASR issue. To address the scarcity of transcribed samples in the ATC domain, an unsupervised approach with mask prediction is applied to pretrain the backbone network of the ASR model on unlabeled data through a feature-to-feature process. Finally, by integrating the SRL with ASR, an end-to-end multilingual ASR framework is formulated in a supervised manner, which is able to translate the raw wave into text in one model, i.e., wave-to-text. Experimental results on the ATCSpeech corpus demonstrate that the proposed approach achieves high performance with a very small labeled corpus and less resource consumption, reaching a label error rate of only 4.20% on the 58-hour transcribed corpus. Compared to the baseline model, the proposed approach obtains over 100% relative performance improvement, which can be further enhanced as the size of the transcribed samples increases. It is also confirmed that the proposed SRL and training strategies make significant contributions to improving the final performance. In addition, the effectiveness of the proposed framework is validated on common corpora (AISHELL, LibriSpeech, and cv-fr). More importantly, the proposed multilingual framework not only reduces the system complexity but also obtains higher accuracy than independent monolingual ASR models. The proposed approach can also greatly reduce the cost of annotating samples, which helps advance ASR techniques toward industrial applications.
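
The label error rate quoted in the abstract above is commonly defined as the total edit distance between recognized and reference label sequences, divided by the total number of reference labels. The following is a minimal sketch of that common definition, not the authors' evaluation code. It assumes each character of a transcript counts as one label (a grapheme-style unit); the function names `edit_distance` and `label_error_rate` are hypothetical and chosen for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def label_error_rate(references, hypotheses):
    """Corpus-level rate: total edit errors / total reference labels.

    Assumption: every character of a transcript is one label.
    """
    total_errors = sum(edit_distance(list(r), list(h))
                       for r, h in zip(references, hypotheses))
    total_labels = sum(len(r) for r in references)
    return total_errors / total_labels


if __name__ == "__main__":
    refs = ["abc", "de"]
    hyps = ["abc", "d"]
    print(label_error_rate(refs, hyps))
```

Summing errors over the whole corpus before dividing weights long utterances more heavily than averaging per-utterance rates would; which convention the paper uses is not stated in the abstract.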

Aims and objectives: This paper captures social dimensions of language in highly diverse small-scale multilingual contexts that appear to pose challenges for (socio)linguistic description and documentation. I focus on the seeming contradiction of monolingual imaginations of places with heterogeneous and multilingual inhabitants, on great fluidity and variability of language use and the concomitant limits of reification-based identification of codes, and on personalised repertoires shaped by individual trajectories and relational, rather than categorical, stances.
Approach: I propose patterns and perspectives as two interrelated dimensions to guide research in configurations of this kind, illustrating epistemological and methodological points through data from multilingual settings in Casamance, Senegal.
Data and analysis: I focus on data collected in the village of Agnack Grand and its surroundings, but also include data from across the Lower Casamance and adjacent regions of Guinea-Bissau, discussing patterns of multilingual organisation and extracts from conversation and how their speech forms are categorised.
Findings: The paper brings sociohistorical dimensions of small-scale multilingualism to the fore and identifies their lasting influences on spatial representations of language regimes. Linguistic spaces influence perspectives on speech events taking place in them and circumscribe speech participants’ and observers’ choices in describing repertoires, producing and analysing speech forms. Beyond the selection of language modes, perspective also determines how speech forms are categorised. I demonstrate that the patterns speakers and observers have experienced and the perspectives they assume are decisive in shaping their perception.
Originality: My central observation is that there is no objective, neutral viewpoint on (multilingual) speech, but that positionality frames it at all levels. I develop new epistemologies for studying these dimensions.
Significance: Putting the categorisation processes employed by speakers and observers and their underlying motivations centre stage and integrating sociolinguistic and anthropological linguistic methods and historical knowledge into linguistic description and documentation constitutes an innovative research programme.

Articles published on Multilingual Speech

Spoken Language Identification Using Prosody, Phonotactics, and Acoustics: A Review

Multilingual hope speech detection in English and Dravidian languages

Multilingual Speech to Text using Deep Learning based on MFCC Features

Transcribing multilingual children’s and adults’ speech

A Multi-Lingual Speech Recognition-Based Framework to Human-Drone Interaction

A hybrid CTC+Attention model based on end-to-end framework for multilingual speech recognition

Multilingual speech recognition for GlobalPhone languages

An optimized machine translation technique for multi-lingual speech to sign language notation

SuperSpeech: Multilingual Speech and Language Maintenance Intervention for Vietnamese-Australian Children and Families via Telepractice.

Improving low-resource Tibetan end-to-end ASR by multilingual and multilevel unit modeling

Optimizing Data Usage for Low-Resource Speech Recognition

Is Attention Always Needed? A Case Study on Language Identification from Speech

An automatic machine translation system for multi-lingual speech to Indian sign language

Creating a Corpus of Multilingual Parent-Child Speech Remotely: Lessons Learned in a Large-Scale Onscreen Picturebook Sharing Task.

Investigating into the varieties of language spoken at Benue State University, Makurdi

Multilingual speech annotation of landmarks and other acoustic cues to distinctive features

ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems

Patterns and perspectives shape perceptions: Epistemological and methodological reflections on the study of small-scale multilingualism

Towards a coherent methodology for the documentation of small-scale multilingualism: Dealing with speech data

Code-switched automatic speech recognition in five South African languages

