Abstract
The transcription bottleneck is often cited as a major obstacle for efforts to document the world’s endangered languages and supply them with language technologies. One solution is to extend methods from automatic speech recognition and machine translation, and recruit linguists to provide narrow phonetic transcriptions and sentence-aligned translations. However, I believe that these approaches are not a good fit with the available data and skills, or with long-established practices that are essentially word-based. In seeking a more effective approach, I consider a century of transcription practice and a wide range of computational approaches, before proposing a computational model based on spoken term detection that I call “sparse transcription.” This represents a shift away from current assumptions that we transcribe phones, transcribe fully, and transcribe first. Instead, sparse transcription combines the older practice of word-level transcription with interpretive, iterative, and interactive processes that are amenable to wider participation and that open the way to new methods for processing oral languages.
Highlights
Most of the world’s languages exist only in spoken form.
Behind the formats is the process for creating them: no matter how carefully I transcribe, from the very first text to the very last, for every language I have ever studied in the field, I have had to re-transcribe my earliest texts in the light of analyses that only emerged by the time I reached my later texts.
We review existing computational approaches to transcription that go beyond methods inspired by automatic speech recognition, and consider to what extent they already address the requirements arising from the practices of linguists.
Summary
Most of the world’s languages exist only in spoken form. These oral vernaculars include endangered languages and regional varieties of major languages. Even if linguists heed the call to adopt automatic speech recognition, they must still correct the recognizer’s output while re-listening to the source audio, and they must still identify words and produce a word-level transcription. Meanwhile, there are locally available skills, such as the ability of speakers to recognize words in context, repeat them in isolation, and say something about what they mean. This leads us to consider a new model for large-scale transcription that consists of identifying and cataloging words in an open-ended speech collection. I elaborate this “Sparse Transcription Model” and argue that it is a good fit for the task of transcribing oral languages, in terms of the available inputs, the desired outputs, and the available human capacity. I conclude with a summary of the contributions, highlighting benefits for flexibility, for scalability, and for working effectively alongside speakers of oral languages (Section 5).
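The idea of identifying and cataloging words in a speech collection, rather than transcribing every phone, can be illustrated with a toy spoken term detection sketch. This is not the paper’s implementation: the lexicon, the one-dimensional “feature” sequences standing in for acoustic frames, and the threshold value are all invented for illustration. A dynamic time warping matcher slides each known word template over an utterance and records where it matches, yielding a sparse transcription of (word, start, end) hits while the unrecognized stretches stay blank.

```python
# Toy sketch of spoken term detection for sparse transcription.
# Assumption: utterances are sequences of 1-D feature values standing in
# for real acoustic frames; real systems would use MFCCs or similar.

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def detect_terms(utterance, lexicon, threshold=1.0):
    """Scan an utterance for known word templates; return a sparse
    transcription as (word, start_frame, end_frame) hits, in time order.
    Frames matching no template are simply left untranscribed."""
    hits = []
    for word, template in lexicon.items():
        w = len(template)
        for start in range(len(utterance) - w + 1):
            window = utterance[start:start + w]
            if dtw_distance(window, template) <= threshold:
                hits.append((word, start, start + w))
    return sorted(hits, key=lambda h: h[1])

# Hypothetical lexicon of word templates, e.g. drawn from repeated tokens
# that a speaker has identified and glossed.
lexicon = {"kaya": [1.0, 2.0, 3.0], "ngapa": [5.0, 5.0, 1.0]}
utterance = [9.0, 1.0, 2.0, 3.0, 9.0, 5.0, 5.0, 1.0, 9.0]
print(detect_terms(utterance, lexicon, threshold=0.5))
# prints [('kaya', 1, 4), ('ngapa', 5, 8)]
```

The catalog grows iteratively: each newly confirmed word becomes another template, so later passes over the same audio recover more of the text, matching the interpretive, iterative workflow the model describes.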