Corpus Phonetics

Mark Y. Liberman

doi:10.1146/annurev-linguistics-011516-033830

Abstract

Semiautomatic analysis of digital speech collections is transforming the science of phonetics. Convenient search and analysis of large published bodies of recordings, transcripts, metadata, and annotations—up to three or four orders of magnitude larger than a few decades ago—have created a trend towards “corpus phonetics,” whose benefits include greatly increased researcher productivity, better coverage of variation in speech patterns, and crucial support for reproducibility. The results of this work include insights into theoretical questions at all levels of linguistic analysis, along with applications in fields as diverse as psychology, medicine, and poetics, as well as within phonetics itself. Remaining challenges include still-limited access to the necessary skills and a lack of consistent standards. These changes coincide with the broader Open Data movement, but future solutions will also need to include more constrained forms of publication motivated by valid concerns for privacy, confidentiality, and intellectual property.

Full Text