Abstract
Language documentation faces a persistent and pervasive problem: How much material is enough to represent a language fully? How much text would we need to sample the full phoneme inventory of a language? In the phonetic/phonemic domain, what proportion of the phoneme inventory can we expect to sample in a text of a given length? Answering these questions in a quantifiable way is tricky, but asking them is necessary. The cumulative collection of Illustrative Texts published in the Illustration series in this journal over more than four decades (mostly renditions of the ‘North Wind and the Sun’) gives us an ideal dataset for pursuing these questions. Here we investigate a tractable subset of the above questions, namely: What proportion of a language’s phoneme inventory do these texts enable us to recover, in the minimal sense of having at least one allophone of each phoneme? We find that, even with this low bar, only three languages (Modern Greek, Shipibo and the Treger dialect of Breton) attest all phonemes in these texts. Unsurprisingly, these languages sit at the low end of phoneme inventory sizes (respectively 23, 24 and 36 phonemes). We then estimate the rate at which phonemes are sampled in the Illustrative Texts and extrapolate to see how much text it might take to display a language’s full inventory. Finally, we discuss the implications of these findings for linguistics in its quest to represent the world’s phonetic diversity, and for JIPA in its design requirements for Illustrations, in particular whether supplementary panphonic texts should be included.
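The abstract's central measure, the proportion of a phoneme inventory recovered from a text, can be made concrete with a short sketch. The inventory and transcription below are invented toy data, not drawn from the study; the function simply checks which inventory members occur at least once in a phonemically transcribed text.

```python
def inventory_coverage(transcription, inventory):
    """Return the proportion of the phoneme inventory attested at least
    once in the transcription (the 'low bar' of the abstract)."""
    attested = set(transcription) & set(inventory)
    return len(attested) / len(inventory)

# Toy five-phoneme inventory and a short "text" as a list of phoneme tokens.
inventory = ["p", "t", "k", "a", "i"]
text = ["p", "a", "t", "a", "p", "i"]

print(inventory_coverage(text, inventory))  # /k/ is unattested -> 0.8
```

In practice the transcription would be segmented into phoneme tokens (with allophones mapped back to their phonemes) before this comparison, a step the toy data sidesteps.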
Highlights
How much is enough? How much data do we need to have an adequate record of a language? These are key questions for language documentation.
Where does ‘comprehensive’ stop? The syntax? A dictionary of a specified minimum size, or should we aim to describe the ‘total’ lexicon? Steven Bird posted the following enquiry to the Resource Network for Linguistic Diversity mailing list (21/11/2015): Have any endangered language documentation projects succeeded according to this [Himmelmann’s] definition? For those that have not succeeded, would anyone want to claim that, for some particular language, we are half of the way there? Or some other fraction? What still needs to be done? Or, if a comprehensive record is unattainable in principle, is there consensus on what an adequate record looks like?
Remaining just in the phonology, other levels of coverage – attesting all minimal pairs needed to establish the body of phonological contrasts, or all clusters needed to fully specify the phonotactics, or all word structures needed to understand the metrical phonology, tone sandhi etc. – will all need larger corpora, because each needs to combine all phonemes in some more complex way.
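The extrapolation question raised above (how much text it might take to display a full inventory) can be illustrated with a coupon-collector calculation. This is our illustrative sketch, not the estimator used in the study: it assumes, unrealistically, that all phonemes are equally frequent and independent, under which the expected number of phoneme tokens needed to attest all n phonemes is n times the nth harmonic number. Since real phoneme frequencies are highly skewed, actual texts need considerably more.

```python
def expected_tokens(n):
    """Expected tokens to attest all n equally likely phonemes: n * H_n,
    where H_n is the nth harmonic number (coupon-collector problem)."""
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return n * harmonic

# Inventory sizes of the three fully attested languages: Modern Greek,
# Shipibo and Treger Breton.
for n in [23, 24, 36]:
    print(n, round(expected_tokens(n)))
```

Even under this optimistic uniform-frequency assumption, a 36-phoneme inventory needs on the order of 150 tokens to be fully sampled; skewed real-world frequencies push the requirement far higher, which is consistent with most Illustrative Texts falling short.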
Summary
How much data do we need to have an adequate record of a language? These are key questions for language documentation. Though we concur in principle with Himmelmann’s (1998: 166) answer, that ‘the aim of a language documentation is to provide a comprehensive record of the linguistic practices characteristic of a given speech community’, this formulation is extremely broad and difficult to quantify. Steven Bird posted the following enquiry to the Resource Network for Linguistic Diversity mailing list (21/11/2015): Have any endangered language documentation projects succeeded according to this [Himmelmann’s] definition? For those that have not (yet) succeeded, would anyone want to claim that, for some particular language, we are half of the way there? If a comprehensive record is unattainable in principle, is there consensus on what an adequate record looks like?