Abstract

Previous work using lexical data from around the world has suggested that distances between language varieties are distributed such that varieties are typically either rather similar, qualifying as dialects of the same language, or rather dissimilar, qualifying as different languages, with a scarcity of varieties that are around halfway similar. Using a potentially biased sample, Wichmann (2019) observed that there is a bimodal distribution of distances with two roughly normal distributions separated by a valley. Here we test whether a similar distribution is found when using another source of data and an unbiased sample drawn from the cells of a geographical grid (of central Europe). The data consists of 18 lexemes from 274 doculects. Using Bayesian beta regression and leave-one-out cross-validation, we show that the data follows a bimodal distribution which is robust to sampling, and also to at least some aspects of the data (coarse- vs. fine-grained phonetic transcriptions).

