Abstract

Starting from a distance measure that highlights similarities and differences among populations, dialectal classification makes it possible to establish the borders between varieties and to identify transition zones (border populations). The high cost of conducting and processing surveys greatly limits the size of the samples, the number of localities surveyed, and how often fieldwork can be repeated to track dialect variation over time. Although other methods of gathering information have recently been developed, for researchers who prefer face-to-face methods we introduce a procedure for selecting the subset of the most informative linguistic items. To maximize the similarity between the classification obtained with the selected subset and the one obtained with the complete set of linguistic items, we define a measure of similarity that highlights redundancy between items (Simple Matching Coefficient), group items by similarity (K-means method), and finally choose the most representative linguistic items in the representation obtained from Ward’s method, in proportion to the size of each subgroup. This exploratory study uses the Bourciez Corpus, focusing on Basque language data, to illustrate the methodology.
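
The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each linguistic item is coded as a binary vector over localities, that group labels have already been produced by a clustering step (here they are simply given), and that "most representative" means the item with the highest mean Simple Matching Coefficient to the other items in its group. All names (`smc`, `proportional_quota`, `select_items`) and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def smc(a: Sequence[int], b: Sequence[int]) -> float:
    """Simple Matching Coefficient: share of positions where a and b agree."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def proportional_quota(sizes: Dict[int, int], budget: int) -> Dict[int, int]:
    """Split `budget` across groups in proportion to group size.

    Uses the largest-remainder rule (an assumption of this sketch).
    """
    total = sum(sizes.values())
    exact = {g: budget * n / total for g, n in sizes.items()}
    quota = {g: int(v) for g, v in exact.items()}
    leftover = budget - sum(quota.values())
    by_remainder = sorted(sizes, key=lambda g: exact[g] - quota[g], reverse=True)
    for g in by_remainder[:leftover]:
        quota[g] += 1
    return quota


def select_items(
    items: Dict[str, Sequence[int]], groups: Dict[str, int], budget: int
) -> List[str]:
    """Pick `budget` items, taking the most central items of each group."""
    members: Dict[int, List[str]] = {}
    for name, g in groups.items():
        members.setdefault(g, []).append(name)
    quota = proportional_quota({g: len(m) for g, m in members.items()}, budget)

    chosen: List[str] = []
    for g, names in members.items():
        def centrality(n: str) -> float:
            # Mean SMC to the other members of the same group.
            others = [m for m in names if m != n]
            if not others:
                return 1.0
            return sum(smc(items[n], items[m]) for m in others) / len(others)

        ranked = sorted(names, key=centrality, reverse=True)
        chosen.extend(ranked[: quota[g]])
    return chosen


# Hypothetical toy data: 6 items coded over 6 localities (1 = variant present).
toy_items = {
    "item_a": [1, 1, 1, 0, 0, 0],
    "item_b": [1, 1, 1, 0, 0, 1],
    "item_c": [1, 1, 0, 0, 0, 0],
    "item_d": [0, 0, 0, 1, 1, 1],
    "item_e": [0, 0, 1, 1, 1, 1],
    "item_f": [0, 0, 0, 1, 1, 0],
}
# Group labels assumed to come from a prior clustering step.
toy_groups = {"item_a": 0, "item_b": 0, "item_c": 0,
              "item_d": 1, "item_e": 1, "item_f": 1}

selected = select_items(toy_items, toy_groups, budget=2)
print(selected)
```

Items with a high SMC answer almost identically across localities, so keeping one central item per group removes redundancy, while the proportional quota keeps large groups from being under-represented in the reduced questionnaire.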
