Abstract

Computational approaches to historical linguistics have been proposed for half a century. Within the last decade, this line of research has received a major boost, owing both to the transfer of ideas and software from computational biology and to the release of several large electronic data resources suitable for systematic comparative work. In this article, some of the central research topics of this new wave of computational historical linguistics are introduced and discussed. These are automatic assessment of genetic relatedness, automatic cognate detection, phylogenetic inference, and ancestral state reconstruction. They are demonstrated by means of a case study of automatically reconstructing a Proto-Romance word list from lexical data of 50 modern Romance languages and dialects. The results illustrate both the strengths and the weaknesses of the current state of the art of automating the comparative method.

Highlights

  • Historical linguistics is the oldest sub-discipline of linguistics, and it constitutes an amazing success story

  • The success of historical linguistics is owed to a large degree to a collection of very stringent methodological principles that go by the name of the comparative method (Meillet 1954; Weiss 2015)

  • As a final step toward the reconstruction of Proto-Romance forms, ancestral state reconstruction is performed for the sound classes in each column of each multiple sequence alignment (MSA) obtained in the previous step
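
The column-wise idea in the last highlight can be illustrated with a minimal sketch. It is not the article's method: it swaps in Fitch parsimony, one of the simplest ancestral state reconstruction techniques, and only its bottom-up pass. The tree, the language labels (`L1` to `L4`), the sound-class symbols, and the function names are all hypothetical and chosen for illustration only.

```python
# Hypothetical toy phylogeny as nested tuples; leaves are language labels.
TREE = (("L1", "L2"), ("L3", "L4"))

# Hypothetical multiple sequence alignment: one row of sound-class
# symbols per language, with "-" marking a gap. Not real data.
ALIGNMENT = {
    "L1": ["K", "A", "N", "-"],
    "L2": ["K", "A", "N", "E"],
    "L3": ["K", "A", "-", "-"],
    "L4": ["S", "A", "N", "-"],
}


def fitch_sets(node, column):
    """Bottom-up Fitch pass: return the candidate state set for a node."""
    if isinstance(node, str):
        # A leaf carries exactly the state observed in this column.
        return {column[node]}
    left, right = (fitch_sets(child, column) for child in node)
    common = left & right
    # Keep the intersection if the children agree on something,
    # otherwise fall back to the union of their candidates.
    return common if common else left | right


def reconstruct(tree, alignment):
    """Return the sorted candidate root states for every alignment column."""
    n_columns = len(next(iter(alignment.values())))
    result = []
    for i in range(n_columns):
        column = {lang: row[i] for lang, row in alignment.items()}
        result.append(sorted(fitch_sets(tree, column)))
    return result


if __name__ == "__main__":
    for i, states in enumerate(reconstruct(TREE, ALIGNMENT)):
        print(i, states)
```

A column whose root set contains only the gap symbol would be dropped from the reconstructed form; a column with several candidates is left ambiguous in this sketch, whereas a probabilistic method would rank them.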

Summary

Introduction

Historical linguistics is the oldest sub-discipline of linguistics, and it constitutes an amazing success story. Its success is owed to a large degree to a collection of very stringent methodological principles that go by the name of the comparative method (Meillet 1954; Weiss 2015), which can be summarized as a workflow (Ross and Durie 1996: 6–7). While earlier proposals mostly constituted isolated efforts of historical and computational linguists, the emerging field of computational historical linguistics has received a major impetus since the early 2000s from the work of computational biologists such as Alexandre Bouchard-Côté, Russell Gray, Robert McMahon, Mark Pagel, and Tandy Warnow and their co-workers, who applied methods from their field to the problem of reconstructing language history, often in collaboration with linguists. The focus of this article is on computational work inspired by the comparative method, so this line of work will not be covered further here.

A program for computational historical linguistics
A case study: reconstructing Proto-Romance
Demonstration of genetic relationship
Pairwise string comparison
Cognate clustering
General remarks
Application to the case study
Ancestral state reconstruction
Multiple sequence alignment
Proto-form reconstruction
Evaluation
Conclusion