A Global Lexical Database (GLED) for Computational Historical Linguistics

Tiago Tresoldi

doi:10.5334/johd.96

Open Access

https://doi.org/10.5334/johd.96

Journal: Journal of Open Humanities Data
Publication Date: Feb 2, 2023
License type: CC BY

Affiliation: Uppsala University

#Markov Chain Monte Carlo #Markov Chain Monte Carlo Inference

Abstract

This work presents a lexical database with cognate annotation and phonological alignment for over 6,500 documented language varieties. The database includes per-family and global phylogenetic resources and offers a pre-computed global tree for language variety distance, derived from normalized trees obtained with Bayesian Markov Chain Monte Carlo (MCMC) inference. Lexical data is provided in a single tabular file for ease of use, and the resources are built following best practices and state-of-the-art algorithms for historical linguistics. The database is a convenient source for research prototypes, method development, and bootstrapping analyses. All resources are freely available for download.

Full Text