Measuring Language Distance of Isolated European Languages

Pablo Gamallo, José Ramom Pichel, Iñaki Alegria

doi:10.3390/info11040181

Abstract

Phylogenetics is a sub-field of historical linguistics whose aim is to classify a group of languages by considering their distances within a rooted tree that stands for their historical evolution. A few European languages do not belong to the Indo-European family or are otherwise isolated in the European rooted tree. Although it is not possible to establish phylogenetic links using basic strategies, it is possible to calculate the distances between these isolated languages and the rest using simple corpus-based techniques and natural language processing methods. The objective of this article is to select some isolated languages and measure the distances among them and between them and the other European languages, so as to shed light on the linguistic distances and proximities of these controversial languages without considering phylogenetic issues. The experiments were carried out with 40 European languages, including six languages that are isolated in their corresponding families: Albanian, Armenian, Basque, Georgian, Greek, and Hungarian.

Highlights

The aim of computational linguistic phylogenetics is to estimate evolutionary histories of languages, which are usually represented in the form of a tree where the root stands for the common ancestor of its daughter languages, which are the leaves [1].
The six isolated languages follow a different pattern of behavior than that shown in the clustering process.
At least two patterns emerged in the previous clustering experiment: Albanian and Greek are close to Baltic languages, and Basque and Georgian are again very close to each other.

Summary

Introduction

The aim of computational linguistic phylogenetics is to estimate evolutionary histories of languages, which are usually represented in the form of a tree where the root stands for the common ancestor of its daughter languages, which are the leaves [1]. The lexicostatistic method, developed by Morris Swadesh in the 1950s [2], requires defining a standard list of concepts, determining whether the corresponding words have a similar form (that is, whether they are cognates or not), computing the ratio of cognates shared by each pair of languages, which gives rise to a similarity matrix, and generating a graphical representation (usually a tree) on the basis of this matrix [3]. Such a strategy had a strong impact on phylogenetics and historical linguistics.
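
The lexicostatistic steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' method: the language names (lang_a to lang_d), the concept list, and the cognate judgments are invented, and average-linkage clustering is just one simple way to turn a similarity matrix into a tree.

```python
from itertools import combinations

# Hypothetical cognate judgments (all values invented for illustration).
# For each concept, languages sharing the same label are treated as
# having cognate words for that concept.
COGNATE_CLASSES = {
    "water": {"lang_a": 1, "lang_b": 1, "lang_c": 2, "lang_d": 3},
    "hand":  {"lang_a": 1, "lang_b": 1, "lang_c": 1, "lang_d": 2},
    "two":   {"lang_a": 1, "lang_b": 1, "lang_c": 2, "lang_d": 2},
    "sun":   {"lang_a": 1, "lang_b": 2, "lang_c": 3, "lang_d": 3},
}


def cognate_similarity(classes):
    """Return the language list and, for each language pair, the ratio
    of concepts whose words were judged cognate."""
    languages = sorted(next(iter(classes.values())))
    sims = {}
    for x, y in combinations(languages, 2):
        shared = sum(1 for judgment in classes.values()
                     if judgment[x] == judgment[y])
        sims[frozenset((x, y))] = shared / len(classes)
    return languages, sims


def average_linkage_tree(languages, sims):
    """Greedy agglomerative clustering: repeatedly merge the two clusters
    with the highest average pairwise similarity. Returns nested tuples."""
    # Each cluster is a pair: (tree built so far, list of member languages).
    clusters = [(lang, [lang]) for lang in languages]
    while len(clusters) > 1:
        def avg_sim(pair):
            members_1 = clusters[pair[0]][1]
            members_2 = clusters[pair[1]][1]
            total = sum(sims[frozenset((a, b))]
                        for a in members_1 for b in members_2)
            return total / (len(members_1) * len(members_2))

        i, j = max(combinations(range(len(clusters)), 2), key=avg_sim)
        (tree_1, members_1), (tree_2, members_2) = clusters[i], clusters[j]
        merged = ((tree_1, tree_2), members_1 + members_2)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


languages, sims = cognate_similarity(COGNATE_CLASSES)
tree = average_linkage_tree(languages, sims)
print(tree)
```

In this toy setup a similarity of 1.0 would mean that every concept in the list has cognate words in both languages, and 0.0 that none does. A real study would need expert cognate judgments and a much longer concept list.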


Journal: Information
Publication Date: Mar 27, 2020
Citations: 7
License type: CC BY 4.0
