Abstract

This study investigates the feasibility of applying complex networks to fine-grained language classification and of employing word co-occurrence networks based on parallel texts as a substitute for syntactic dependency networks in complex-network-based language classification. Fourteen word co-occurrence networks were constructed from parallel texts, one for each of 12 Slavic languages and 2 non-Slavic languages. With appropriate combinations of the major parameters of these networks, cluster analysis was able to distinguish the Slavic languages from the non-Slavic ones and to group the Slavic languages correctly into their respective sub-branches. Moreover, the clustering also captured the genetic relationships of some of these Slavic languages within their sub-branches. The results show that word co-occurrence networks based on parallel texts are applicable to fine-grained language classification and that they constitute a more convenient substitute for syntactic dependency networks in complex-network-based language classification.

Highlights

  • This study investigates the feasibility of applying complex networks to fine-grained language classification and of employing word co-occurrence networks based on parallel texts as a substitute for syntactic dependency networks in complex-network-based language classification

  • We constructed 14 word co-occurrence networks from parallel texts, one for each of 12 Slavic languages and 2 non-Slavic languages, and conducted cluster analysis on these networks according to different combinations of their major complex network parameters

  • As the 14 languages are mostly Slavic languages, we focused on how well the results of clustering captured the genetic relationships of the 12 Slavic languages


Summary

Introduction

This study investigates the feasibility of applying complex networks to fine-grained language classification and of employing word co-occurrence networks based on parallel texts as a substitute for syntactic dependency networks in complex-network-based language classification. Fourteen word co-occurrence networks were constructed from parallel texts, one for each of 12 Slavic languages and 2 non-Slavic languages. The results show that word co-occurrence networks based on parallel texts are applicable to fine-grained language classification and that they constitute a more convenient substitute for syntactic dependency networks in complex-network-based language classification. Studies [7–9] have shown that languages can be classified through cluster analysis of their syntactic dependency networks (with different word forms as vertices and the syntactic dependency relations between them as edges) according to their major complex network parameters. The results of classification can generally capture the genetic relationships of the languages as found in the language families. This complex-network-based language classification falls under the heading of typological classification, which focuses on structural features of languages [10]. The feasibility of complex-network-based language classification [7–9] indicates that the major parameters of complex networks can capture the diversity of networks in the real world, in addition to re-[…]. The use of complex networks in language classification expands the application of complex networks and broadens the horizon of complex network research.
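
As a rough illustration of the kind of pipeline described above, the following minimal Python sketch builds a word co-occurrence network from plain text and derives a small parameter vector from it. Several choices here are assumptions made for illustration and are not taken from the study: linking only adjacent word forms within a sentence, whitespace tokenization, and using average degree and average clustering coefficient as the parameters. All function names are hypothetical.

```python
from itertools import combinations


def build_cooccurrence_network(sentences):
    """Return an undirected network as {word_form: set_of_neighbours}.

    Assumption: two word forms are linked if they occur next to each
    other in a sentence; tokens are split on whitespace and lowercased.
    """
    network = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for token in tokens:
            network.setdefault(token, set())
        for left, right in zip(tokens, tokens[1:]):
            if left != right:  # ignore self-loops such as "very very"
                network[left].add(right)
                network[right].add(left)
    return network


def average_degree(network):
    """Mean number of neighbours per vertex."""
    if not network:
        return 0.0
    return sum(len(nbrs) for nbrs in network.values()) / len(network)


def average_clustering(network):
    """Mean local clustering coefficient over all vertices.

    Vertices with fewer than two neighbours contribute 0.
    """
    if not network:
        return 0.0
    total = 0.0
    for nbrs in network.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count the links that exist among this vertex's neighbours.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in network[a])
        total += 2 * links / (k * (k - 1))
    return total / len(network)


def parameter_vector(sentences):
    """Illustrative parameter vector: (average degree, average clustering)."""
    network = build_cooccurrence_network(sentences)
    return (average_degree(network), average_clustering(network))


# Toy input, not data from the study.
toy_text = ["the cat sat on the mat", "the dog sat on the cat"]
toy_vector = parameter_vector(toy_text)
```

Computing one such vector per language from the same parallel text would yield the rows of a matrix that a standard hierarchical cluster analysis could then group. No syntactic annotation is needed at any step, which is consistent with the convenience argument made for co-occurrence networks.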

