Abstract

Automatic language classification is an important contribution to linguistic research. We apply four statistical features related to long-range correlations to classify the syntactic properties of languages. We calculate Zipf’s exponent, Heaps’ exponent, fractal dimension, and entropy for Bible translations into one hundred living languages from twenty-eight language families. The Bible conveys the same content regardless of its language, but differences in the grammatical rules of the languages lead to differences in the measures extracted from its various translations. The results show that geographical distance and cultural differences can lead to statistical discrepancies. All features extracted from the Bible translations are normally distributed around their average values. This fact divides the languages into two groups: a majority of normal languages and a minority of abnormal ones. There is also an evident correlation or anticorrelation between each pair of the metrics, owing to their respective mechanisms. The standard deviation of the considered statistical features over language families is affected by the geographical distance between the communities that speak these languages and by their cultural diversity.
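
As an illustration of three of the four measures named above, the following is a minimal sketch, not the authors' procedure. It assumes a text already split into word tokens, estimates Zipf's and Heaps' exponents by an ordinary least-squares fit in log-log space, and computes the Shannon entropy of the unigram distribution. The function names are hypothetical, the fitting choices are assumptions, and fractal dimension is omitted because the abstract does not say how it is defined.

```python
import math
from collections import Counter


def _slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (needs at least 2 distinct xs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def zipf_exponent(tokens):
    """Assumed estimator: fit log(frequency) = c - alpha * log(rank), return alpha."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    log_ranks = [math.log(r) for r in range(1, len(freqs) + 1)]
    log_freqs = [math.log(f) for f in freqs]
    return -_slope(log_ranks, log_freqs)


def heaps_exponent(tokens):
    """Assumed estimator: fit log(vocabulary size) = c + beta * log(text length)."""
    seen = set()
    log_n, log_v = [], []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        log_n.append(math.log(n))
        log_v.append(math.log(len(seen)))
    return _slope(log_n, log_v)


def shannon_entropy(tokens):
    """Shannon entropy (bits per token) of the unigram distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Applied to each translation's token list, these functions would give one value per measure per language, which could then be compared across languages and language families. A plain log-log fit is only one of several possible estimators and is sensitive to the low-frequency tail.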
