Word-length Entropies and Correlations of Natural Language Written Texts

Maria Kalimeri, Vassilios Constantoudis, Constantinos Papadimitriou, Konstantinos Karamanos, Fotis K. Diakonos, Harris Papageorgiou

doi:10.1080/09296174.2014.1001636

Open Access

Journal: Journal of Quantitative Linguistics
Publication Date: Mar 19, 2015
Citations: 21

Affiliation: National Technical University of Athens, National Centre of Scientific Research "Demokritos", National and Kapodistrian University of Athens, Institute for Language and Speech Processing

#Word Lengths #Finnish Languages

Abstract

We study the frequency distributions and correlations of word lengths in 10 European languages. Our findings indicate that (a) the word-length distribution of short words, quantified by its mean value and its entropy, distinguishes the Uralic (Finnish) corpus from the others; (b) the tails at long words, manifested in the high-order moments of the distributions, differentiate the Germanic languages (except English) from the Romance languages and Greek; and (c) the correlations between nearby word lengths, measured by comparing the entropies of the real texts with those of shuffled texts, are smaller for the Germanic languages and Finnish.
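The quantities named in the abstract (mean and entropy of the word-length distribution, its high-order moments, and the comparison of real against shuffled entropies) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: a "word" is a maximal run of Unicode letters, entropies are Shannon entropies in bits, the "high-order moments" are the standardized third and fourth moments, and correlations are probed through block (n-gram) entropies of the word-length sequence. The function names, including the `correlation_gap` indicator, are hypothetical, and the authors' exact definitions and normalizations may differ.

```python
import math
import random
import re
from collections import Counter


def word_lengths(text):
    """Map a text to its sequence of word lengths (in characters).

    Assumption: a 'word' is a maximal run of Unicode letters, so digits,
    punctuation and hyphens act as separators.
    """
    return [len(w) for w in re.findall(r"[^\W\d_]+", text)]


def shannon_entropy(counts):
    """Shannon entropy in bits of the empirical distribution given by `counts`."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def length_stats(lengths):
    """Mean, entropy and standardized 3rd/4th moments of the word-length distribution."""
    n = len(lengths)
    mean = sum(lengths) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in lengths) / n)
    if sd == 0:
        skew = kurt = float("nan")  # moments undefined when all lengths are equal
    else:
        skew = sum((x - mean) ** 3 for x in lengths) / (n * sd ** 3)
        kurt = sum((x - mean) ** 4 for x in lengths) / (n * sd ** 4)  # non-excess kurtosis
    return {"mean": mean, "entropy": shannon_entropy(Counter(lengths)),
            "skewness": skew, "kurtosis": kurt}


def block_entropy(lengths, n):
    """Entropy (bits) of overlapping n-blocks of consecutive word lengths."""
    blocks = Counter(tuple(lengths[i:i + n]) for i in range(len(lengths) - n + 1))
    return shannon_entropy(blocks)


def shuffled_block_entropy(lengths, n, trials=20, seed=0):
    """Mean n-block entropy over `trials` random shuffles of the sequence.

    Shuffling keeps the word-length distribution but destroys the order,
    and hence any correlation between nearby word lengths.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        shuffled = list(lengths)
        rng.shuffle(shuffled)
        total += block_entropy(shuffled, n)
    return total / trials


def correlation_gap(lengths, n, trials=20, seed=0):
    """Hypothetical indicator: shuffled minus real n-block entropy.

    For n = 1 the gap is zero up to rounding, since shuffling does not change
    the single-length distribution. For n >= 2 a positive gap suggests the
    real sequence is more predictable than its shuffled versions.
    """
    return shuffled_block_entropy(lengths, n, trials, seed) - block_entropy(lengths, n)


if __name__ == "__main__":
    sample = ("We study the frequency distributions and correlations "
              "of the word lengths of 10 European languages.")
    seq = word_lengths(sample)
    print(length_stats(seq))
    for order in (1, 2, 3):
        print(order, correlation_gap(seq, order))
```

On a single sentence the block entropies are dominated by finite-sample effects; meaningful comparisons across languages would require long corpora, since the number of distinct n-blocks grows quickly with n.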
