Abstract

An empirical study of about 1.7 million dictionary words from seven languages (English, French, Dutch, Spanish, Italian, Hindi, and German) was conducted, and three characteristic features were analyzed. First, the letter usage pattern of each language was determined, which indicates how the letters of its alphabet are employed. For instance, the letter ‘e’ is heavily used in English, while ‘q’ is the least used. Second, the average and range of word lengths in the languages were computed. Word lengths were seen to vary from 1 to 37, and average word lengths lie in the range 6.665 to 11.14. For comparison, the word lengths were fitted with a Gaussian distribution. Third, a new measure, termed ‘Language Sparsity’, was derived. It is computed as one minus the ratio of the number of existing words of a particular length to the total number of possible words of that length that can be formed. Sparsity hence gives a measure of the scope for growth in a language. Two such measures were defined: a weighted and a nonweighted sparsity. Nonweighted sparsity was found to be minimum (0.877) for English and maximum (0.982) for Dutch. The results obtained can play a significant role in understanding language evolution.
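The three quantities described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy word list, and the choice of counting possible words of length n as alphabet_size ** n are all assumptions made here for illustration, since the abstract does not state how the total number of possible words is counted or how the weighted and nonweighted sparsity values are aggregated across lengths.

```python
from collections import Counter


def letter_frequencies(words):
    """Relative frequency of each letter across all words."""
    counts = Counter(ch for w in words for ch in w)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def mean_word_length(words):
    """Average number of letters per word."""
    return sum(len(w) for w in words) / len(words)


def sparsity_by_length(words, alphabet_size):
    """Per-length sparsity: 1 - (existing words of length n) / (possible words).

    Assumption: the number of possible words of length n is
    alphabet_size ** n, i.e. every letter sequence counts as possible.
    """
    unique = set(words)  # ignore duplicate dictionary entries
    per_length = Counter(len(w) for w in unique)
    return {n: 1 - c / alphabet_size ** n for n, c in sorted(per_length.items())}


# Hypothetical toy dictionary, not data from the study.
toy_words = ["a", "i", "an", "at", "in", "it", "cat", "tin"]

freqs = letter_frequencies(toy_words)
avg = mean_word_length(toy_words)
sparsity = sparsity_by_length(toy_words, alphabet_size=26)
```

Under this counting assumption, sparsity approaches 1 very quickly as word length grows, because the number of possible letter sequences grows exponentially while the number of dictionary words does not.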
