THE PROBLEM OF IDENTIFYING AN ORDINAL NUMBER WITH A RANKING IN POLISH FREQUENCY DICTIONARIES

The frequency dictionaries listed in the title contain three ranking lists, based on the F index (absolute frequency), the U index (relative frequency) and the D index (a measure of uniformity and distribution across styles). There are serious objections to the method they use: when two entries have equal F, U or D values, the entry's rank (i.e. its number in the list) is decided by alphabetical order. High-frequency words usually each receive a separate rank, because their frequencies differ. As frequency decreases, however, the same frequency is shared by a few, a dozen, a few hundred or even thousands of words (for example, frequency 2 or 1). Words with the same frequency should have the same rank. In Polish frequency dictionaries, however, their rank varies with alphabetical order; in effect, a word's ordinal number in the list is identified with its rank. I believe that such a method falls short of scholarly standards, because words with the same frequency end up far apart in rank. For example, in the Polish frequency dictionary (Słownik frekwencyjny polszczyzny współczesnej, Kraków 1990), words with frequency 4 are assigned ranks from 8739 (abstrahować) to 10355 (żywot), a spread of 1616 places on the ranking list. For words with frequency 3, 2 and 1, the spread would be much greater. A word's ordinal number is not identified with its rank in, among others, the frequency dictionaries of Slovak and Croatian and the Bulgarian dictionary of colloquial speech. Nevertheless, many frequency dictionaries of various languages do equate the two. A discussion among linguists working in statistical linguistics is therefore needed.
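To make the contrast concrete, here is a minimal Python sketch on a small hypothetical word list. The counts are invented, except that abstrahować and żywot share frequency 4, as in the dictionary example. `ordinal_ranks` reproduces the criticised practice (position in the alphabetically tie-broken list is the rank). The abstract does not say which tied-rank convention should replace it, so two common ones are shown as assumptions: `competition_ranks` (a tied group takes the position of its first member) and `mean_ranks` (a tied group shares the mean of the positions it occupies). All function names are hypothetical. Python's default string order is by Unicode code point, which only approximates Polish alphabetical order (it places ą after z, for instance); a faithful version would need a Polish collation.

```python
from itertools import groupby


def _tied_groups(freqs):
    """Sort words by descending frequency (ties broken by code-point order,
    a simplification of Polish alphabetical order) and yield
    (first_position, [words]) for each run of equal frequency."""
    ordered = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    position = 1
    for _, run in groupby(ordered, key=lambda kv: kv[1]):
        words = [word for word, _ in run]
        yield position, words
        position += len(words)


def ordinal_ranks(freqs):
    """Criticised practice: the ordinal number in the list is used as the rank,
    so tied words get different ranks purely by alphabetical order."""
    ranks = {}
    for start, words in _tied_groups(freqs):
        for offset, word in enumerate(words):
            ranks[word] = start + offset
    return ranks


def competition_ranks(freqs):
    """Assumed alternative 1: tied words share the position of the first
    word in their run (ranks like 1, 2, 2, 4)."""
    return {word: start for start, words in _tied_groups(freqs) for word in words}


def mean_ranks(freqs):
    """Assumed alternative 2: tied words share the mean of the positions
    they occupy (ranks like 1, 2.5, 2.5, 4)."""
    ranks = {}
    for start, words in _tied_groups(freqs):
        shared = start + (len(words) - 1) / 2
        for word in words:
            ranks[word] = shared
    return ranks


# Hypothetical counts; only the shared frequency 4 of abstrahować and żywot
# comes from the text.
sample = {"być": 50, "kot": 7, "abstrahować": 4, "dom": 4, "żywot": 4, "zamek": 1}

if __name__ == "__main__":
    ordinal = ordinal_ranks(sample)
    competition = competition_ranks(sample)
    mean = mean_ranks(sample)
    for word in ordinal:
        print(f"{word:12} {sample[word]:>3} {ordinal[word]:>3} "
              f"{competition[word]:>3} {mean[word]:>5}")
```

With this sample, the three words of frequency 4 receive ordinal ranks 3, 4 and 5, but share rank 3 under competition ranking or 4.0 under mean ranking. In the 1990 dictionary, the same effect stretches a single frequency class across 1616 places.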