Abstract

A semi‐empirical correlation, based on data from nine indexes, permits prediction of the percentage of terms in a manipulative index vocabulary that will be used to index any given number of documents. This percentage is a function of the total number of index entries in the system. A log‐normal relationship, similar to Zipf's Law, exists between the total number of index entries and the distribution of term usage. From the correlation, the optimum vocabulary size and growth rate can be inferred, as well as the most efficient arrangement of index entries in a storage medium. The results agree well with published data and appear to be particularly useful to designers of mechanized retrieval or publication operations.
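The abstract does not state the correlation itself, so the sketch below is only an illustration of the kind of quantity being predicted, not the authors' method. It assumes, hypothetically, that term usage follows a Zipf (power‐law) rank distribution with exponent `s` over a vocabulary of `vocab_size` terms, and that each index entry selects a term independently. Under those assumptions, a term with usage probability p remains unused after N entries with probability (1 − p)^N. The expected fraction of the vocabulary that has been used is therefore the average of 1 − (1 − p)^N over all terms. The function names and parameters are illustrative, and the Zipf form stands in for the paper's log‐normal relationship.

```python
def zipf_probabilities(vocab_size: int, s: float = 1.0) -> list[float]:
    """Hypothetical usage probability of the term at each rank r = 1..vocab_size,
    assuming a Zipf (power-law) distribution with exponent s."""
    weights = [1.0 / r**s for r in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_fraction_used(vocab_size: int, total_entries: int, s: float = 1.0) -> float:
    """Expected fraction of vocabulary terms assigned at least once after
    `total_entries` index entries, assuming each entry picks a term independently."""
    probs = zipf_probabilities(vocab_size, s)
    # A term with probability p stays unused across N entries with probability (1 - p)**N.
    used = sum(1.0 - (1.0 - p) ** total_entries for p in probs)
    return used / vocab_size


if __name__ == "__main__":
    # Illustrative vocabulary of 5,000 terms; the fraction used grows with total index entries.
    for n in (1_000, 10_000, 100_000):
        print(n, round(expected_fraction_used(5_000, n), 3))
```

Because the fraction used rises with the total number of index entries but saturates slowly for rarely used terms, a model of this shape can, in principle, be inverted to suggest a vocabulary size suited to an expected volume of indexing. The paper's empirically fitted correlation should be used in place of this assumed distribution for any real design.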
