Abstract

Keys consisting of variable-length character strings from the front and rear of surnames, derived by analysis of author names in a particular data base, are used to provide approximate representations of author names. When combined in appropriate ratios, and used together with keys for each of the first two initials of personal names, they provide a high degree of discrimination in search.
 Methods for optimization of key-sets are described, and the performance of key-sets varying in size between 150 and 300 is determined at file sizes of up to 50,000 name entries. The effects of varying the proportions of the queries present in the file are also examined. The results obtained with fixed-length keys are compared with those for variable-length keys, showing the latter to be greatly superior.
 Implications of the work for a variety of types of information systems are discussed.

Highlights

  • In Part I of this series the development of variety generators, or sets of variable-length keys with high relative entropies of occurrence, from the initial and terminal character strings of authors' surnames was described

  • The performance of combined key-sets of various compositions is determined at a range of file sizes and compared with fixed-length keys

  • In operational systems in which one or more author names are associated with a particular bibliographical item, it will be necessary to provide for description of each of these for access


Summary

INTRODUCTION

In Part I of this series the development of variety generators, or sets of variable-length keys with high relative entropies of occurrence, from the initial and terminal character strings of authors' surnames was described. Their purpose, used singly or in combination, is to provide a high and constant degree of discrimination among personal names so as to facilitate searches for them. In order to test this, a series of combined key-sets of different total sizes was produced, in which the proportions of keys were varied around the ratio of the redundancies of the first and last character positions, i.e., (1 - 0.92):(1 - 0.86), or 8:14. For each of the sets chosen, the distributions of the entries resulting from application of the combined key-sets to the file of 50,000 names were determined.
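
The arithmetic behind the 8:14 ratio, and the idea of tallying how entries distribute over combined keys, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the toy key-sets, and the sample surnames are all hypothetical, and the longest-match rule for choosing a variable-length key is an assumption made for illustration. The passage does not state which side of the ratio is allotted to front keys and which to rear keys, so the split is left unlabelled.

```python
from collections import Counter
from fractions import Fraction


def redundancy_ratio(h_first, h_last):
    """Ratio of redundancies (1 - relative entropy), reduced to lowest terms."""
    return (1 - Fraction(h_first)) / (1 - Fraction(h_last))


def split_total(total, part_a, part_b):
    """Split a key-set of `total` keys into two whole shares near part_a:part_b."""
    share_a = round(total * part_a / (part_a + part_b))
    return share_a, total - share_a


def longest_key(name, keys, from_front):
    """Return the longest key in `keys` that begins (or ends) `name`.

    Longest-match selection is an assumption of this sketch.
    """
    for n in range(len(name), 0, -1):
        candidate = name[:n] if from_front else name[-n:]
        if candidate in keys:
            return candidate
    raise ValueError(f"no key covers {name!r}")


def combined_key(name, front_keys, rear_keys):
    """Pair the front key and the rear key selected for a surname."""
    return (longest_key(name, front_keys, True),
            longest_key(name, rear_keys, False))


# Relative entropies quoted in the text: 0.92 (first) and 0.86 (last position).
ratio = redundancy_ratio("0.92", "0.86")   # Fraction(4, 7), i.e. 8:14

# Hypothetical toy key-sets and surnames, far smaller than the 150-300 keys
# and 50,000 names discussed in the paper.
FRONT_KEYS = {"S", "SM", "SCH", "M", "MA", "MC"}
REAR_KEYS = {"H", "TH", "E", "T", "DT", "N", "IN", "Y"}
NAMES = ["SMITH", "SMYTHE", "SCHMIDT", "MARTIN", "MCCOY", "SMYTH"]

# Number of file entries falling under each combined key.
distribution = Counter(combined_key(n, FRONT_KEYS, REAR_KEYS) for n in NAMES)

if __name__ == "__main__":
    print(ratio)
    print(split_total(150, 8, 14), split_total(300, 8, 14))
    for key, count in sorted(distribution.items()):
        print(key, count)
```

In this toy file SMITH and SMYTH share one combined key while the other names each receive their own; a flatter distribution of this kind over a real file is what a well-chosen key-set is meant to achieve.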

EVALUATION OF RETRIEVAL EFFECTIVENESS
CONCLUSIONS