Automatic recognition of title page names

Mavis Molto, Elaine Svenonius

doi:10.1016/0306-4573(91)90032-h

Abstract

The general question addressed by the study is whether it is feasible to develop automatic name recognition algorithms that distinguish character strings representing names from other character strings occurring on English-language title pages. To answer this question, two name recognition algorithms were tested: one to recognize personal names and the other to recognize corporate names. The algorithms involved matching title page names against names in authority files and identifying postname markers. The success rates for the corporate and personal name algorithms were 85.8% and 84.5%, respectively, with a precision rate of 89.3% for the latter. The corporate name algorithm worked significantly better for public library data than for university library data. The personal name algorithm worked significantly better for names deemed to be useful access points in retrieval. Further, the personal name algorithm achieved significantly higher precision for the public library data, as well as for names that were useful access points. It is anticipated that the algorithms could be improved further by increasing the number of names in the authority files, especially names of publishers, colleges, and universities. The findings offer cautious promise for alleviating some of the labor-intensive work of cataloging by providing a means for automatically recognizing names on title pages.
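The abstract names two recognition cues, a match against an authority file and the presence of a postname marker, and two measures, success rate and precision. The following is a minimal Python sketch of how such a recognizer and its evaluation might fit together. Everything concrete in it is an assumption for illustration and not the study's procedure: the tiny authority sets, the marker lists, the order of the checks (authority first, then trailing marker), the hypothetical helpers `recognize` and `evaluate`, and the reading of "success rate" as the share of true names that were recognized.

```python
from typing import List, Optional, Tuple

# Hypothetical authority files (the study used real name authority files).
PERSONAL_AUTHORITY = {"elaine svenonius", "mavis molto"}
CORPORATE_AUTHORITY = {"pergamon press", "university of california"}

# Hypothetical postname markers: tokens that, when they end a string,
# suggest the string is a name. Not the study's actual marker lists.
PERSONAL_MARKERS = {"ph.d.", "m.d.", "jr."}
CORPORATE_MARKERS = {"inc.", "ltd.", "co."}


def normalize(s: str) -> str:
    """Lowercase, treat commas as spaces, and collapse whitespace."""
    return " ".join(s.lower().replace(",", " ").split())


def recognize(s: str) -> Optional[str]:
    """Return 'personal', 'corporate', or None for a title page string."""
    norm = normalize(s)
    # Cue 1: exact match against an authority file.
    if norm in PERSONAL_AUTHORITY:
        return "personal"
    if norm in CORPORATE_AUTHORITY:
        return "corporate"
    # Cue 2: a postname marker as the final token.
    tokens = norm.split()
    last = tokens[-1] if tokens else ""
    if last in PERSONAL_MARKERS:
        return "personal"
    if last in CORPORATE_MARKERS:
        return "corporate"
    return None


def evaluate(labelled: List[Tuple[str, Optional[str]]], kind: str) -> Tuple[float, float]:
    """Return (success rate, precision) for one name type.

    success rate = true names of this type recognized / true names of this type
    precision    = correct recognitions / all strings flagged as this type
    """
    actual = sum(1 for _, k in labelled if k == kind)
    flagged = sum(1 for s, _ in labelled if recognize(s) == kind)
    hits = sum(1 for s, k in labelled if k == kind and recognize(s) == kind)
    success = hits / actual if actual else 0.0
    precision = hits / flagged if flagged else 0.0
    return success, precision


# Made-up strings with hand-assigned labels (None = not a name).
SAMPLE = [
    ("Elaine Svenonius", "personal"),      # authority match
    ("John Smith, Ph.D.", "personal"),     # postname marker
    ("Jane Doe", "personal"),              # missed: no cue fires
    ("Pergamon Press", "corporate"),       # authority match
    ("Acme Widgets, Inc.", "corporate"),   # postname marker
    ("A Primer for the M.D.", None),       # false positive on a marker
    ("Information Processing & Management", None),
]

if __name__ == "__main__":
    for kind in ("personal", "corporate"):
        success, precision = evaluate(SAMPLE, kind)
        print(f"{kind}: success {success:.1%}, precision {precision:.1%}")
```

In this toy sample the personal name recognizer misses "Jane Doe" and wrongly flags a title ending in "M.D.", which lowers both measures. The effect is in the same spirit as the abstract's observation that enlarging the authority files should improve results, since an authority match does not depend on a marker being present.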
