Abstract

Bibliographic collections in traditional libraries often compile records from distributed sources where variable criteria have been applied to the normalization of the data. Furthermore, the source records often follow classical standards, such as MARC21, where a strict normalization of author names is not enforced. The identification of equivalent records in large catalogues is therefore required, for example, when migrating the data to new repositories which apply modern specifications for cataloguing, such as the FRBR and RDA standards. An open-source tool has been implemented to assist authority control in bibliographic catalogues when external features (such as the citations found in scientific articles) are not available for the disambiguation of creator names. This tool is based on similarity measures between the variants of author names combined with a parser which interprets the dates and periods associated with the creator. An efficient data structure (the unigram frequency vector trie) has been used to accelerate the identification of variants. The algorithms employed and the attribute grammar are described in detail and their implementation is distributed as an open-source resource to allow for an easier uptake.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call