Abstract

This article discusses structural, systems, and other types of bias that arise in matching new records to large databases. The focus is databases for bibliographic utilities, but other related database concerns will be discussed. Problems of satisfying a “match” with sufficient flexibility and rigor in an environment of imperfect data are presented, and sources of unintentional variance are discussed.

Highlights

  • This article discusses structural, systems, and other types of bias that arise in matching new records to large databases

  • Computerized records in a networked environment have encouraged the recognition that duplicate records pose a serious threat to efficient information retrieval

  • Matching is defined as the process by which additions to a large database are screened and compared with existing database records

Summary

Language in formation of queries

MARC records frequently contain a mixture of languages. As other projects involving intensive text comparison have shown, overlap between languages can confuse comparisons of short strings of text [21]. The cataloging rules for describing formats have also changed. Even within a country, standards change over time, so that “correct” cataloging in one decade may not match that of a later period. Records themselves change over time as they are copied, derived, and migrated into other systems. They may be enhanced or corrected in any system where they reside. When they return to the originating database, they may have been transformed so far as to be unrecognizable as representations of the same item. This problem is not unique to XWC; it is a challenge for any shared database where records are likely to be exported and later reentered.
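
The difficulty of comparing short strings that have drifted between systems can be illustrated with a minimal sketch. It is not the matching method described in the article. The function names `normalize_text` and `similarity` and the normalization rules (dropping diacritics, case-folding, replacing punctuation with spaces) are assumptions chosen for illustration, and the score comes from Python's standard `difflib.SequenceMatcher`.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_text(value):
    """Reduce a short field to a comparison key (hypothetical rules)."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold, replace punctuation with spaces, collapse runs of whitespace.
    folded = re.sub(r"[^\w\s]", " ", stripped.casefold())
    return " ".join(folded.split())


def similarity(a, b):
    """Return a ratio in [0, 1] for two short strings after normalization."""
    return SequenceMatcher(None, normalize_text(a), normalize_text(b)).ratio()


if __name__ == "__main__":
    # The same title as it might look after a system dropped its diacritics.
    print(similarity("Études sur l'économie", "Etudes sur l'economie"))
    # Two unrelated short titles in different languages.
    print(similarity("Die Welt", "The World"))
```

Normalization of this kind trades rigor for flexibility: it lets a record that lost its diacritics in another system still match, but it also erases distinctions that are meaningful in some languages, which is one way bias can enter a matching process.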

Design bias
Change in proportions of languages in the
Challenges of Handling Ellipses in Titles Thought to be Similar
Partial Matches in Names Which Might Represent the Same Publisher
Publisher name may be partially or differently recorded in two records
Findings
Publisher name may have changed due to acquisition by another organization
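
The publisher-name cases listed above can be sketched with a token-overlap comparison. This is an illustrative assumption rather than the article's algorithm: the stop list `GENERIC_WORDS` and the functions `publisher_tokens` and `overlap_coefficient` are hypothetical names introduced here.

```python
import re

# Hypothetical stop list of generic words; a real system would tune this.
GENERIC_WORDS = {"inc", "ltd", "co", "company", "and", "the",
                 "publishers", "publishing"}


def publisher_tokens(name):
    """Split a publisher statement into distinctive lowercase tokens."""
    words = re.findall(r"\w+", name.casefold())
    return {w for w in words if w not in GENERIC_WORDS}


def overlap_coefficient(a, b):
    """Shared tokens divided by the size of the smaller token set."""
    ta, tb = publisher_tokens(a), publisher_tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))


if __name__ == "__main__":
    # A partially recorded name is fully contained in the longer form.
    print(overlap_coefficient("Wiley", "John Wiley & Sons, Inc."))
    # A name changed by acquisition shares no tokens with its successor.
    print(overlap_coefficient("Harper & Row", "HarperCollins"))
```

The overlap coefficient tolerates a name that is only partially recorded, but string comparison alone cannot connect a publisher to the name it took after an acquisition. That case would need external knowledge, such as an authority file of name changes.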
