Abstract

Record matching processes, which compare sets of identifying information to decide whether or not a pair of records relate to the same individual or population item, are basic to a wide range of applications in social research, file maintenance and information retrieval. Such processes may be conveniently described in terms of matching a single incoming record against a master file or list. To compare different record matching processes in terms of matching costs and error losses, it is necessary to evaluate the outcome probabilities. It is shown that this can be done by considering only the class-size probability distributions, for a simple model which assumes that the information used for matching is complete and invariant but possibly insufficient to distinguish between all population items. These distributions can be estimated directly from the list, or from a sub-sample drawn from it, by applying Goodman's [2] results concerning the estimation of the number of classes in a population. The outcome probabilities can then be evaluated by treating the incoming record in one of two ways. If it should match some item on the list, it is considered as randomly drawn from the list. If it ought to match no item on the list, it is considered as added to the list, forming a new list with approximately the same classification probabilities. A numerical example illustrates an application of the model.
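As an informal illustration of the class-size idea, and not the paper's own procedure, the sketch below groups a toy master list by its identifying key and computes the probability that a record drawn uniformly at random from the list falls in a class of size s. With n_s classes of size s in a list of N records, that probability is s * n_s / N. The key format, the sample data, and the function names are all hypothetical.

```python
from collections import Counter
from fractions import Fraction


def class_size_counts(keys):
    """Return a mapping: class size s -> number of classes of that size."""
    per_class = Counter(keys)            # identifying key -> class size
    return Counter(per_class.values())   # class size -> number of classes


def size_distribution_of_random_record(keys):
    """P(a record drawn uniformly from the list lies in a class of size s)."""
    n = len(keys)
    counts = class_size_counts(keys)
    # A class of size s contains s records, so it is hit with probability s / n.
    return {s: Fraction(s * c, n) for s, c in sorted(counts.items())}


# Hypothetical master list: each entry is the identifying key of one record.
master = [
    "smith|1950", "smith|1950",
    "jones|1962",
    "lee|1971", "lee|1971", "lee|1971",
    "kim|1980",
]

dist = size_distribution_of_random_record(master)
for size, prob in dist.items():
    print(f"class size {size}: probability {prob}")
```

Under the model's assumptions, a record in a class of size 1 can be matched unambiguously, while larger classes leave the match undetermined among several list items, so the entry for size 1 is the one of most direct interest.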
