Abstract

Data systems collecting information from different sources or over long periods of time can receive multiple reports from the same individual. An important example is public health surveillance systems that monitor conditions with long natural histories. Several state-level systems for surveillance of one such condition, the human immunodeficiency virus (HIV), use codes composed of combinations of non-unique personal characteristics such as birth date, soundex (a code based on last name), and sex as patient identifiers. As a result, these systems cannot distinguish between several different individuals having identical codes and a single individual erroneously represented several times. We applied results for occupancy models to estimate the potential magnitude of duplicate case counting for AIDS cases reported to the Centers for Disease Control and Prevention with only non-unique partial personal identifiers. Occupancy models with equal and unequal occupancy probabilities are considered. Unbiased estimators for the numbers of true duplicates within and between case reporting areas are provided. Formulas to calculate the estimators’ variances are also provided. These results can be applied to evaluating duplicate reporting in other data systems that have no unique identifier for each individual.

Highlights

  • Public health surveillance systems that monitor conditions with long natural histories can receive multiple reports from different sources regarding the same affected individual

  • If there were no true duplicates in the AIDS surveillance system, given the number of cases reported to the system, the number of distinct combinations of sex, soundex, and date of birth would satisfy the equations provided in the previous sections

  • Since the number of sex, soundex, and date of birth combinations is observable and not affected by true duplicate reporting, we can work backwards to estimate the number of reported persons with AIDS


Summary

Introduction

Public health surveillance systems that monitor conditions with long natural histories can receive multiple reports from different sources regarding the same affected individual. When information submitted to a surveillance system cannot uniquely identify an individual, and duplicate reports may be submitted to the system, the system must use additional information to determine whether cases with the same non-unique identifiers represent the same person. For this discussion, we call reports with the same partial personal identifiers “potential duplicates”. Larsen (1994) considered this problem in a register of HIV-infected persons, estimating the number of distinct individuals in the register from the date of birth of each entry and classical occupancy theory, in which each ball has the same chance of falling into any one of the cells. Some concerns and recommendations are presented in the discussion section.
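As a rough illustration of the classical equal-probability occupancy setting mentioned above, the sketch below uses the standard result that n balls dropped uniformly at random into N cells leave N(1 - (1 - 1/N)^n) cells occupied on average, and inverts that relation to back out n from an observed count of occupied cells. This is a simple moment-style inversion written for illustration only; it is not the unbiased estimator derived in the paper, and the function names and the choice of 365 cells (days of the year standing in for identifier combinations) are assumptions.

```python
import math


def expected_occupied_cells(n_balls: float, n_cells: int) -> float:
    """Expected number of non-empty cells when n_balls fall
    independently and uniformly at random into n_cells."""
    if n_cells < 2:
        raise ValueError("n_cells must be at least 2")
    return n_cells * (1.0 - (1.0 - 1.0 / n_cells) ** n_balls)


def estimate_balls(occupied: float, n_cells: int) -> float:
    """Solve expected_occupied_cells(n, n_cells) == occupied for n.

    Illustrative moment-style inversion (an assumption of this sketch),
    not the paper's unbiased estimator.
    """
    if n_cells < 2:
        raise ValueError("n_cells must be at least 2")
    if not 0 <= occupied < n_cells:
        raise ValueError("occupied must satisfy 0 <= occupied < n_cells")
    return math.log(1.0 - occupied / n_cells) / math.log(1.0 - 1.0 / n_cells)


if __name__ == "__main__":
    # Hypothetical numbers: 500 distinct persons, 365 possible codes.
    cells = 365
    occupied = expected_occupied_cells(500, cells)
    print(f"expected occupied cells: {occupied:.1f}")
    print(f"recovered number of persons: {estimate_balls(occupied, cells):.1f}")
```

In the surveillance setting, the cells would correspond to identifier combinations and the balls to distinct persons, so the gap between the number of reports received and the back-calculated number of persons gives a rough sense of how many reports might be true duplicates under the equal-probability assumption.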

Occupancy Model with Equal Occupancy Probabilities
Occupancy Model with Unequal Occupancy Probabilities
Occupancy Problem When Cells Are Filled with Balls of Different Colors
Application to Analysis of Duplicates in AIDS Case Reporting
Findings
Summary and Discussion