Introduction: The Coronary Artery Risk Detection in Appalachian Communities (CARDIAC) Project has collected anthropometric, blood pressure (BP), and lipid data from fifth graders in West Virginia over the past 18 years. Of the 60,403 children with LDL cholesterol measurements, we identified 5259 sibling sets by exact matching on the mother's first and last name. Because additional sibships were likely being missed, we evaluated Link Plus, record-linkage software from the Centers for Disease Control and Prevention (CDC), as a way to improve matching.

Methods: Link Plus generates potential matches with a probabilistic algorithm that allows relative weighting of multiple fields, such as first and last name. To avoid creating many-to-many relationships that would be difficult to analyze, we ran the deduplication algorithm rather than the linkage (matching) algorithm, using the mother's first and last name encoded with the NYSIIS (New York State Identification and Intelligence System) phonetic scheme. Additional variables considered were county, street address, telephone number, father's first and last name, and school. The subject's (child's) last name was used as a blocking variable.

Results: The program generated 7602 potential sibling matches, with probability scores ranging from 61.3 down to a cutoff of 15; few matches were observed below this level. The figure shows exponential decay beginning at a probability score of 26, with 95% accuracy at a score of 25.5. At that threshold, 6827 pairs were included: 6824 correctly matched pairs and only 3 false-positive pairs. The 61 partial matches, which agreed exactly on telephone number and/or street address but on only one parent, are likely half-siblings. Child surname was not used as a matching variable. Link Plus accommodated typographical errors. Lipid correlations were similar to those obtained with Excel-based matching but more robust.

Conclusion: The Link Plus record-matching program from the CDC successfully identified sibships, with greater sensitivity than direct matching from a sorted Excel file.
The program identified likely siblings and half-siblings and avoided missed matches caused by minor typographical errors in the analyzed fields.
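The probabilistic deduplication described in the Methods can be illustrated with a small sketch. The Python below does not call or reproduce Link Plus (a standalone CDC application); it only shows the general idea under stated assumptions. Mother's first and last names are compared by exact agreement of their NYSIIS codes (classic variant, key not truncated to six letters), telephone and street by exact agreement. Each field adds a Fellegi-Sunter log2 agreement or disagreement weight, and candidate pairs are formed only within blocks sharing the NYSIIS code of the child's last name. The field names, m/u probabilities, cutoff, and the `nysiis`, `pair_score`, and `find_sibling_pairs` helpers are hypothetical, and the scores are not on the Link Plus scale.

```python
import math
from collections import defaultdict
from itertools import combinations

VOWELS = set("AEIOU")

# Hypothetical m = P(agree | sibling pair), u = P(agree | unrelated pair).
FIELDS = {"mother_first": (0.95, 0.01), "mother_last": (0.95, 0.005),
          "phone": (0.80, 0.001), "street": (0.70, 0.002)}
PHONETIC = {"mother_first", "mother_last"}  # compared via NYSIIS codes


def nysiis(name):
    """Classic NYSIIS phonetic code (full key, not truncated to 6 letters)."""
    s = "".join(c for c in name.upper() if c.isalpha())
    if not s:
        return ""
    # Translate the first letters of the name.
    for old, new in (("MAC", "MCC"), ("KN", "NN"), ("K", "C"),
                     ("PH", "FF"), ("PF", "FF"), ("SCH", "SSS")):
        if s.startswith(old):
            s = new + s[len(old):]
            break
    # Translate the last letters of the name.
    if s.endswith(("EE", "IE")):
        s = s[:-2] + "Y"
    elif s.endswith(("DT", "RT", "RD", "NT", "ND")):
        s = s[:-2] + "D"
    chars, key = list(s), s[0]
    # Scan from the second letter, rewriting letters in place.
    for i in range(1, len(chars)):
        c = chars[i]
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if c in VOWELS:
            if c == "E" and nxt == "V":
                chars[i:i + 2] = ["A", "F"]
            else:
                chars[i] = "A"
        elif c in "QZM":
            chars[i] = {"Q": "G", "Z": "S", "M": "N"}[c]
        elif c == "K":
            chars[i] = "N" if nxt == "N" else "C"
        elif chars[i:i + 3] == ["S", "C", "H"]:
            chars[i:i + 3] = ["S", "S", "S"]
        elif c == "P" and nxt == "H":
            chars[i:i + 2] = ["F", "F"]
        elif c == "H" and (chars[i - 1] not in VOWELS or (nxt and nxt not in VOWELS)):
            chars[i] = chars[i - 1]
        elif c == "W" and chars[i - 1] in VOWELS:
            chars[i] = chars[i - 1]
        if chars[i] != key[-1]:  # skip repeated letters
            key += chars[i]
    # Clean up the end of the key.
    if len(key) > 1 and key.endswith("S"):
        key = key[:-1]
    if len(key) > 2 and key.endswith("AY"):
        key = key[:-2] + "Y"
    if len(key) > 1 and key.endswith("A"):
        key = key[:-1]
    return key


def field_value(rec, field):
    value = rec.get(field, "").strip().upper()
    return nysiis(value) if field in PHONETIC else value


def pair_score(a, b):
    """Fellegi-Sunter score: sum of log2 agreement/disagreement weights."""
    score = 0.0
    for field, (m, u) in FIELDS.items():
        va, vb = field_value(a, field), field_value(b, field)
        if not va or not vb:
            continue  # a missing value adds no evidence either way
        agree = va == vb
        score += math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))
    return score


def find_sibling_pairs(records, cutoff):
    """Score pairs only within blocks sharing the NYSIIS code of the child's surname."""
    blocks = defaultdict(list)
    for r in records:
        blocks[nysiis(r["child_last"])].append(r)
    pairs = []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            s = pair_score(a, b)
            if s >= cutoff:
                pairs.append((a["id"], b["id"], round(s, 1)))
    return sorted(pairs, key=lambda p: -p[2])


KEYS = ("id", "child_last", "mother_first", "mother_last", "phone", "street")
records = [dict(zip(KEYS, row)) for row in [
    (1, "Johnson", "Katherine", "Johnson", "304-555-0101", "12 Oak St"),
    (2, "Jonson", "Catherine", "Jonson", "304-555-0101", "12 Oak St"),
    (3, "Johnson", "Mary", "Johnson", "304-555-0199", "9 Elm Ave"),
]]
pairs = find_sibling_pairs(records, cutoff=10.0)
print(pairs)  # [(1, 2, 32.2)]
```

Because Katherine/Catherine and Johnson/Jonson share NYSIIS codes, the misspelled pair still scores well above the hypothetical cutoff, while record 3 (same surname, different mother, phone, and address) falls below it. This mirrors, in simplified form, how phonetic encoding and weighted agreement help avoid non-matches caused by minor typographical errors.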