Accuracy of probabilistic and deterministic record linkage: the case of tuberculosis.

Gisele Pinto De Oliveira, Ana Luiza De Souza Bierrenbach, Rejane Sobrino Pinheiro, Cláudia Medina Coeli, Kenneth Rochel De Camargo Júnior

doi:10.1590/s1518-8787.2016050006327

Abstract

OBJECTIVE: To analyze the accuracy of deterministic and probabilistic record linkage in identifying duplicate tuberculosis (TB) records, as well as the characteristics of discordant pairs.

METHODS: The study analyzed all TB records from 2009 to 2011 in the state of Rio de Janeiro. A deterministic record linkage algorithm was developed using a set of 70 rules, each based on a combination of fragments of the key variables, with or without modification (Soundex or substring). Each rule was formed by three or more fragments. The probabilistic approach required a cutoff point for the score, above which links would be automatically classified as belonging to the same individual. The cutoff point was obtained by linking the Notifiable Diseases Information System – Tuberculosis database with itself, followed by manual review and analysis of ROC and precision-recall curves. Sensitivity and specificity were calculated to assess accuracy.

RESULTS: Sensitivity was 87.2% for probabilistic and 95.2% for deterministic record linkage; specificity was 99.8% and 99.9%, respectively. Missing values for the key variables and low similarity scores for name and date of birth were the main reasons the techniques failed to identify records of the same individual.

CONCLUSIONS: The two techniques showed a high level of correlation for pair classification. Although deterministic linkage identified more duplicate records than probabilistic linkage, the latter retrieved records not identified by the former. User need and experience should be considered when choosing the best technique to be used.
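To make the deterministic approach and the accuracy measures more concrete, the following is a minimal Python sketch, not the authors' algorithm. It applies a single hypothetical rule built from three fragments (Soundex of the first name token, Soundex of the last name token, and the birth-year substring) and then computes sensitivity and specificity against a manually reviewed set of true duplicate pairs. The field names `name` and `birth_date`, the choice of fragments, the date format, and the toy records are all assumptions made for illustration; the abstract does not list the 70 rules or the exact key variables.

```python
from itertools import combinations

# Standard American Soundex letter-to-digit table.
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for ch in letters
}


def soundex(name):
    """American Soundex code (one letter plus three digits) for a name token."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code is kept
    return (result + "000")[:4]


def fragments(record):
    """Hypothetical rule: three fragments taken from name and date of birth."""
    tokens = (record.get("name") or "").split()
    birth = record.get("birth_date") or ""  # assumed format: YYYY-MM-DD
    if not tokens or len(birth) < 4:
        return None  # a missing key variable means the rule cannot fire
    return (soundex(tokens[0]), soundex(tokens[-1]), birth[:4])


def deterministic_pairs(records):
    """Index pairs whose fragments all agree under the rule."""
    keys = [fragments(r) for r in records]
    return {
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if keys[i] is not None and keys[i] == keys[j]
    }


def sensitivity_specificity(predicted, true_pairs, n_records):
    """Sensitivity and specificity over all possible record pairs."""
    total_pairs = n_records * (n_records - 1) // 2
    tp = len(predicted & true_pairs)
    fn = len(true_pairs - predicted)
    fp = len(predicted - true_pairs)
    tn = total_pairs - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy records; records 0, 1 and 2 are assumed to be the same person.
records = [
    {"name": "Maria Silva", "birth_date": "1980-05-12"},
    {"name": "Marya Sylva", "birth_date": "1980-05-21"},
    {"name": "Maria Silva", "birth_date": None},
    {"name": "Jose Santos", "birth_date": "1975-01-30"},
]
true_pairs = {(0, 1), (0, 2), (1, 2)}  # stands in for the manual review

predicted = deterministic_pairs(records)
sens, spec = sensitivity_specificity(predicted, true_pairs, len(records))
print(predicted, round(sens, 3), round(spec, 3))
```

In this toy data the rule tolerates the spelling variation and the day-of-birth discrepancy between records 0 and 1, but it misses record 2 because its date of birth is missing, which mirrors the kind of failure the abstract attributes to missing key variables. A probabilistic approach would instead assign each pair a score and classify it by comparing the score with a cutoff.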

Journal: Revista de Saúde Pública
Publication Date: Jan 1, 2016
Citations: 23
License type: cc-by
