Abstract
The objective was to test and assess the accuracy of a scoring method for probabilistic data linkage that enables automatic identification of true matches, dispensing with the manual inspection stage. This was an accuracy study using data from the Breast Cancer Information System (SISMAMA) database in Minas Gerais State, Brazil, for 2009 and 2010. After cleaning and standardization, a 16-step probabilistic linkage of the 2009 and 2010 databases was performed, and each step was inspected manually to obtain a gold standard. Samples were then selected, inspected, and assessed to calculate the method's accuracy in selecting true matches. All the steps and the samples with 200 and 300 matches showed high sensitivity (recall > 0.97), high positive predictive value (precision > 0.95), high accuracy (> 0.97), high F-measure (> 0.96), and a high area under the precision-recall curve (> 0.98). The sample with 100 matches also showed high values for these measures, but with low scores. Of the 16 steps assessed, the combined use of only three was sufficient to identify 99.24% of the true matches in the total database. The proposed method allows databases to be linked automatically while maintaining accuracy. It facilitates the use of probabilistic linkage in health services, especially for health surveillance and management.
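The accuracy measures named above (sensitivity, positive predictive value, accuracy, and F-measure) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each candidate pair carries a linkage score and a gold-standard label from manual inspection, and that pairs at or above a cutoff are accepted as matches. The function names, the cutoff rule, and the example data are hypothetical and are not taken from the study.

```python
from typing import Iterable, NamedTuple, Tuple


class Counts(NamedTuple):
    tp: int  # accepted pairs that are true matches
    fp: int  # accepted pairs that are not true matches
    fn: int  # rejected pairs that are true matches
    tn: int  # rejected pairs that are not true matches


def confusion_counts(pairs: Iterable[Tuple[float, bool]], cutoff: float) -> Counts:
    """Classify each (score, is_true_match) pair against a score cutoff.

    Assumption: a pair is accepted as a match when score >= cutoff.
    """
    tp = fp = fn = tn = 0
    for score, is_true_match in pairs:
        accepted = score >= cutoff
        if accepted and is_true_match:
            tp += 1
        elif accepted:
            fp += 1
        elif is_true_match:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, fn, tn)


def linkage_metrics(c: Counts) -> dict:
    """Compute the standard accuracy measures from confusion counts."""
    if c.tp + c.fn == 0 or c.tp + c.fp == 0:
        raise ValueError("sensitivity or PPV is undefined for these counts")
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),   # recall
        "ppv": c.tp / (c.tp + c.fp),           # precision
        "accuracy": (c.tp + c.tn) / total,
        # Equivalent to 2 * precision * recall / (precision + recall)
        "f_measure": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


# Hypothetical scored pairs: (linkage score, gold-standard label)
example_pairs = [(10.0, True), (9.0, True), (8.0, False), (3.0, True), (1.0, False)]
example_metrics = linkage_metrics(confusion_counts(example_pairs, cutoff=5.0))
```

Sweeping the cutoff over the observed score range and recording precision and recall at each value would trace out a precision-recall curve of the kind the area-under-the-curve figure summarizes.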