Linkage in medical records and bioinformatics data

Shen Lu, Richard S Segall

doi:10.1504/ijids.2013.053803

Abstract

Patients who make repeated visits generate multiple records, which leads to redundant information across data sources. Matching and merging duplicate records increases the amount of information available for the population units required by stand-alone and distributed databases. In this paper, we present an algorithm called entity resolution of the Fellegi-Sunter (ERFS) model, which uses the Fellegi-Sunter model to improve the results of semantic analysis for identifying similar records. In our experiments, ERFS yields rates that are higher than those of the Stanford entity resolution framework (SERF) for about 11.07% of the experiments. Within that 11.07%, 38.1% of the experiments conducted show increases ranging from 12.7% to 21.9%, and the experiments with a mid-range number of records show an average increase of 16.96%. We therefore conclude that ERFS should be used to link similar records.
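For readers unfamiliar with the Fellegi-Sunter model named in the abstract, the following is a minimal sketch of its standard scoring rule, not the ERFS algorithm itself. Each compared field adds log2(m/u) when the two records agree and log2((1-m)/(1-u)) when they disagree, where m is the probability of agreement among true matches and u is the probability of agreement among non-matches. The field names, the m and u values, the two thresholds, and the function names `match_weight` and `classify` are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical (m, u) probabilities per field; illustrative values only,
# not taken from the paper.
M_U = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.98, 0.005),
    "zip": (0.90, 0.05),
}


def match_weight(rec_a, rec_b, params=M_U):
    """Sum the Fellegi-Sunter agreement/disagreement weights over all fields."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, upper=8.0, lower=0.0):
    """Map a total weight to a decision using two assumed thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


visit_1 = {"last_name": "Lu", "birth_date": "1970-01-01", "zip": "72401"}
visit_2 = {"last_name": "Lu", "birth_date": "1971-01-01", "zip": "72401"}
print(classify(match_weight(visit_1, visit_2)))
```

In practice the m and u probabilities are estimated from the data and the thresholds are set from acceptable error rates; the fixed numbers above only make the arithmetic easy to follow.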

Full Text