Abstract

Our research examines Record Linkage (RL), also known as Entity Resolution, Record Matching, or the Object Identity Problem, as it is commonly practiced in large health-services databases, together with some of the approximate string matching methods used for this purpose. We also propose improvements to RL and string matching that, in our experiments, increased both the quality and the efficiency of information systems tasked with this problem. We have developed an in-memory, graph-based data model, Aggregate Link and Iterative Match (ALIM), which compresses data by eliminating redundancy and stores alias, approximate-match, and phonetic-match links between stored records. We have also developed an enhanced edit-distance optimization, the Probabilistic Signature Hash Filter (PSH), which performs Damerau-Levenshtein (DL) edit-distance comparisons nearly 6000 times faster than DL alone while producing exactly the same approximate-match results. Our experiments show significant accuracy and performance gains over a system currently in use by a local health department.
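The abstract does not say how PSH constructs its signatures, so the following is only a hypothetical sketch of the general filter-then-verify idea it relies on: compute a cheap lower bound on the edit distance and run the full DL comparison only for candidates the bound cannot rule out. Assumptions: the distance used is the common restricted DL variant (optimal string alignment, OSA), the signature is a 64-bit character-presence bitmask with characters bucketed by `ord(ch) % 64`, and the names `osa_distance`, `signature`, `lower_bound`, and `approximate_matches` are illustrative, not taken from the paper.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance:
    insertions, deletions, substitutions and adjacent transpositions cost 1."""
    m, n = len(a), len(b)
    # d[i][j] holds the distance between a[:i] and b[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[m][n]


def signature(s: str) -> int:
    """Hypothetical signature: one bit per character bucket (ord % 64)."""
    sig = 0
    for ch in s:
        sig |= 1 << (ord(ch) % 64)
    return sig


def lower_bound(sig_a: int, sig_b: int) -> int:
    """Lower bound on the edit distance from two signatures.

    Each insertion, deletion or substitution adds or removes at most one
    bucket on each side, and a transposition changes no bucket, so this
    value can never exceed the true distance."""
    only_a = bin(sig_a & ~sig_b).count("1")
    only_b = bin(sig_b & ~sig_a).count("1")
    return max(only_a, only_b)


def approximate_matches(query: str, candidates: list[str], k: int) -> list[str]:
    """Return candidates within distance k of query, in input order."""
    q_sig = signature(query)
    hits = []
    for cand in candidates:
        if abs(len(query) - len(cand)) > k:
            continue  # length difference is also a lower bound
        if lower_bound(q_sig, signature(cand)) > k:
            continue  # safe rejection: the true distance must exceed k
        if osa_distance(query, cand) <= k:  # exact verification
            hits.append(cand)
    return hits
```

For example, `approximate_matches("JOHNSON", ["JONSON", "JOHNSTON", "JHONSON", "SMITH", "JACKSON"], 1)` returns `["JONSON", "JOHNSTON", "JHONSON"]`; `SMITH` is rejected by length and `JACKSON` by the signature bound, so neither reaches the quadratic DP. Because both filters are true lower bounds, the result is identical to running the DP on every candidate; the speedup depends entirely on how many candidates the filters reject, and this sketch makes no claim about matching the figure reported for PSH.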
