Abstract

Real-world applications of record linkage often require matching to be robust in spite of small variations in string fields. For example, two health care providers should be able to detect a patient in common, even if one record contains a typo or transcription error. In the privacy-preserving setting, however, the problem of approximate string matching has been cast as a trade-off between security and practicality, and the literature has mainly focused on Bloom filter encodings, an approach which can leak significant information about the underlying records. We present a novel public-key construction for secure two-party evaluation of threshold functions in restricted domains based on embeddings found in the message spaces of additively homomorphic encryption schemes. We use this to construct an efficient two-party protocol for privately computing the threshold Dice coefficient. Relative to the approach of Bloom filter encodings, our proposal offers formal security guarantees and greater matching accuracy. We implement the protocol and demonstrate the feasibility of this approach in linking medium-sized patient databases with tens of thousands of records.
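
For readers unfamiliar with the threshold Dice coefficient, the following is a minimal plaintext sketch of the function being computed, not the secure protocol itself. It assumes strings are compared as sets of padded character bigrams, a common choice in record linkage that the abstract does not confirm; the function names, the `_` padding character, and the default threshold of 0.8 are illustrative assumptions.

```python
def bigrams(s: str) -> set[str]:
    """Return the set of character bigrams of a lowercased, padded string.

    Padding with "_" is an assumption made here so that the first and
    last characters each contribute a bigram.
    """
    padded = f"_{s.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|X & Y| / (|X| + |Y|) over the bigram sets."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def threshold_dice(a: str, b: str, t: float = 0.8) -> bool:
    """Report only whether the Dice coefficient reaches the threshold t."""
    return dice(a, b) >= t


if __name__ == "__main__":
    # "smith" and "smyth" share 4 of 6 bigrams each: 2*4 / (6+6) = 0.667
    print(round(dice("smith", "smyth"), 3))
    print(threshold_dice("smith", "smyth", t=0.6))
    print(threshold_dice("smith", "smyth", t=0.8))
```

In the privacy-preserving setting described above, the goal would be for the two parties to learn only the Boolean result of `threshold_dice`, rather than the bigram sets or the exact score.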
