Abstract

Privacy-preserving record linkage (PPRL) supports the matching and integration of person-related data, e.g., on patients or customers, without compromising privacy. It relies on encoding the sensitive attribute values needed for matching and often involves trusted parties to perform the linkage. We report on recent research results from the Big Data center ScaDS Dresden/Leipzig that improve the efficiency, scalability, and quality of PPRL and apply PPRL in the medical domain. In particular, we present pivot-based filtering techniques and blocking based on locality-sensitive hashing (LSH), both of which reduce the number of record comparisons. Furthermore, we report on parallel linkage implementations based on Apache Flink that scale to millions of records.
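
To illustrate how LSH-based blocking can reduce the number of comparisons, the following is a minimal sketch and not the implementation reported here. It assumes records are already encoded as equal-length bit vectors (for example, Bloom-filter-style encodings) and uses Hamming LSH, which builds blocking keys by sampling fixed bit positions. The function names and the parameters `num_tables`, `key_length`, and `seed` are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def hamming_lsh_keys(bits, sampled_positions):
    """Build one blocking key per hash table from the sampled bit positions."""
    return [
        (table, "".join(str(bits[p]) for p in positions))
        for table, positions in enumerate(sampled_positions)
    ]


def candidate_pairs(records, num_tables=4, key_length=8, seed=0):
    """Return record-id pairs that share at least one LSH blocking key.

    `records` maps a record id to its encoded bit vector (list of 0/1).
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    length = len(next(iter(records.values())))
    # Each hash table samples its own fixed set of bit positions.
    sampled_positions = [
        rng.sample(range(length), key_length) for _ in range(num_tables)
    ]

    # Records with the same key in the same table land in the same block.
    blocks = defaultdict(set)
    for rec_id, bits in records.items():
        for key in hamming_lsh_keys(bits, sampled_positions):
            blocks[key].add(rec_id)

    # Only records within the same block become comparison candidates.
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Similar bit vectors are likely to agree on the sampled positions in at least one table and therefore end up in a common block, while dissimilar vectors rarely do, so most non-matching pairs are never compared. Raising `key_length` makes blocks smaller, and raising `num_tables` lowers the chance of missing a true match.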
