Abstract

To facilitate advanced analytics, data science projects increasingly require records about individuals to be linked across databases. Generally, no unique entity identifiers are available in the databases to be linked, so quasi-identifiers such as names, addresses, and dates of birth are used to link records. The process of linking records without revealing any sensitive or confidential information about the entities represented by these records is known as privacy-preserving record linkage (PPRL). Various encoding- and encryption-based PPRL methods have been developed over the past two decades. Most existing PPRL methods calculate approximate similarities between records because errors and variations can occur in quasi-identifying attribute values. Although they are used in real-world linkage applications, certain PPRL methods, such as the popular Bloom filter encoding, have been shown to be vulnerable to cryptanalysis attacks. In this paper we present a novel attack on PPRL methods that exploits the approximate similarities calculated between encoded records. Our attack matches nodes in a similarity graph generated from an encoded database with nodes in a corresponding similarity graph generated from a plain-text database to re-identify sensitive values. Our attack is not limited to any specific PPRL method, and in an experimental evaluation we apply it to three PPRL encoding methods using three different databases. This evaluation shows that our attack can successfully re-identify sensitive values from these encodings with high accuracy where no previous attack on PPRL would have been successful.
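
The idea of matching nodes across two similarity graphs can be illustrated with a toy sketch. It is not the attack evaluated in the paper: the q-gram length, the Bloom filter parameters, the seeded SHA-256 hashing, and the profile-based nearest-neighbour matching are all assumptions chosen for brevity, and every function name is hypothetical. Each record becomes a node, its sorted similarities to all other records form a profile, and an encoded node is assigned to the plain-text node whose profile is closest.

```python
import hashlib


def qgrams(value, q=2):
    """Return the set of character q-grams of a string (q=2 is an assumption)."""
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def dice(a, b):
    """Dice coefficient of two sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def bloom_encode(value, num_bits=256, num_hashes=4, q=2):
    """Encode a string as the set of bit positions set in a Bloom filter.

    Illustrative parameters only; seeded SHA-256 stands in for whatever
    hash functions a real PPRL encoding would use.
    """
    positions = set()
    for gram in qgrams(value, q):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).hexdigest()
            positions.add(int(digest, 16) % num_bits)
    return positions


def similarity_profiles(items):
    """For each node, list its similarities to all other nodes, sorted descending."""
    profiles = []
    for i, item in enumerate(items):
        sims = [dice(item, other) for j, other in enumerate(items) if j != i]
        profiles.append(sorted(sims, reverse=True))
    return profiles


def match_nodes(enc_profiles, plain_profiles):
    """Assign each encoded node the plain-text node with the closest profile.

    A deliberately naive stand-in for graph matching: squared Euclidean
    distance between profiles, with ties resolved by lowest index.
    """
    mapping = {}
    for i, enc in enumerate(enc_profiles):
        def distance(j, enc=enc):
            return sum((x - y) ** 2 for x, y in zip(enc, plain_profiles[j]))
        mapping[i] = min(range(len(plain_profiles)), key=distance)
    return mapping


if __name__ == "__main__":
    # Hypothetical toy data; the encoded database is held in a different order.
    plain_names = ["peter", "petra", "christine", "christina", "li"]
    encoded_order = [3, 0, 4, 1, 2]
    plain_graph = similarity_profiles([qgrams(n) for n in plain_names])
    encoded_graph = similarity_profiles(
        [bloom_encode(plain_names[k]) for k in encoded_order]
    )
    print(match_nodes(encoded_graph, plain_graph))
```

The sketch relies only on the fact that an encoding designed for approximate matching must roughly preserve pairwise similarities, which is the property the abstract says the attack exploits. A realistic attack would need far richer node features and a matching step that copes with databases that only partially overlap.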

