Abstract
Solving the record linkage problem incrementally is a relatively new research area. In incremental record linkage, every inserted record is compared, based on its blocking key value, with a subset of the existing clusters of records. Depending on its similarity to those clusters, the record is either placed in an existing cluster or assigned to a newly created one. Although a few papers have presented solutions for incremental record linkage that target linkage quality or efficiency, the privacy issues of this approach have not yet been discussed. Privacy is a major concern when record linkage is performed on sensitive data such as health or financial records. To address this, we introduce a novel concept, privacy-preserving incremental record linkage (PPiRL), which combines privacy-preserving techniques with an incremental record linkage approach. In this chapter, we propose an end-to-end framework as our solution for PPiRL. To preserve privacy, we use two types of privacy techniques: phonetic encoding and generalization. We use a recently developed phonetic algorithm, “nameGist”, to handle text-based features. For generalization, we use the k-anonymization algorithm on numeric and categorical features. To handle incremental updates and internal linkage, we use the Naive incremental clustering approach with Hierarchical Agglomerative clustering as the base clustering algorithm. We perform various experiments to test the privacy and linkage quality of PPiRL, and we compare our work with an existing incremental record linkage framework and with existing privacy-preserving record linkage techniques. Our results show that, apart from a small trade-off in linkage quality, our framework works better as a combined package of privacy and linkage solutions, which no existing framework yet provides.
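The insertion step described above (look up the record's block, compare it with the clusters there, then join one or start a new one) can be illustrated with a minimal sketch. This is not the chapter's implementation: the record fields, the `blocking_key` and `similarity` helpers, and the 0.8 threshold are all hypothetical choices made for illustration, and plain string similarity from Python's `difflib` stands in for a comparison over privacy-encoded values.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def blocking_key(record):
    # Assumption: block on the first letter of the surname (illustrative only).
    return record["surname"][:1].lower()


def similarity(a, b):
    # Assumption: plain string similarity on the full name, used here as a
    # stand-in for comparing encoded or generalized fields.
    name_a = f'{a["given"]} {a["surname"]}'.lower()
    name_b = f'{b["given"]} {b["surname"]}'.lower()
    return SequenceMatcher(None, name_a, name_b).ratio()


def insert_record(blocks, record, threshold=0.8):
    """Put record in the most similar cluster of its block, or start a new one."""
    clusters = blocks[blocking_key(record)]
    best_cluster, best_score = None, 0.0
    for cluster in clusters:
        # Average similarity to the cluster's members (average-link style).
        score = sum(similarity(record, m) for m in cluster) / len(cluster)
        if score > best_score:
            best_cluster, best_score = cluster, score
    if best_cluster is not None and best_score >= threshold:
        best_cluster.append(record)
    else:
        clusters.append([record])
    return blocks


# Records arrive one at a time; only clusters in the same block are compared.
blocks = defaultdict(list)
for rec in [
    {"given": "John", "surname": "Smith"},
    {"given": "Jon", "surname": "Smith"},
    {"given": "Mary", "surname": "Stone"},
    {"given": "Ana", "surname": "Lopez"},
]:
    insert_record(blocks, rec)
```

With these four toy records, the "s" block ends up with two clusters (the two Smith records together, Mary Stone alone) and the "l" block with one. Because each new record is compared only against clusters that share its blocking key, the cost of an insertion depends on the size of one block rather than on the whole dataset.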