Abstract
Anonymization is the process of modifying a data set so that individuals cannot be identified from it. Most studies, however, consider the anonymization of data from a single domain only, and no study has examined the risk of re-identification from combined data sets spanning more than one domain. This paper proposes a method for evaluating the risk of re-identification from payment card histories in multiple domains. First, we model the correlation between two histories from different usage domains in terms of information entropy and use mutual information to quantify the risk of identification from the data. Second, we describe an experiment that evaluates this risk in payment card data. The results validate the proposed method on real payment card data from 31 subjects. We evaluated privacy and utility metrics for 47 anonymized data items. Overall, we found a correlation between the transportation and item-purchase histories stored in the payment card data, and established that most of the anonymized data (44 of 47 items) enabled correct identification with more than 45% accuracy for any privacy metric. This indicates that the risk of re-identification from payment card data is very high.
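To make the mutual-information idea concrete, the following is a minimal sketch of how the dependence between two usage histories could be estimated from paired observations, using the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y). It is not the paper's implementation: the function names, the toy station and shop labels, and the use of a simple empirical (plug-in) estimator are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Shannon entropy, in bits, of the empirical distribution of `samples`."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) from paired samples."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired observations of equal length")
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical toy histories: one (station, shop) pair per transaction.
stations = ["A", "A", "B", "B"]
shops_dependent = ["s1", "s1", "s2", "s2"]    # shop is fully determined by station
shops_independent = ["s1", "s2", "s1", "s2"]  # shop carries no information about station

mi_dependent = mutual_information(stations, shops_dependent)
mi_independent = mutual_information(stations, shops_independent)
print(mi_dependent, mi_independent)
```

In this toy setting, a higher value means that knowing one history reveals more about the other, which is the intuition behind treating mutual information as a re-identification risk indicator. Plug-in estimates of this kind are biased for small samples, so a real evaluation would need more care.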