Abstract

At present, massive amounts of data are utilized for artificial intelligence technologies such as machine learning and deep learning. However, these data must be utilized carefully while preserving data privacy. Data anonymization is a technique enabling both data mining and privacy protection, preventing the identification of individuals by generalizing the data to include multiple records with the same values. In this study, we consider a data-publishing infrastructure for personal data sharing. The infrastructure anonymizes data prior to publishing it to users for privacy protection; however, the problem of unauthorized republishing by malicious users must be considered. To address this issue, we studied digital watermarking methods that correlate data users with anonymized data. Our previous method embedded information indicating the original user to detect illegally republished data. However, this method did not focus on information loss. This study proposes another digital watermarking method for anonymized data that achieves low information loss. The proposed method replaces values in tuples to embed information. To reduce the information loss caused by the embedding, the proposed method selects replacement values from the candidates whose meanings are similar to the original. We propose the use of vector-conversion tables to select replacement values. The proposed method also extends the maximum length of the embedded bit string by embedding multiple bits into a single tuple. Moreover, we measured the tolerance to distortion attacks to evaluate the efficacy of the proposed method. The proposed method is non-blind, i.e., data prior to digital watermarking is required to perform extraction.
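
The embedding idea described above (replace a value with a semantically close candidate, embed several bits per tuple, and extract by comparing against the unmarked data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the table `VECTORS`, its values, and the functions `candidates`, `embed`, and `extract` are all hypothetical, and choosing the replacement by its rank in the nearest-neighbour list is one plausible encoding among many.

```python
import math

# Hypothetical vector-conversion table (an assumption for illustration):
# each attribute value maps to a vector, and values with similar meanings
# are placed close together.
VECTORS = {
    "doctor": (1.0, 0.0),
    "dentist": (0.95, 0.05),
    "nurse": (0.9, 0.1),
    "pharmacist": (0.7, 0.3),
    "teacher": (0.1, 0.9),
    "lecturer": (0.05, 0.95),
    "professor": (0.0, 1.0),
    "tutor": (0.25, 0.75),
}


def candidates(original, bits_per_tuple):
    """Return the 2**bits_per_tuple values nearest to `original`, closest first."""
    origin = VECTORS[original]
    ranked = sorted(VECTORS, key=lambda v: (math.dist(VECTORS[v], origin), v))
    return ranked[: 2 ** bits_per_tuple]


def embed(values, bits, bits_per_tuple=2):
    """Embed the bit string into `values`, one chunk of bits per tuple."""
    chunks = [bits[i:i + bits_per_tuple] for i in range(0, len(bits), bits_per_tuple)]
    if len(bits) % bits_per_tuple or len(chunks) > len(values):
        raise ValueError("bit string does not fit the data")
    marked = list(values)
    for row, chunk in enumerate(chunks):
        # The chunk, read as an integer, selects which near neighbour to use.
        # Index 0 is the original value itself, so an all-zero chunk changes nothing.
        marked[row] = candidates(values[row], bits_per_tuple)[int(chunk, 2)]
    return marked


def extract(original_values, marked_values, n_bits, bits_per_tuple=2):
    """Non-blind extraction: the unmarked data is needed to rebuild the candidates."""
    bits = []
    for row in range(n_bits // bits_per_tuple):
        ranked = candidates(original_values[row], bits_per_tuple)
        bits.append(format(ranked.index(marked_values[row]), f"0{bits_per_tuple}b"))
    return "".join(bits)
```

For example, `embed(["doctor", "teacher", "nurse"], "0110")` replaces the first two values with near neighbours from the table, and `extract` recovers `"0110"` only when it is given the original column as well, which is what makes the sketch non-blind. Raising `bits_per_tuple` lengthens the embeddable bit string but draws replacements from farther candidates, so information loss grows.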

Highlights

  • Driven by the accelerating development of cloud computing and mobile technology, massive amounts of data have been collected and stored by various organizations such as companies, medical institutions, and governments

  • This study proposes a digital watermarking method for anonymized data that enables the identification of the source of illegal data republications

  • In this study, we proposed a digital watermarking method that enables the embedding of bit strings into anonymized data


Summary

INTRODUCTION

Driven by the accelerating development of cloud computing and mobile technology, massive amounts of data have been collected and stored by various organizations such as companies, medical institutions, and governments. Schrittwieser et al. proposed a method that generates different anonymized data for each data user [12]. This method expresses digital watermarks by changing the abstraction level of the values for anonymization. This study proposes a digital watermarking method for anonymized data that enables the identification of the source of illegal data republications. Although this purpose is the same as that of our previous method, the embedding technique of the proposed method is considerably updated to suppress information loss while enhancing tolerance to distortion attacks. The proposed method enables verification of the source of illegal or unauthorized data redistributions, i.e., the data user who originally received the data, by embedding information specific to each data user into the data as a digital watermark when publishing. Note that we do not propose privacy-preserving methods such as anonymization; privacy protections against attacks are beyond the scope of this study.

RELATED TECHNIQUES
TARDOS FINGERPRINTING CODES
EVALUATION
CONCLUSION
