Abstract
Data mining services require accurate input data for their results to be meaningful, but privacy concerns may influence users to provide spurious information. To preserve the privacy of the client in the data mining process, a variety of techniques based on random perturbation of data records have been proposed recently. We focus on an improved distortion process that tries to enhance accuracy by selectively modifying the list of items. The normal distortion procedure modifies each item's presence or absence with an equal probability, and so does not provide the flexibility to tune the probability parameters for balancing privacy and accuracy. In the improved distortion technique, frequent one item-sets and nonfrequent one item-sets are modified with different probabilities, controlled by two probability parameters, fp and nfp, respectively. The owner of the data has the flexibility to tune these two parameters (fp and nfp) based on his/her requirements for privacy and accuracy. Experiments conducted on real time datasets confirmed a significant increase in accuracy at a very marginal cost in privacy.
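The selective distortion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say whether fp and nfp are retention or flip probabilities, so the sketch assumes each is the probability that an item's presence/absence bit is kept unchanged. The function names, the `min_support` threshold, and the set-based transaction format are likewise hypothetical.

```python
import random


def item_supports(transactions, items):
    """Fraction of transactions that contain each item."""
    n = len(transactions)
    return {i: sum(i in t for t in transactions) / n for i in items}


def distort(transactions, items, min_support, fp, nfp, rng=None):
    """Flip each item's presence bit with an item-dependent probability.

    Assumption (not stated in the source): fp / nfp are the probabilities
    of KEEPING the bit for frequent / nonfrequent one item-sets.
    """
    rng = rng or random.Random()
    support = item_supports(transactions, items)
    frequent = {i for i in items if support[i] >= min_support}

    distorted = []
    for t in transactions:
        out = set()
        for i in items:
            present = i in t
            keep_p = fp if i in frequent else nfp
            # Flip the bit with probability 1 - keep_p.
            if rng.random() >= keep_p:
                present = not present
            if present:
                out.add(i)
        distorted.append(out)
    return distorted
```

Setting fp equal to nfp reduces this sketch to the uniform case, in which every item is modified with the same probability; choosing different values is what gives the data owner the tuning knob between privacy and accuracy.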
Highlights
The problem of privacy-preserving data mining has become more important in recent years because of the increasing ability to store personal data about users, and the increasing sophistication of data mining algorithms to leverage this information
We have introduced a new framework for enforcing privacy in mining frequent patterns, which combines three advances for efficiently hiding restrictive rules: inverted files, one for indexing the transactions per item and a second for indexing the sensitive transactions per restrictive pattern; a transaction retrieval engine relying on Boolean queries for retrieving transaction IDs from the inverted file and combining the resulting lists; and a set of sanitizing algorithms
In the context of our framework, the integration of the inverted file and the transaction retrieval engine is essential to speed up the sanitization process
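The two inverted files and the Boolean retrieval step mentioned in the highlights might look roughly like the sketch below. It is an illustration under assumptions, not the framework's code: the function names and the dictionary-of-sets layout are hypothetical, a Boolean query is taken to be a plain AND over the items of a pattern, and the sanitizing algorithms themselves are not shown.

```python
from collections import defaultdict


def build_inverted_file(transactions):
    """First inverted file: item -> set of IDs of transactions containing it."""
    index = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            index[item].add(tid)
    return index


def transactions_containing(index, pattern):
    """Boolean AND query: IDs of transactions containing every item in pattern."""
    postings = [index.get(item, set()) for item in pattern]
    if not postings:
        return set()
    return set.intersection(*postings)


def build_sensitive_index(index, restrictive_patterns):
    """Second inverted file: restrictive pattern -> sensitive transaction IDs."""
    return {
        frozenset(p): transactions_containing(index, p)
        for p in restrictive_patterns
    }
```

With these indexes, a sanitizing step only has to visit the transaction IDs listed for a restrictive pattern instead of scanning the whole database, which is the kind of speed-up the highlights attribute to the inverted file and retrieval engine.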
Summary
The problem of privacy-preserving data mining has become more important in recent years because of the increasing ability to store personal data about users, and the increasing sophistication of data mining algorithms to leverage this information. A number of techniques, such as randomization and k-anonymity, have been suggested in recent years to perform privacy-preserving data mining. Data products are released mainly to inform public or business policy, and research findings or public information. Securing these products against unauthorized access has been a long-term goal of the database security research community and the government statistical agencies. Solutions to such a problem require combining several techniques and mechanisms. It may be the case that sensitive data items can be inferred from non-sensitive data through some