Abstract

Privacy-preserving data mining (PPDM) has become an active research topic in recent years because it hides confidential information while still allowing useful knowledge to be discovered. Data sanitization is a common way to perturb a database so that sensitive or confidential information is hidden. PPDM is not a trivial task and can be considered a non-deterministic polynomial-time (NP)-hard problem. Many algorithms have been studied to derive optimal solutions using an evolutionary process, although most are based on straightforward or single-objective methods for discovering the candidate transactions/items for sanitization. In this paper, we present a multi-objective algorithm using a grid-based method (called GMPSO) to find optimal solutions as candidates for sanitization. The designed GMPSO uses two strategies for updating gbest and pbest during the evolutionary process. Moreover, the pre-large concept is adapted herein to speed up the evolutionary process by reducing the multiple database scans otherwise needed in each iteration. The designed GMPSO derives multiple Pareto solutions based on Pareto dominance, rather than the single solution produced by single-objective algorithms. In addition, the side effects of the sanitization process can be significantly reduced. Experiments have shown that the designed GMPSO achieves better results in terms of side effects than the previous single-objective algorithm and the NSGA-II-based approach, and that the pre-large concept also helps reduce the computational cost compared to the NSGA-II-based algorithm.
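To make the notion of Pareto dominance mentioned in the abstract concrete, the following is a minimal sketch rather than the GMPSO procedure itself. It assumes each candidate sanitization is scored by a tuple of side-effect values that are all to be minimized; the function names and the example numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b, assuming every objective is minimized.

    a dominates b when a is no worse on every objective and strictly
    better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical side-effect vectors for four candidate sanitizations
# (illustrative values only, not results from the paper).
candidates = [(0, 5, 2), (1, 3, 2), (2, 6, 3), (0, 5, 4)]
front = pareto_front(candidates)
print(front)
```

Here the first two candidates trade off against each other, so both remain on the front, while the last two are dominated by the first. A multi-objective method returns such a set of non-dominated solutions instead of a single best one.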

Highlights

  • Data mining, also called knowledge discovery in databases [1,2,3,4,5,6], is used to find useful and meaningful information for further decision-making, and can be applied in many domains and applications, including basket analytics, DNA sequence analysis, and recommendations

  • The major contributions of the study are summarized below. This is the first work regarding the design of a multi-objective particle swarm optimization (MOPSO)-based framework in privacy-preserving data mining (PPDM), which shows a better performance in terms of side effects compared to a conventional single-objective approach and the Non-dominated Sorting Genetic Algorithm (NSGA)-II-based model

  • Lin et al. presented a meta-heuristic approach [23] based on the NSGA-II framework for data sanitization; it shows better results in terms of side effects than single-objective algorithms and is the state-of-the-art multi-objective optimization algorithm in PPDM


Summary

A Grid-Based Swarm Intelligence Algorithm for Privacy-Preserving Data Mining

National Demonstration Center for Experimental Electronic Information and Electrical Technology Education (Fujian University of Technology), Fujian University of Technology, No 33 Xuefunan Road, University Town, Fuzhou 350118, China. Received: 3 January 2019; Accepted: 18 February 2019; Published: 22 February 2019

Introduction
Evolutionary Computation
Privacy-Preserving Data Mining
Pre-Large Concept
Preliminary Information and Problem Statement
Proposed MOPSO-Based Framework for Data Sanitization
Data Processing
Evolution Process
An Illustrated Example
Experimental Results
Runtime
Side Effects
Scalability
Conclusions and Areas of Future Study