Abstract

Feature selection is vital for data mining because organizations gather enormous volumes of high-dimensional microdata. Among the criteria used to evaluate feature selection algorithms, stability is now considered a primary one alongside accuracy. Privacy-preserving data publishing methods for microdata with multiple sensitive attributes are analyzed to reduce the likelihood that adversaries can infer the sensitive values. Sensitive values are typically protected by anonymizing the data through generalization and suppression, which may cause information loss. Strategies other than generalization and suppression are therefore investigated to reduce information loss. Privacy-preserving data publishing with the overlapped slicing technique addresses the problems that arise in microdata with multiple sensitive attributes. Feature selection stability is a vital criterion for data mining techniques because the dimensionality of microdata keeps growing with everyday activity on the World Wide Web. Feature selection stability is directly correlated with data utility. It is also data-centric, so modifying a dataset for privacy preservation affects both feature selection stability and data utility. Because feature selection stability is data-driven, this paper investigates the impact of privacy-preserving data publishing based on overlapped slicing on feature selection stability and accuracy.

Highlights

  • Organizations generate huge amounts of high-dimensional microdata through routine activities such as online business and e-administration

  • Salem Alelyani investigated the causes of instability in high-dimensional datasets using well-known feature selection algorithms and showed that feature selection stability depends mostly on the data [5]

  • This research contribution gives an overview of feature selection stability and its importance in data mining

Summary

INTRODUCTION

Organizations generate huge amounts of high-dimensional microdata through routine activities such as online business and e-administration. In this data, each record contains at least one sensitive attribute, described individually for that record [2]. Feature selection is an important dimensionality reduction method in data mining that chooses a subset of relevant attributes. Microdata publishing techniques, including slicing, overlapped slicing, t-closeness, l-diversity, and k-anonymity, perturb the data to safeguard its privacy. This paper examines how perturbing a dataset with the privacy-preserving data publishing methods slicing and overlapped slicing affects data utility and feature selection stability in data mining.
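To make the notion of feature selection stability concrete, the following minimal Python sketch compares feature subsets with the Jaccard index, which the paper lists among its section topics. It is an illustration only, not the paper's implementation: the feature names, the three example subsets, and the choice to average the index over all pairs of subsets are assumptions made here.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty selections are identical
    return len(a & b) / len(a | b)


def selection_stability(subsets):
    """Mean pairwise Jaccard index over a list of selected-feature subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets to compare")
    return sum(jaccard_index(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets chosen by the same selector on the original data
# and on two privacy-preserved versions of it.
original = {"age", "zipcode", "education", "occupation"}
sliced = {"age", "education", "occupation", "sex"}
overlapped = {"age", "zipcode", "education", "sex"}

stability = selection_stability([original, sliced, overlapped])
print(f"stability = {stability:.2f}")
```

A value near 1 means the selector picks nearly the same features before and after perturbation; a value near 0 means the perturbation changed the selection substantially.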

DATA PERSPECTIVE NATURE OF FEATURE SELECTION STABILITY
JACCARD INDEX
PRIVACY PRESERVING DATA PUBLISHING
PRIVACY PRESERVING APPROACHES
PRIVACY THREATS
MICRODATA PUBLISHING TECHNIQUES
Slicing
Overlapped Slicing
Methodology
Information Gain (IG)
Correlation-based Feature Selection (CFS)
Datasets Used
Experimental Results Analysis
CONCLUSION