Abstract
Feature selection is vital for data mining because every organization gathers an enormous amount of high-dimensional microdata. Among the important criteria for evaluating feature selection algorithms, feature selection stability, along with accuracy, is currently considered one of the primary ones. Privacy-preserving data publishing methods for data with multiple sensitive attributes are analyzed to reduce the likelihood that adversaries can infer the sensitive values. In general, sensitive values are protected by anonymizing the data with generalization and suppression, which may cause information loss. Strategies other than generalization and suppression are therefore investigated to reduce information loss. Privacy-preserving data publishing with the overlapped slicing technique for multiple sensitive attributes addresses the problems of microdata that contain numerous sensitive attributes. Feature selection stability is a vital criterion for data mining techniques because the dimensionality of microdata keeps increasing as a result of everyday activity on the World Wide Web. Feature selection stability is directly correlated with data utility. Because feature selection stability is data-centric, modifying a dataset for privacy preservation affects feature selection stability as well as data utility. This paper therefore investigates the impact of privacy-preserving data publishing based on overlapped slicing on feature selection stability and accuracy.
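The abstract does not state which stability measure is used. As an illustration only, one common way to quantify feature selection stability is the average pairwise Jaccard similarity between the feature subsets that a single selector returns on different versions of a dataset (for example, the original data and several anonymized copies). A value of 1.0 means the selector always picks the same features; lower values mean the perturbation changed the selection. The function names and feature names below are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def selection_stability(subsets):
    """Average pairwise Jaccard similarity over all pairs of selected subsets.

    Each subset is the set of features chosen by the same selector on a
    different version of the data (e.g. original vs. anonymized copies).
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical selections from three versions of the same dataset.
    runs = [
        {"age", "zip", "income"},
        {"age", "zip", "education"},
        {"age", "income", "zip"},
    ]
    print(round(selection_stability(runs), 3))  # 0.667
```

Comparing this score for selections made on the original data with those made on sliced or overlapped-sliced copies gives one concrete way to express how much privacy-driven perturbation disturbs feature selection.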
Highlights
Organizations create huge amounts of high-dimensional microdata through routine activities such as online business and e-administration.
Salem Alelyani investigated the causes of instability in high-dimensional datasets using well-known feature selection algorithms and showed that feature selection stability depends mostly on the data [5].
This work gives an overview of feature selection stability and its importance in data mining.
Summary
Organizations create huge amounts of high-dimensional microdata through routine activities such as online business and e-administration. In such data, each record has at least one sensitive attribute, described independently in that record [2]. Feature selection is an important dimensionality reduction method in data mining that chooses a subset of relevant attributes. Microdata publishing strategies, including slicing, overlapped slicing, t-closeness, l-diversity, and k-anonymity, perturb the data to safeguard its privacy. This paper examines how perturbing a dataset with the privacy-preserving data publishing methods slicing and overlapped slicing affects data utility and feature selection stability in data mining.
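A minimal sketch of the slicing idea follows, under simplifying assumptions: column groupings are chosen by hand rather than from attribute correlations, and buckets are consecutive runs of records rather than buckets built to satisfy a privacy requirement such as l-diversity. Overlapped slicing is modelled by letting one attribute (here `zip`) appear in more than one column. The function `slice_table` and the sample records are hypothetical and are not the paper's implementation.

```python
import random


def slice_table(records, columns, bucket_size, seed=None):
    """Slice a table given as a list of dict records.

    columns: list of attribute tuples; in overlapped slicing an attribute
    may appear in more than one column (assumption for illustration).
    Records are grouped into consecutive buckets of bucket_size. Within each
    bucket, the value tuples of every column are shuffled independently,
    which breaks the linkage between columns while keeping the joint values
    inside each column intact.
    """
    rng = random.Random(seed)
    sliced = []
    for start in range(0, len(records), bucket_size):
        bucket = records[start:start + bucket_size]
        permuted_columns = []
        for col in columns:
            values = [tuple(r[a] for a in col) for r in bucket]
            rng.shuffle(values)
            permuted_columns.append(values)
        # Row i of the sliced bucket takes the i-th shuffled tuple of every column.
        for i in range(len(bucket)):
            sliced.append([permuted_columns[c][i] for c in range(len(columns))])
    return sliced


if __name__ == "__main__":
    diseases = ["flu", "cold", "hiv", "flu", "cancer", "cold"]
    data = [{"age": 30 + i, "zip": str(47900 + i % 3), "disease": d}
            for i, d in enumerate(diseases)]
    # "zip" is shared by both columns: an overlapped slicing layout.
    for row in slice_table(data, [("age", "zip"), ("zip", "disease")], 3, seed=1):
        print(row)
```

Because each column keeps its own joint values, a feature selector run on the sliced output still sees realistic per-column statistics, while cross-column associations are scrambled within buckets; this is the kind of perturbation whose effect on utility and stability the paper studies.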