Hybrid Firefly Algorithm Harmony Search for Feature Selection with BCNF for Multiple Subtables and EM-GMM for Top Down Initial Partitioning

S Balamurugan, P Visalakshi

doi:10.5958/2249-7315.2016.00743.7

Abstract

Several anonymization techniques, such as generalization and bucketization, have been designed for privacy-preserving microdata publishing. Privacy preservation has recently become a major research topic, and numerous approaches have been proposed in the literature. Recent work studies the privacy hazard posed by mined Conditional Functional Dependencies (CFDs) against the (d, l)-inference model using the Compact Frequent Pattern Growth Branch Sort (CFPGBS) algorithm. A Boyce–Codd Normal Form (BCNF) method is presented for publishing multiple subtables, which are anonymized through the (d, l)-inference model. The major problem with that work is that initial partitioning is done without considering the importance of each feature, so it uses all quasi-identifier features. Initial partitioning therefore becomes more complex as the number of data samples grows, which may also degrade the privacy results. To overcome this issue, this paper presents a hybrid feature selection method that removes unimportant quasi-identifiers from the final BCNF table. The proposed method follows the hybrid firefly algorithm with harmony search (HFAHS) to select the optimal features in the table for the initial partition of the top-down approach. For the selected features, a top-down initial partition based on the Expectation Maximization (EM) Gaussian mixture distribution (EM-GMM) groups the partitioned data from the result of the (d, l)-inference model together with the changed publishing data from BCNF. Experimental results show that the proposed BCNF (d, l)-inference model can anonymize the microdata with less information loss than existing methods against CFD. The effectiveness and privacy results of the proposed methods are also compared with existing conventional methods. Our workload experiments confirm that the proposed method preserves better utility and privacy than the (d, l)-inference model against CFD attacks and is more effective in workloads with selection of optimal quasi-identifiers.
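
To make the EM-GMM partitioning step concrete, the following is a minimal sketch of fitting a Gaussian mixture by expectation maximization and using the result as a hard initial partition. It is an illustration under stated assumptions, not the authors' implementation: it works on a single numeric quasi-identifier, the function names `gaussian_pdf` and `em_gmm_partition` and the sample ages are hypothetical, and it does not cover HFAHS feature selection, BCNF decomposition, or the (d, l)-inference model.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_partition(values, k=2, iters=50, min_var=1e-6):
    """Fit a 1-D k-component Gaussian mixture by EM (hypothetical helper).

    Returns (labels, means), where labels[i] is the index of the component
    with the highest responsibility for values[i].
    """
    n = len(values)
    ordered = sorted(values)
    # Deterministic initialisation (an assumption): spread means over the sorted data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(values) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    resp = [[1.0 / k] * k for _ in values]

    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens) or 1e-300  # guard against division by zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var_j, min_var)  # floor keeps the density finite

    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means


# Hypothetical quasi-identifier column (ages) with two visible groups.
ages = [21, 22, 23, 24, 25, 60, 61, 62, 63, 64]
labels, means = em_gmm_partition(ages, k=2)
```

Each label identifies one group of the initial partition. In a multi-feature setting the same E-step and M-step would use multivariate Gaussians over the selected quasi-identifiers only.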

Full Text