Abstract

Protection of personal data in statistical databases has recently become a major societal concern. Statistical disclosure control (SDC) is often applied to statistical databases before they are released for public use. Microaggregation for SDC is a family of methods to protect microdata (i.e., records on individuals and/or companies) from individual identification. Microaggregation works by partitioning the microdata into groups of at least k records and, then, replacing the records in each group with the centroid of the group. An optimal microaggregation method must minimize the information loss resulting from this replacement process. However, this problem of minimizing information loss has been shown to be NP-hard for multivariate data. Methods based on various heuristics have been proposed for this problem, but none performs the best for every microdata set and various k values. This work presents a density-based algorithm (DBA) for microaggregation. The DBA first forms groups of records by the descending order of their densities, then fine-tunes these groups in reverse order. The performance of the DBA is compared against the latest microaggregation methods. Experimental results indicate that DBA incurs the least information loss for over half of the test situations.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.