Abstract

Releasing person-specific data can reveal sensitive information about individuals. k-anonymization is a promising privacy protection mechanism in data publishing. Although substantial research has been conducted on k-anonymization and its extensions in recent years, few studies consider releasing data for a specific data analysis purpose. This paper presents a practical data publishing framework for determining a generalized version of the data that preserves both individual privacy and information usefulness for cluster analysis. Experiments on real-life data suggest that focusing on preserving cluster structure during generalization yields significantly better cluster quality than generalizing the data without such a focus. The major challenge of generalizing data for cluster analysis is the lack of class labels that could be used to guide the generalization process. Our approach converts the problem into the counterpart problem for classification analysis, where class labels encode the cluster structure in the data, and presents a framework to evaluate the cluster quality on the generalized data.
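The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the toy records, the attribute names (`age`, `zip`, `cluster`), the fixed-width age ranges, and the purity score are all assumptions made for illustration. The sketch treats cluster labels obtained from the raw data as class labels, generalizes one quasi-identifier, checks the standard k-anonymity condition (every combination of quasi-identifier values is shared by at least k records), and measures how well the generalized groups still separate the clusters.

```python
from collections import Counter

def generalize_age(age, width):
    """Map an exact age to a range label such as '20-29' (hypothetical scheme)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

def cluster_purity(records, quasi_identifiers):
    """Fraction of records whose cluster label is the majority label of their group.

    An illustrative stand-in for a cluster-quality measure, not the paper's metric.
    """
    groups = {}
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups.setdefault(key, Counter())[r["cluster"]] += 1
    return sum(max(c.values()) for c in groups.values()) / len(records)

# Toy records; "cluster" is assumed to come from clustering the raw data
# and is used here as a class label that encodes the cluster structure.
raw = [
    {"age": 21, "zip": "47601", "cluster": 0},
    {"age": 24, "zip": "47601", "cluster": 0},
    {"age": 27, "zip": "47601", "cluster": 0},
    {"age": 52, "zip": "47902", "cluster": 1},
    {"age": 55, "zip": "47902", "cluster": 1},
    {"age": 58, "zip": "47902", "cluster": 1},
]
QID = ["age", "zip"]

# Generalize the age attribute into 10-year ranges; other fields are unchanged.
generalized = [{**r, "age": generalize_age(r["age"], 10)} for r in raw]

raw_ok = is_k_anonymous(raw, QID, k=3)
generalized_ok = is_k_anonymous(generalized, QID, k=3)
purity = cluster_purity(generalized, QID)
```

In this toy case the raw records are all unique on (`age`, `zip`), so they fail the check for k = 3, while the generalized records form two groups of three that each contain a single cluster label. A generalization that merged records from different clusters into one group would lower the purity score, which is the kind of loss that label-guided generalization tries to avoid.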
