Abstract

Public opinion is the set of beliefs or thoughts held by the public on a particular topic, especially politics, religion, or social issues. Opinions can be sensitive because they may reflect a person's perspective, understanding, feelings, way of life, and desires. On the one hand, public opinion is often collected through a central server that keeps a user profile for each participant and needs to publish this data for research purposes. On the other hand, publishing such sensitive information without proper de-identification puts individuals' privacy at risk, so opinions must be anonymized prior to publishing. While many anonymization approaches have been introduced for tabular data with a single sensitive attribute, these approaches do not readily apply to opinion polls. This is because opinions are generally collected on many issues, so opinion databases have multiple sensitive attributes. Finding and enforcing anonymization models that work on datasets with multiple sensitive attributes while allowing risk analysis on the publisher side is not a well-studied problem. In this work, we identify the privacy problems regarding public opinions and propose a new probabilistic privacy model, MSA-diversity, defined specifically for datasets with multiple sensitive attributes. We also present a heuristic anonymization technique to enforce MSA-diversity. Experimental results on real data show that our approach clearly outperforms existing approaches in terms of anonymization accuracy.

