In recent years, rare pattern mining has proven valuable in many real-world domains, such as disease diagnosis, criminal behavior analysis, and network anomaly detection. When organizations publish or share data, that data is at risk of leakage, because data mining techniques may uncover sensitive knowledge and information. To prevent competitors from extracting hidden information by mining a released database, privacy-preserving data mining (PPDM) has been proposed and widely studied. However, most PPDM techniques target frequent pattern mining and cannot handle the privacy protection problems that arise in rare pattern mining, such as network vulnerability detection and abnormal medical data. To address this limitation, we introduce a privacy-preserving technique for rare pattern mining. In this paper, we propose two novel algorithms, Longest Transaction-Minimum Item Number (LT-MIN) and Longest Transaction-Maximum Item Number (LT-MAX), which hide sensitive rare itemsets and return a sanitized database. Both algorithms hide the target itemsets while minimizing side effects on the original database. In addition, they employ a projection mechanism to reduce the time spent scanning the database. Besides the traditional PPDM evaluation criteria, we propose two additional similarity measures that evaluate performance in terms of the itemsets and of the structural integrity of the database. Experimental results indicate that the proposed algorithms hide sensitive rare itemsets successfully and efficiently, and that the evaluation methods used can serve as evaluation criteria for privacy-preserving rare itemset mining (PPRIM).
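The abstract names the LT-MIN and LT-MAX strategies but does not spell out their rules, so the following is only a minimal, hypothetical sketch of deletion-based itemset hiding, not the authors' algorithms. It assumes that a sensitive rare itemset counts as hidden once its support drops below a minimum rare-support threshold (called `min_rare_sup` here). It also assumes that "projection" means working only on the transactions that contain the sensitive itemset, that the longest such transaction is sanitized first, and that the item deleted is the sensitive item with the fewest (`pick_min=True`) or most (`pick_min=False`) occurrences in the database. All function and parameter names are illustrative.

```python
from collections import Counter
from typing import FrozenSet, List, Set

Transaction = Set[str]


def support(db: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def hide_itemset(db: List[Transaction], sensitive: FrozenSet[str],
                 min_rare_sup: int, pick_min: bool = True) -> List[Transaction]:
    """Hypothetical LT-style sanitization: delete one sensitive item from the
    longest supporting transaction until support < min_rare_sup."""
    if min_rare_sup < 1:
        raise ValueError("min_rare_sup must be at least 1")
    sanitized = [set(t) for t in db]  # copy so the input database is untouched
    # Projection (assumed): only transactions supporting the itemset are scanned.
    projected = [i for i, t in enumerate(sanitized) if sensitive <= t]
    # Item occurrence counts over the whole database, used to choose the victim item.
    counts = Counter(item for t in sanitized for item in t)
    choose = min if pick_min else max
    while len(projected) >= min_rare_sup:
        # Longest supporting transaction first (first one wins on ties).
        victim = max(projected, key=lambda i: len(sanitized[i]))
        # Sensitive item with the minimum/maximum count; sorted() makes ties deterministic.
        item = choose(sorted(sensitive), key=lambda x: counts[x])
        sanitized[victim].discard(item)
        counts[item] -= 1
        projected.remove(victim)  # this transaction no longer supports the itemset
    return sanitized


if __name__ == "__main__":
    db = [{"a", "b", "c", "d"}, {"a", "b"}, {"a", "b", "e"}, {"c", "d"}]
    target = frozenset({"a", "b"})
    out = hide_itemset(db, target, min_rare_sup=2)
    print(support(db, target), support(out, target))  # prints: 3 1
```

Because each deletion removes exactly one supporting transaction, the sketch stops with the support at `min_rare_sup - 1`, which keeps the number of modified transactions small. The paper's actual victim-selection and side-effect criteria may differ.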