Abstract

The software engineering community produces data that can be analyzed to enhance the quality of future software products, and data about software defects can be used by data scientists to build defect predictors. However, sharing such data raises privacy concerns, since sensitive software features are usually considered business assets that must be protected in accordance with the law. Early research on protecting the privacy of software data found that applying conventional data anonymization to mask sensitive attributes degrades the quality of the shared data. In addition, data produced by such approaches is not immune to attacks such as inference and background knowledge attacks. This research proposes a new approach for producing a protected release of software defect data that remains usable by data science algorithms. We created a generalization (clustering)-based approach to anonymize sensitive software attributes. Tomek link and AllNN data reduction approaches were used to discard noisy records that may affect the usefulness of the shared data. The proposed approach treats diversity of sensitive attributes as an important factor in avoiding inference and background knowledge attacks on the anonymized data; therefore, records are discarded from both the defective and non-defective classes. We conducted experiments on several benchmark software defect datasets, using both data quality and privacy measures to evaluate the proposed approach. Our findings show that the proposed approach outperforms well-known existing techniques on both accuracy and privacy measures.
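The Tomek link reduction mentioned above removes borderline pairs: two records of opposite classes that are each other's nearest neighbour. The following is a minimal, self-contained sketch of that idea on a toy dataset; the feature values and labels are illustrative assumptions, not data from the paper, and a real pipeline would typically use a library implementation such as imbalanced-learn's `TomekLinks`.

```python
# Hypothetical sketch of Tomek link detection: a pair (i, j) is a Tomek
# link when i and j have opposite class labels and each is the other's
# nearest neighbour. Such pairs mark noisy/borderline records that the
# approach discards before sharing the data.

def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def nearest_neighbor(i, points):
    # Index of the closest point to points[i], excluding i itself.
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: euclidean(points[i], points[j]))

def tomek_links(points, labels):
    links = set()
    for i in range(len(points)):
        j = nearest_neighbor(i, points)
        # Mutual nearest neighbours with different labels form a link.
        if labels[i] != labels[j] and nearest_neighbor(j, points) == i:
            links.add(tuple(sorted((i, j))))
    return sorted(links)

# Toy data: two clean clusters plus one cross-class borderline pair.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (5.0, 5.0)]
y = [0, 0, 1, 0, 1]  # records 2 and 3 sit close together with opposite labels
print(tomek_links(X, y))  # → [(2, 3)]
```

Records appearing in the returned pairs are the candidates for removal; as the abstract notes, the approach discards such records from both classes to preserve diversity of the sensitive attributes.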
