A systematic review on privacy-preserving distributed data mining

Chang Sun, Lianne Ippel, Michel Dumontier, Johan Van Soest, Andre Dekker, Karin Verspoor

doi:10.3233/ds-210036

Abstract

Combining and analysing sensitive data from multiple sources offers considerable potential for knowledge discovery. However, several issues hinder such analyses, including technical barriers, privacy restrictions, security concerns, and trust issues. Privacy-preserving distributed data mining (PPDDM) techniques aim to overcome these challenges by extracting knowledge from partitioned data while minimizing the release of sensitive information. This paper reports the results and findings of a systematic review of PPDDM techniques from 231 scientific articles published in the past 20 years. We summarize the state of the art, compare the problems the techniques address, and identify the outstanding challenges in the field. This review identifies the consequence of the lack of standard criteria to evaluate new PPDDM methods and proposes comprehensive evaluation criteria with 10 key factors. We discuss the ambiguous definitions of privacy and the confusion between privacy and security in the field, and provide suggestions on how to write a clear and applicable privacy description for new PPDDM techniques. The findings from our review enhance the understanding of the challenges of applying theoretical PPDDM methods to real-life use cases, and of the importance of involving legal-ethical and social experts in implementing PPDDM methods. This comprehensive review will serve as a helpful guide to past research and future opportunities in the area of PPDDM.

Highlights

Mining distributed, sensitive data offers tantalising potential for new insights and a wide variety of applications, but is generally fraught with concerns about model accuracy and data privacy.
This review identifies the consequence of the lack of standard criteria to evaluate new privacy-preserving distributed data mining (PPDDM) methods and proposes comprehensive evaluation criteria with 10 key factors.
This review presents a comprehensive overview of current PPDDM methods to help researchers better understand the development of this domain and to assist practitioners in selecting suitable solutions for their practical cases.

Summary

Introduction

Sensitive data offers tantalising potential for new insights and a wide variety of applications, but is generally fraught with concerns about model accuracy and data privacy. Consider the case of analyzing patient data in the healthcare domain: hospitals have used patient data to improve diagnostic accuracy and efficiency [29,31] and to fuel the transition to preventive [17] and precision medicine [6,27,95]. Combining various patient data from multiple sources offers one pathway to obtain more accurate and reliable analytical models for health outcomes [3,97]. Combining distributed sensitive data faces a number of challenges, including compliance with data protection rules in one or more legal jurisdictions, privacy concerns, security, and trust issues. This section should also cover the number of participating parties and whether all parties or only some of them hold the target class.
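To make the idea of combining data from several sources without releasing individual values more concrete, the following is a minimal sketch of a secure sum based on additive secret sharing, one common building block in this area. It is an illustration only and is not taken from the reviewed article: the function names, the modulus, and the per-hospital counts are all hypothetical, and the sketch assumes honest-but-curious parties that follow the protocol and do not all collude against one participant.

```python
import secrets

# Assumed public modulus; it must be larger than any possible true total.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split `value` into n_parties random shares that sum to it modulo `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate all parties in one process and return the sum of their inputs."""
    n = len(private_values)
    # Row i holds the shares that party i hands out, one per party.
    distributed = [make_shares(v, n, modulus) for v in private_values]
    # Party j adds up the shares it received; each single share looks random.
    partial_sums = [sum(row[j] for row in distributed) % modulus for j in range(n)]
    # Only the partial sums are published; together they reveal just the total.
    return sum(partial_sums) % modulus


# Hypothetical per-hospital patient counts, used only for illustration.
hospital_counts = [120, 45, 310]
total = secure_sum(hospital_counts)
```

In a real deployment each party would run on its own machine and exchange shares over authenticated channels; the single-process simulation above only shows why the published partial sums add up to the true total while no individual count is sent in the clear.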

Journal: Data Science
Publication Date: Oct 13, 2021
Citations: 4
License type: CC BY 4.0
