In the era of Big Data, where vast datasets fuel innovation across industries, safeguarding individual privacy remains a paramount concern. This article explores how statistical approaches preserve data confidentiality amid the proliferation of digital information. Differential privacy techniques introduce calibrated noise to protect identities while preserving data utility, which is crucial for compliance and for trust in data-driven decision-making. Secure Multiparty Computation (MPC) enables collaborative analysis without exposing raw data, supporting privacy in sectors such as healthcare and finance. Privacy-preserving data mining techniques combine encryption and anonymization to extract insights while shielding sensitive information. Anonymization and de-identification methods further strengthen privacy by masking identifiable data, which is essential for complying with stringent regulations such as GDPR and HIPAA. As data generation accelerates, advancing these statistical methods is essential for maintaining privacy in the evolving landscape of Big Data applications.

Keywords: Big Data, Privacy, Statistical approaches, Differential privacy, Secure multiparty computation.
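To make the calibrated-noise idea behind differential privacy concrete, the sketch below shows the standard Laplace mechanism applied to a counting query. The function name, the epsilon value, and the toy dataset are illustrative assumptions, not artifacts of this article.

```python
# Minimal sketch of the Laplace mechanism for differential privacy.
# The helper name, epsilon, and the toy dataset below are hypothetical.
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return a noisy answer satisfying epsilon-differential privacy.

    Noise is drawn from Laplace(0, sensitivity / epsilon), so its scale is
    calibrated to the query's sensitivity and the chosen privacy budget.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Example: privately release a count over a toy dataset.
ages = [34, 45, 29, 62, 51]                  # hypothetical records
true_count = sum(1 for a in ages if a > 40)  # counting queries have sensitivity 1
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(f"true count: {true_count}, noisy count: {noisy_count:.2f}")
```

A smaller epsilon yields stronger privacy at the cost of noisier answers, which reflects the utility-privacy trade-off discussed above.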