Systematic Literature Review on Differential Privacy in Machine Learning

Abstract

With the rapid advancement of Machine Learning (ML) and its widespread applications in various domains, concerns over data privacy and security have become increasingly critical. Differential Privacy (DP) has emerged as a rigorous mathematical framework for privacy-preserving data analysis in ML systems, offering formal guarantees for protecting individual privacy while enabling meaningful learning. Previous surveys have lacked extensive coverage of DP and ML, failing to address the trade-offs between privacy and accuracy. Consequently, achieving a comprehensive understanding of the design, implementation, and efficiency of the DP algorithms within the ML domain is imperative. This survey provides a systematic review of DP methods across ML approaches, including traditional ML, federated learning, and deep learning. Through a thematic analysis of 106 studies, we identify key DP implementation strategies, examine their impact on model performance, and highlight the advantages and limitations of existing approaches. Our findings offer practical insights to assist researchers and practitioners in selecting appropriate DP mechanisms based on specific requirements. Finally, we discuss open challenges and future research directions to advance DP techniques for improved privacy-utility trade-offs in ML applications.

Similar Papers
  • Front Matter
  • Citations: 62
  • 10.1002/aps3.11371
Plants meet machines: Prospects in machine learning for plant biology
  • Jun 1, 2020
  • Applications in Plant Sciences
  • Pamela S Soltis + 3 more

  • Research Article
  • Citations: 80
  • 10.1016/j.trechm.2020.10.007
Chemist versus Machine: Traditional Knowledge versus Machine Learning Techniques
  • Nov 9, 2020
  • Trends in Chemistry
  • Janine George + 1 more

Chemical heuristics have been fundamental to the advancement of chemistry and materials science. These heuristics are typically established by scientists using knowledge and creativity to extract patterns from limited datasets. Machine learning offers opportunities to perfect this approach using computers and larger datasets. Here, we discuss the relationships between traditional heuristics and machine learning approaches. We show how traditional rules can be challenged by large-scale statistical assessment and how traditional concepts commonly used as features are feeding the machine learning techniques. We stress the waste involved in relearning chemical rules and the challenges in terms of data size requirements for purely data-driven approaches. Our view is that heuristic and machine learning approaches are at their best when they work together.

  • Research Article
  • Citations: 43
  • 10.1213/ane.0000000000004656
Machine-Learning Implementation in Clinical Anesthesia: Opportunities and Challenges.
  • Jun 1, 2020
  • Anesthesia & Analgesia
  • Danton S Char + 1 more

  • Research Article
  • Citations: 8
  • 10.1055/s-0043-1768731
Security and Privacy in Machine Learning for Health Systems: Strategies and Challenges.
  • Aug 1, 2023
  • Yearbook of Medical Informatics
  • Erikson J De Aguiar + 2 more

Machine learning (ML) is a powerful asset to support physicians in decision-making procedures, providing timely answers. However, ML for health systems can suffer from security attacks and privacy violations. This paper investigates studies of security and privacy in ML for health. We examine attacks, defenses, and privacy-preserving strategies, discussing their challenges. We conducted the following research protocol: starting a manual search, defining the search string, removing duplicated papers, filtering papers by title and abstract, then by their full texts, and analyzing their contributions, including strategies and challenges. Finally, we collected and discussed 40 papers on attacks, defense, and privacy. Our findings identified the most employed strategies for each domain. We found trends in attacks, including universal adversarial perturbations (UAPs), generative adversarial network (GAN)-based attacks, and DeepFakes to generate malicious examples. Trends in defense are adversarial training, GAN-based strategies, and out-of-distribution (OOD) detection to identify and mitigate adversarial examples (AEs). We found privacy-preserving strategies such as federated learning (FL), differential privacy, and combinations of strategies to enhance FL. Challenges in privacy comprise the development of attacks that bypass fine-tuning, defenses that calibrate models to improve their robustness, and privacy methods to enhance the FL strategy. In conclusion, it is critical to explore security and privacy in ML for health, because the risks have grown and vulnerabilities remain open. Our study presents strategies and challenges to guide research on security and privacy issues in ML applied to health systems.

  • Research Article
  • Citations: 1
  • 10.62051/f2kew975
Noise Addition Strategies for Differential Privacy in Stochastic Gradient Descent
  • Aug 12, 2024
  • Transactions on Computer Science and Intelligent Systems Research
  • Kangjie Lu

Differential privacy is increasingly used in machine learning, especially in stochastic gradient descent (SGD). Protecting data privacy by adding noise has become a hot research topic. This paper reviews noise addition strategies for differentially private SGD along multiple dimensions, including adjustments based on the noise distribution, the gradient norm, and the privacy budget, as well as methods based on the model architecture. Each strategy performs differently in terms of privacy protection level, model performance loss, and computational complexity. The article compares and analyzes these differences in detail, aiming to provide a valuable reference for researchers and practitioners. It also discusses how to combine federated learning and differential privacy to protect data privacy more efficiently in a secure multi-party computation (MPC) environment. The review shows the wide application of differential privacy in machine learning and deep learning and its importance in the field of privacy protection, and outlines directions and challenges for future research.
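
For readers unfamiliar with noise addition in SGD, the following is a minimal sketch of the widely used clip-then-add-Gaussian-noise step. It is not a strategy taken from the paper above: the function names `clip_gradient` and `noisy_mean_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, gradients are plain Python lists, and no privacy accounting is performed.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale one per-example gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_gradient(grad, clip_norm)
        total = [t + c for t, c in zip(total, clipped)]
    # Assumption: noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


# Toy usage with two 2-dimensional per-example gradients (illustrative values).
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
update = noisy_mean_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single example can change the summed gradient, which is what lets the noise scale be tied to `clip_norm`. Strategies that adapt the noise distribution, the clipping bound, or the privacy budget would change how `sigma` and `clip_norm` are chosen.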

  • Supplementary Content
  • Citations: 4
  • 10.3390/s25134207
Human-Centric Cognitive State Recognition Using Physiological Signals: A Systematic Review of Machine Learning Strategies Across Application Domains
  • Jul 5, 2025
  • Sensors (Basel, Switzerland)
  • Kaizhe Jin + 5 more

This systematic review analyses advancements in cognitive state recognition from 2010 to early 2024, evaluating 405 relevant articles from an initial pool of 2398 records identified through five databases: Scopus, Engineering Village, Web of Science, IEEE Xplore, and PubMed. Studies were included if they assessed cognitive states using physiological signals and applied machine learning (ML) or deep learning (DL) techniques in practical task settings. The review highlights a pivotal shift from shallow ML to DL approaches for analysing physiological signals, driven by DL’s ability to autonomously learn complex patterns in large datasets. By 2023, DL has become the dominant methodology, though traditional ML techniques remain relevant. Additionally, there has been a move from neuroimaging to multimodal physiological modalities, with the decrease in neuroimaging use reflecting a trend towards integrating various physiological signals for more comprehensive insights. Cognitive state recognition is applied across diverse domains such as the automotive, aviation, maritime, and healthcare industries, enhancing performance and safety in high-stakes environments. Electrocardiogram (ECG) is the most utilised modality, with convolutional neural networks (CNNs) being the primary DL approach. The trend in cognitive state recognition research is moving towards integrating ECG signals with CNNs and adopting privacy-preserving methodologies like differential privacy and federated learning, highlighting the potential of cognitive state recognition to enhance performance, safety, and innovation across various real-world applications.

  • Research Article
  • Citations: 22
  • 10.1097/corr.0000000000001679
CORR Synthesis: When Should the Orthopaedic Surgeon Use Artificial Intelligence, Machine Learning, and Deep Learning?
  • Feb 17, 2021
  • Clinical orthopaedics and related research
  • Michael P Murphy + 1 more

  • Research Article
  • Citations: 3
  • 10.11999/jeit190887
Cryptographic Approaches for Privacy-Preserving Machine Learning
  • Jun 4, 2020
  • 电子与信息学报
  • Hua Jiang + 5 more

The new generation of artificial intelligence technology is characterized as follows: with the help of GPU computing, cloud computing, and other high-performance distributed computing capabilities, machine learning algorithms, represented by deep learning, are trained on big data to simulate, extend, and expand human intelligence. Because data sources and computing locations differ, current machine learning faces serious privacy leakage problems, so Privacy Protection of Machine Learning (PPM) has become a widely studied research area. Cryptography is an important means of protecting privacy in machine learning. This paper introduces the cryptographic tools used in privacy-preserving machine learning, such as general Secure Multi-Party Computing (SMPC), privacy-preserving set operations, and Homomorphic Encryption (HE), and describes the status and development of applying these tools to privacy protection in the various stages of machine learning, such as data processing, model training, model testing, and data prediction.

  • Research Article
  • Citations: 9
  • 10.1038/s41598-024-83564-4
Design of an improved model using federated learning and LSTM autoencoders for secure and transparent blockchain network transactions
  • Jan 10, 2025
  • Scientific Reports
  • R Vijay Anand + 8 more

With the advancement of the digital era and the emergence of DApps and blockchain, secure, robust, and transparent network transactions have become invaluable. Traditional methods of securing transactions and maintaining transparency have encountered many challenges, including data privacy, centralized vulnerability, and inefficiency in fraud detection. To address these limitations, this paper provides a blockchain technology framework driven by advanced machine learning techniques, which enhances security and transparency throughout the network of transactions. We begin with a design framework based on Federated Learning for Blockchain Integration, where distributed datasets across blockchain nodes contribute to a global machine learning model but do not share raw data samples. Different nodes learn their own models. These local models are then aggregated into a common global model using secure aggregation methods, which ensures that data privacy is not compromised while allowing more accurate models to be obtained from diversified datasets. LSTM autoencoders provide stronger security protocols for anomaly and fraud detection. By training the autoencoder on normal transaction data, the system can flag transactions with high reconstruction errors, indicating anomalies in real time. This proactive detection of anomalies reduces fraudulent activities significantly, as most threats are recognized early. The paper also proposes Smart Contract-based Model Management for machine learning models in a decentralized environment. Smart contracts are responsible for the submission, validation, and execution of the locally updated models in a decentralized fashion, such that the management process is transparent and tamper resistant.
Integrity and authenticity requirements are fulfilled by enforcing consensus mechanisms. Privacy in machine learning is ensured through Differential Privacy and Homomorphic Encryption. Differential privacy techniques are applied to the local model updates before aggregation to protect individual transaction data. With homomorphic encryption, computations are performed on encrypted data, so privacy is preserved while the global model is formed. Real-time analysis of transactions can be done with CNNs to detect fraud. Streaming transaction data is analyzed by CNNs leveraging the privacy-preserving global model, producing immediate alerts and actions for detected fraud. This real-time operation makes the network more reliable and trustworthy. According to interim outcomes, the proposed framework is effective: local models were aggregated without data leakage, anomalies were detected efficiently, models were managed transparently, data privacy was maintained at a high level, and fraudulent transactions were readily detected. The work presented here supports secure and transparent transactions across the network, resulting in enhanced network trust and decentralization.

  • Research Article
  • Citations: 129
  • 10.1613/jair.1.14649
How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy
  • Jul 23, 2023
  • Journal of Artificial Intelligence Research
  • Natalia Ponomareva + 8 more

Machine Learning (ML) models are ubiquitous in real-world applications and are a constant focus of research. Modern ML models have become more complex, deeper, and harder to reason about. At the same time, the community has started to realize the importance of protecting the privacy of the training data that goes into these models. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP has happened in industry, attempts to apply DP to real world complex ML models are still few and far between. The adoption of DP is hindered by limited practical guidance of what DP protection entails, what privacy guarantees to aim for, and the difficulty of achieving good privacy-utility-computation trade-offs for ML models. Tricks for tuning and maximizing performance are scattered among papers or stored in the heads of practitioners, particularly with respect to the challenging task of hyperparameter tuning. Furthermore, the literature seems to present conflicting evidence on how and whether to apply architectural adjustments and which components are “safe” to use with DP. In this survey paper, we attempt to create a self-contained guide that gives an in-depth overview of the field of DP ML. We aim to assemble information about achieving the best possible DP ML model with rigorous privacy guarantees. Our target audience is both researchers and practitioners. Researchers interested in DP for ML will benefit from a clear overview of current advances and areas for improvement. We also include theory-focused sections that highlight important topics such as privacy accounting and convergence. For a practitioner, this survey provides a background in DP theory and a clear step-by-step guide for choosing an appropriate privacy definition and approach, implementing DP training, potentially updating the model architecture, and tuning hyperparameters. 
For both researchers and practitioners, consistently and fully reporting privacy guarantees is critical, so we propose a set of specific best practices for stating guarantees. With sufficient computation and a sufficiently large training set or supplemental nonprivate data, both good accuracy (that is, almost as good as a non-private model) and good privacy can often be achievable. And even when computation and dataset size are limited, there are advantages to training with even a weak (but still finite) formal DP guarantee. Hence, we hope this work will facilitate more widespread deployments of DP ML models.

  • Research Article
  • Citations: 6
  • 10.14778/3484224.3484231
Quantifying identifiability to choose and audit ϵ in differentially private deep learning
  • Sep 1, 2021
  • Proceedings of the VLDB Endowment
  • Daniel Bernau + 4 more

Differential privacy allows bounding the influence that training data records have on a machine learning model. To use differential privacy in machine learning, data scientists must choose privacy parameters (ϵ, δ ). Choosing meaningful privacy parameters is key, since models trained with weak privacy parameters might result in excessive privacy leakage, while strong privacy parameters might overly degrade model utility. However, privacy parameter values are difficult to choose for two main reasons. First, the theoretical upper bound on privacy loss (ϵ, δ) might be loose, depending on the chosen sensitivity and data distribution of practical datasets. Second, legal requirements and societal norms for anonymization often refer to individual identifiability, to which (ϵ, δ ) are only indirectly related. We transform (ϵ, δ ) to a bound on the Bayesian posterior belief of the adversary assumed by differential privacy concerning the presence of any record in the training dataset. The bound holds for multidimensional queries under composition, and we show that it can be tight in practice. Furthermore, we derive an identifiability bound, which relates the adversary assumed in differential privacy to previous work on membership inference adversaries. We formulate an implementation of this differential privacy adversary that allows data scientists to audit model training and compute empirical identifiability scores and empirical (ϵ, δ ).
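
As a rough illustration of how a privacy parameter can be related to an adversary's posterior belief, consider a simplified setting: pure ϵ-DP (δ = 0), a single output o of a mechanism M, and a uniform prior over two neighbouring datasets D and D'. The symbols M, o, D, and D' are introduced here for illustration. This is a standard Bayesian argument under those stated assumptions and is not necessarily the bound derived in the paper:

```latex
\Pr[D \mid o]
  = \frac{\Pr[M(D) = o]}{\Pr[M(D) = o] + \Pr[M(D') = o]}
  \le \frac{e^{\epsilon}}{e^{\epsilon} + 1}
  = \frac{1}{1 + e^{-\epsilon}}
```

The inequality uses only the ϵ-DP condition \Pr[M(D) = o] \le e^{\epsilon} \Pr[M(D') = o]. Under these assumptions, ϵ = 1 caps the adversary's posterior belief at about 0.73, compared with a prior of 0.5.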

  • Research Article
  • 10.1016/j.ejro.2026.100747
Machine learning and deep learning models for predicting colorectal cancer metastases: A comprehensive review.
  • Jun 1, 2026
  • European journal of radiology open
  • Mikiyas Amare Getu + 6 more

  • Preprint Article
  • Citations: 2
  • 10.5194/egusphere-egu2020-690
Are Machine Learning methods robust enough for hydrological modeling under changing conditions?
  • Jul 17, 2020
  • Carolina Natel De Moura + 3 more

The advancement of big data and increased computational power have contributed to an increased use of Machine Learning (ML) approaches in hydrological modelling. These approaches are powerful tools for modeling non-linear systems. However, the applicability of ML in non-stationary conditions needs to be studied further. As climate change will change hydrological patterns, testing ML approaches for non-stationary conditions is essential. Here, we used the Differential Split-Sample Test (DSST) to test the climate transposability of ML approaches (e.g., calibrating in a wet period and validating in a dry one, and vice-versa). We applied five ML approaches using daily precipitation and temperature as input for the prediction of the daily discharge in six snow-dominated Swiss catchments. Lower and upper benchmarks were used to evaluate performances through a relative performance measure. The lower benchmark is the average of the bucket-type HBV model runs from 1000 random parameter sets. The upper benchmark is the automatically calibrated HBV model. In comparison with the stationary condition, the models performed slightly worse in the non-stationary condition. The performance of simple ML approaches was poor for non-stationary conditions, with an underestimation of peak flows as well as a poor representation of the snow-melting period. On the other hand, a more complex ML approach (deep learning), the Long Short-Term Memory (LSTM), showed a good performance when compared with the lower and upper benchmarks. This might be explained by the fact that the so-called memory cell allowed the model to simulate storage effects.

  • Research Article
  • Citations: 46
  • 10.1038/s41598-020-69433-w
Novel application of an automated-machine learning development tool for predicting burn sepsis: proof of concept
  • Jul 23, 2020
  • Scientific Reports
  • Nam K Tran + 7 more

Sepsis is the primary cause of burn-related mortality and morbidity. Traditional indicators of sepsis exhibit poor performance when used in this unique population due to their underlying hypermetabolic and inflammatory response following burn injury. To address this challenge, we developed the Machine Intelligence Learning Optimizer (MILO), an automated machine learning (ML) platform, to automatically produce ML models for predicting burn sepsis. We conducted a retrospective analysis of 211 adult patients (age ≥ 18 years) with severe burn injury (≥ 20% total body surface area) to generate training and test datasets for ML applications. The MILO approach was compared against an exhaustive “non-automated” ML approach as well as standard statistical methods. For this study, traditional multivariate logistic regression (LR) identified seven predictors of burn sepsis when controlled for age and burn size (OR 2.8, 95% CI 1.99–4.04, P = 0.032). The area under the ROC (ROC-AUC) when using these seven predictors was 0.88. Next, the non-automated ML approach produced an optimal model based on LR using 16 out of the 23 features from the study dataset. Model accuracy was 86% with ROC-AUC of 0.96. In contrast, MILO identified a k-nearest neighbor-based model using only five features to be the best performer with an accuracy of 90% and a ROC-AUC of 0.96. Machine learning augments burn sepsis prediction. MILO identified models more quickly, with less required features, and found to be analytically superior to traditional ML approaches. Future studies are needed to clinically validate the performance of MILO-derived ML models for sepsis prediction.

  • Research Article
  • Citations: 5
  • 10.30574/wjaets.2024.12.1.0057
Advanced frameworks for fraud detection leveraging quantum machine learning and data science in fintech ecosystems
  • Jun 30, 2024
  • World Journal of Advanced Engineering Technology and Sciences
  • Temitope Oluwatosin Fatunmbi

The rapid expansion of the fintech sector has brought with it an increasing demand for robust and sophisticated fraud detection systems capable of managing large volumes of financial transactions. Conventional machine learning (ML) approaches, while effective, often encounter limitations in terms of computational efficiency and the ability to model complex, high-dimensional data structures. Recent advancements in quantum computing have given rise to a promising paradigm known as quantum machine learning (QML), which leverages quantum mechanical principles to solve problems that are computationally infeasible for classical computers. The integration of QML with data science has opened new avenues for enhancing fraud detection frameworks by improving the accuracy and speed of transaction pattern analysis, anomaly detection, and risk mitigation strategies within fintech ecosystems. This paper aims to explore the potential of quantum-enhanced data science methodologies to bolster fraud detection and prevention mechanisms, providing a comparative analysis of QML techniques against classical ML models in the context of their application to financial data analysis. Fraud detection in fintech relies heavily on data-driven models to identify suspicious activities and prevent financial crimes such as identity theft, money laundering, and fraudulent transactions. Traditional ML approaches, such as decision trees, support vector machines, and deep learning, have laid the foundation for these systems. However, these approaches often fall short when faced with the challenges posed by high-dimensional, noisy, and complex financial data. Quantum machine learning, by leveraging quantum bits or qubits, possesses the unique ability to represent and process data in an exponentially larger state space, allowing for more efficient pattern recognition and computationally intensive analysis. 
Quantum algorithms such as the Quantum Support Vector Machine (QSVM), Quantum Principal Component Analysis (QPCA), and Quantum Neural Networks (QNNs) have been studied for their potential to outperform classical counterparts in specific problem domains, including fraud detection. This research delves into the theoretical foundations of quantum computing, outlining how quantum superposition, entanglement, and quantum interference can be harnessed to perform operations that exponentially accelerate data processing. Quantum algorithms are presented as capable of achieving faster data transformations and more nuanced pattern recognition through their ability to process all potential combinations of data simultaneously. The implementation of QML algorithms on quantum hardware, although still in its nascent stages, is beginning to demonstrate tangible benefits in terms of the speed and complexity of computations for fraud detection tasks. For example, quantum-enhanced anomaly detection can lead to the identification of rare, complex patterns that classical ML might overlook, contributing to a more proactive approach to fraud prevention. The paper also examines the integration of data science techniques with quantum-enhanced fraud detection, considering data preprocessing, feature engineering, and the application of quantum-enhanced statistical methods. Data preprocessing, a crucial step in building effective fraud detection models, involves the transformation and normalization of financial data to ensure that models can learn from relevant features without overfitting or underfitting. Quantum data structures offer the potential to represent data with a higher degree of complexity and interrelations, which is critical for capturing the multifaceted nature of financial transactions and detecting subtle signs of fraudulent activity. 
Quantum data encoding schemes such as Quantum Random Access Memory (QRAM) enable efficient storage and retrieval of data, providing a scalable solution for processing large datasets in real-time. A comprehensive analysis of case studies demonstrates the real-world applicability of quantum machine learning frameworks in fintech. The research highlights projects where quantum algorithms have been tested in controlled environments to detect anomalies in simulated transaction data, showcasing improvements in the identification of complex fraud scenarios over classical ML approaches. For instance, Quantum Support Vector Machines have been utilized to perform higher-dimensional classification tasks that are essential for distinguishing between legitimate and fraudulent transactions based on transaction history and user behavior. Furthermore, quantum algorithms that operate on hybrid systems, combining quantum and classical resources, are also explored to mitigate the limitations imposed by current quantum hardware, which is still constrained by issues such as noise and qubit coherence time. The paper also addresses key challenges and limitations associated with the integration of QML into practical fraud detection systems. Quantum hardware, although advancing rapidly, still faces significant challenges, including the need for error correction, qubit stability, and hardware scalability. Quantum computers with sufficient qubits and coherence time are necessary to implement complex algorithms for fraud detection effectively. Additionally, a practical approach to harnessing QML would require the development of quantum software frameworks and quantum programming languages that can operate in tandem with existing fintech systems and data infrastructure. Another area of focus is the synergy between quantum machine learning and classical machine learning models in creating hybrid systems that leverage the strengths of both methodologies. 
Quantum-enhanced feature extraction and dimensionality reduction can be combined with classical algorithms for final decision-making processes. This allows for a more comprehensive approach where quantum algorithms handle the computationally intensive parts of data analysis, while classical systems can be utilized for integrating real-time data and refining output for human interpretation. The paper discusses potential pathways for integrating these hybrid models, including considerations for API development, data interoperability, and the standardization of quantum-classical workflows. The discussion extends to the practical implications of implementing quantum-based fraud detection systems, particularly in terms of security and privacy. The use of quantum encryption and quantum key distribution can complement QML by ensuring that the data fed into fraud detection models is protected from external tampering. Quantum-resistant cryptography solutions are also explored, providing a comprehensive view of how quantum technologies could enhance the overall security posture of fintech ecosystems while promoting trust and compliance.
