The healthcare sector has changed dramatically in recent years as it relies increasingly on big data to improve patient care, raise operational efficiency, and advance medical research. Protecting patient privacy in the era of digital health records is a major challenge, because private information may leak while patient data are being collected. To address this issue, we propose a secure, privacy-preserving scheme for healthcare data that protects individual privacy as far as possible while preserving the utility of the data and supporting queries on sensitive attributes under differential privacy. We implemented differential privacy on two publicly available healthcare datasets, the Breast Cancer Prediction Dataset and the Nursing Home COVID-19 Dataset, and examined how varying the privacy parameter (ε) affects both the privacy and the utility of the data. A significant part of this study concerned the selection of ε, which determines the degree of privacy protection. We also compared computation times by running multiple complex queries on these datasets to analyse the overhead introduced by differential privacy. The results show that differential privacy slightly increases query processing time, but the overhead remains within reasonable bounds, which supports its practicality for real-time applications.
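To illustrate how ε trades privacy against utility, the following is a minimal sketch of a differentially private counting query. It assumes the Laplace mechanism with sensitivity 1, which the abstract does not name as the mechanism used in the study. The function names (`laplace_noise`, `dp_count`, `mean_abs_error`), the `diagnosis` field, and the toy records are hypothetical and stand in for the real datasets.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(records, predicate, epsilon, rng, trials=2000):
    # Average distance between the noisy and the exact count,
    # used here as a simple utility measure.
    true_count = sum(1 for r in records if predicate(r))
    total = 0.0
    for _ in range(trials):
        total += abs(dp_count(records, predicate, epsilon, rng) - true_count)
    return total / trials


# Hypothetical toy records; not taken from either dataset in the study.
records = [
    {"diagnosis": "M"}, {"diagnosis": "B"}, {"diagnosis": "M"},
    {"diagnosis": "B"}, {"diagnosis": "B"}, {"diagnosis": "M"},
]


def is_malignant(record):
    return record["diagnosis"] == "M"


if __name__ == "__main__":
    rng = random.Random(0)
    for eps in (0.1, 0.5, 1.0, 2.0):
        err = mean_abs_error(records, is_malignant, eps, rng)
        print(f"epsilon={eps}: mean absolute error ~ {err:.2f}")
```

Under these assumptions, a smaller ε adds more noise (stronger privacy, lower utility) and a larger ε adds less, which is the trade-off the study evaluates when selecting ε.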