Abstract

Personally identifiable information (PII) continues to be used in predictive modeling by academic researchers and industry organizations. Most notably, the healthcare industry has been a popular testbed for innovative approaches from academia and industry to research that uses PII in predictive applications and synthetic data generation. The majority of approaches that generate synthetic PII are based on real data or on obfuscating portions of real data. Privacy leakage and ethical disclosure concerns remain among the largest issues and are difficult to avoid in synthetic PII generation techniques. In this analysis, we propose a novel method for generating synthetic, differentially private data that avoids these common pitfalls and can be leveraged broadly. We also present evidence that our approach preserves inferential utility for modeling and addresses the potential risks tied to PII features. We conclude with a summary of our findings and results and a short discussion of how the use of PII data may impact organizations interested in developing predictive applications.
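The abstract does not describe the proposed generation method in detail. As a rough, hedged illustration of what differentially private synthetic data generation can look like in its simplest form, the sketch below draws synthetic categorical values from a Laplace-perturbed histogram. The function name dp_synthetic_sample, the epsilon value, and the example ZIP-code column are illustrative assumptions and are not taken from the paper.

# Illustrative sketch only (an assumed textbook technique, not the paper's method):
# draw synthetic categorical values from a Laplace-perturbed histogram, the
# standard mechanism for releasing epsilon-differentially-private counts.
import numpy as np

def dp_synthetic_sample(values, epsilon, n_samples, rng=None):
    """Sample synthetic values from a differentially private histogram."""
    rng = rng or np.random.default_rng()
    categories, counts = np.unique(values, return_counts=True)
    # Adding or removing one record changes each count by at most 1,
    # so the sensitivity is 1 and the Laplace noise scale is 1 / epsilon.
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    probs = np.clip(noisy, 0.0, None)
    if probs.sum() == 0:  # all counts noised away; fall back to uniform sampling
        probs = np.ones_like(probs)
    probs = probs / probs.sum()
    return rng.choice(categories, size=n_samples, p=probs)

# Hypothetical usage: replace a PII-bearing column with synthetic values.
zip_codes = ["02139", "02139", "10001", "94105", "94105", "94105"]
print(dp_synthetic_sample(zip_codes, epsilon=1.0, n_samples=6))

Because the noise is calibrated to the count sensitivity, the released histogram (and any samples drawn from it) satisfies epsilon-differential privacy; how well such samples preserve downstream inferential utility is exactly the trade-off the abstract says the paper evaluates.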
