In light of the growing capabilities of Large Language Models (LLMs), there is an urgent need for effective methods to protect personal data in online text. Existing anonymization methods often prove ineffective against LLM-based inference, especially when sensitive information such as medical data is involved. This paper proposes an anonymization approach that combines k-anonymity with adversarial methods, aiming to improve the speed and accuracy of anonymization while maintaining a high level of data protection. In experiments on a dataset of 10,000 comments, our method reduced processing time by 40% compared to traditional adversarial methods (from 250 ms to 150 ms per comment), improved medical data anonymization accuracy by 5 percentage points (from 90% to 95%), and improved data utility preservation by 7 percentage points (from 85% to 92%). Particular attention is paid to applying the method to interactions with LLM-based chatbots and to the processing of medical information. We evaluate our method experimentally against existing industrial anonymizers on real and synthetic datasets; the results demonstrate significant improvements in both data utility preservation and privacy protection. The method also accounts for GDPR requirements, setting a new standard for data anonymization in AI interactions. This work offers a practical solution for protecting user privacy in the era of LLMs, especially in sensitive areas such as healthcare.

Keywords: AI, data security, ML, LLM, privacy.
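To make the k-anonymity component of the approach concrete, the following is a minimal sketch of a generalize-then-suppress step on medical records. It is an illustration of the general technique named in the abstract, not the authors' implementation; the record fields, the age-bucketing rule, and the threshold k = 3 are all assumptions chosen for the example, and the adversarial component is not shown.

```python
# Minimal k-anonymity sketch (illustrative only; not the paper's code).
# Each surviving record shares its quasi-identifier combination with at
# least k-1 other records, limiting re-identification.
from collections import Counter

def generalize_age(age: int) -> str:
    """Map an exact age to a coarse decade bucket, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def k_anonymize(records: list[dict], quasi_ids: tuple[str, ...], k: int = 3) -> list[dict]:
    """Generalize quasi-identifiers, then suppress any record whose
    quasi-identifier combination occurs fewer than k times."""
    generalized = []
    for rec in records:
        g = dict(rec)
        g["age"] = generalize_age(rec["age"])  # one example generalization rule
        generalized.append(g)
    counts = Counter(tuple(g[q] for q in quasi_ids) for g in generalized)
    return [g for g in generalized if counts[tuple(g[q] for q in quasi_ids)] >= k]

# Hypothetical records: 'age' and 'zip' act as quasi-identifiers,
# 'diagnosis' is the sensitive attribute.
records = [
    {"age": 34, "zip": "101**", "diagnosis": "flu"},
    {"age": 36, "zip": "101**", "diagnosis": "asthma"},
    {"age": 38, "zip": "101**", "diagnosis": "flu"},
    {"age": 62, "zip": "220**", "diagnosis": "diabetes"},
]
# The three 30-39/101** records form a group of size 3 and are kept;
# the lone 60-69/220** record is suppressed.
print(k_anonymize(records, quasi_ids=("age", "zip"), k=3))
```

In the paper's setting, a step of this kind would be paired with an adversarial check against LLM-based re-identification; that pairing is what the reported speed and accuracy gains refer to.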