Machine learning has emerged as a core technology in domains such as big data, the Internet of Things (IoT), and cloud computing. Training machine learning models typically requires extensive datasets, often gathered through crowdsourcing. These datasets frequently contain significant amounts of private information, including personally identifiable information (e.g., phone numbers, identification numbers) and sensitive data (e.g., financial, medical, and health records). Protecting such data efficiently and cost-effectively is a pressing challenge. This article introduces machine learning and explores the concepts of, and threats to, privacy in this context. It focuses on mainstream techniques for privacy protection in machine learning, outlining their underlying mechanisms and distinctive features. The discussion is organised around key frameworks such as differential privacy, homomorphic encryption, and secure multi-party computation, presenting a comprehensive review of recent advancements in the field. A comparative analysis of the strengths and limitations of these mechanisms is provided. Finally, the article examines the developmental trajectory of privacy protection in machine learning and proposes potential future research directions in this critical area.
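To make the first of these frameworks concrete, the following is a minimal sketch of differential privacy via the classic Laplace mechanism: a counting query (which has sensitivity 1) is answered with calibrated Laplace noise of scale 1/ε added. The function names (`laplace_noise`, `dp_count`) and the choice of a counting query are illustrative assumptions, not constructs from the article itself.

```python
import math
import random

def laplace_noise(scale):
    # Illustrative helper: sample Laplace(0, scale) by inverse-CDF transform.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(values, predicate, epsilon):
    # Hypothetical example query: an epsilon-differentially-private count.
    # A counting query changes by at most 1 when one record is added or
    # removed (sensitivity 1), so noise of scale 1/epsilon suffices.
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller ε means stronger privacy but noisier answers; averaged over many runs, the noisy count remains an unbiased estimate of the true count.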