Machine learning is one of the most prevailing techniques in computer science, and it has been widely applied in image processing, natural language processing, pattern recognition, cybersecurity, and other fields. Regardless of successful applications of machine learning algorithms in many scenarios, e.g., facial recognition, malware detection, automatic driving, and intrusion detection, these algorithms and corresponding training data are vulnerable to a variety of security threats, inducing a significant performance decrease. Hence, it is vital to call for further attention regarding security threats and corresponding defensive techniques of machine learning, which motivates a comprehensive survey in this paper. Until now, researchers from academia and industry have found out many security threats against a variety of learning algorithms, including naive Bayes, logistic regression, decision tree, support vector machine (SVM), principle component analysis, clustering, and prevailing deep neural networks. Thus, we revisit existing security threats and give a systematic survey on them from two aspects, the training phase and the testing/inferring phase. After that, we categorize current defensive techniques of machine learning into four groups: security assessment mechanisms, countermeasures in the training phase, those in the testing or inferring phase, data security, and privacy. Finally, we provide five notable trends in the research on security threats and defensive techniques of machine learning, which are worth doing in-depth studies in future.
Read full abstract