Abstract

A support vector machine (SVM) is a simple but yet powerful classification technique widely used in various applications, such as handwritten digits classification and face recognition. However, as any linear classification algorithm, it is vulnerable to adversarial attacks on test/training data. The majority of machine learning attacks aim to corrupt online data to affect automated classification in a number of business applications. This work formulates a multistage game between an SVM and adversary. The SVM aims to maximize classification accuracy on a test dataset and includes a procedure for validating a training dataset, whereas the adversary aims to minimize the accuracy on the test dataset by perturbing the training dataset. At each stage, a new training data batch arrives and the SVM’s validation procedure yields a training set that includes only those batches whose distributions are sufficiently close to that of a validation dataset, which is not known to the adversary. The procedure uses a modified multivariate Cramér test. However, the SVM has access to the test dataset only in the very end, whereas the adversary knows it from the very beginning. SVM’s strategy is a sequence of thresholds used in the validation procedure, whereas adversary’s strategy is a sequence of upper bounds on perturbation norm. The adversary’s optimization problem at each stage is non-convex and two ways for solving it approximately are suggested. The proposed game is applied in malware detection. Among several considered SVM’s strategies, the one that increases the threshold when SVM performance on the validation set improves is the most efficient one.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.