Malware detection by text and data mining

G Ganesh Sundarkumar, Vadlamani Ravi

doi:10.1109/iccic.2013.6724229

Abstract

Cyber fraud is a major security threat to the banking industry worldwide, and malware is one of its manifestations. Malware authors use Application Programming Interface (API) calls to perpetrate these crimes. In this paper, we propose a static analysis method that detects malware from API call sequences using text mining and data mining in tandem. We analyzed the dataset made available by the CSMINING group. First, we employed text mining to extract features from the dataset, which consists of a series of API calls. Next, we used mutual information for feature selection. Then, we resorted to over-sampling to balance the dataset. Finally, we employed various data mining techniques: Decision Tree (DT), Multilayer Perceptron (MLP), Support Vector Machine (SVM), Probabilistic Neural Network (PNN) and Group Method of Data Handling (GMDH). We also applied One-Class SVM (OCSVM). Throughout the paper, we used 10-fold cross-validation to evaluate the techniques. We observed that SVM and OCSVM achieved 100% sensitivity after the dataset was balanced.
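The preprocessing steps named in the abstract (text-mining features from API call sequences, mutual-information feature selection, over-sampling) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature weighting or the over-sampling scheme, so the sketch assumes binary term presence and random duplication of minority-class rows. The function names and the four toy sequences are hypothetical and are not drawn from the CSMINING dataset. The classifiers and the 10-fold cross-validation are omitted.

```python
import math
import random
from collections import Counter

# Toy data (made up for illustration): one API call sequence per program,
# label 1 = malware, 0 = benign.
sequences = [
    "CreateFile WriteFile CloseHandle",
    "RegOpenKey RegSetValue CreateRemoteThread",
    "CreateFile ReadFile CloseHandle",
    "CreateFile ReadFile WriteFile CloseHandle",
]
labels = [0, 1, 0, 0]


def term_presence(seqs):
    """Text-mining step (assumed form): each sequence becomes a set of distinct API calls."""
    return [set(s.split()) for s in seqs]


def mutual_information(docs, ys, term):
    """Mutual information, in bits, between presence of `term` and the class label."""
    n = len(docs)
    joint = Counter((term in d, y) for d, y in zip(docs, ys))
    mi = 0.0
    for (present, y), count in joint.items():
        p_xy = count / n
        p_x = sum(c for (p, _), c in joint.items() if p == present) / n
        p_y = sum(c for (_, l), c in joint.items() if l == y) / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def select_features(docs, ys, k):
    """Keep the k API calls with the highest mutual information."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: mutual_information(docs, ys, t), reverse=True)
    return ranked[:k]


def vectorize(docs, features):
    """Binary feature matrix: 1 if the API call occurs in the sequence, else 0."""
    return [[1 if t in d else 0 for t in features] for d in docs]


def oversample(rows, ys, seed=0):
    """Random over-sampling (assumed scheme): duplicate minority rows until classes match."""
    rng = random.Random(seed)
    counts = Counter(ys)
    target = max(counts.values())
    out_rows, out_ys = list(rows), list(ys)
    for cls, c in counts.items():
        pool = [r for r, y in zip(rows, ys) if y == cls]
        for _ in range(target - c):
            out_rows.append(rng.choice(pool))
            out_ys.append(cls)
    return out_rows, out_ys


docs = term_presence(sequences)
features = select_features(docs, labels, k=5)
X = vectorize(docs, features)
X_bal, y_bal = oversample(X, labels)
print(features)
print(Counter(y_bal))
```

In a real evaluation, over-sampling would normally be applied only to the training folds of each cross-validation split, so that duplicated rows do not leak into the test fold; the abstract does not say how the authors handled this.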

Full Text