Windows malware detection based on cuckoo sandbox generated report using machine learning algorithm

S.L Shiva Darshan,C.D Jaidhar,M.A Ajay Kumara

doi:10.1109/iciinfs.2016.8262998

Abstract

Malicious software or malware has grown rapidly and many anti-malware defensive solutions have failed to detect the unknown malware since most of them rely on signature-based technique. This technique can detect a malware based on a pre-defined signature, which achieves poor performance when attempting to classify unseen malware with the capability to evade detection using various code obfuscation techniques. This growing evasion capability of new and unknown malwares needs to be countered by analyzing the malware dynamically in a sandbox environment, since the sandbox provides an isolated environment for analyzing the behavior of the malware. In this paper, the malware is executed on to the cuckoo sandbox to obtain its run-time behavior. At the end of the execution, the cuckoo sandbox reports the system calls invoked by the malware during execution. However, this report is in JSON format and has to be converted to MIST format to extract the system calls. The collected system calls are structured in the form of N-Grams, which help to build the classifier by using the Information Gain (IG) as a feature selection technique. A comprehensive experiment was conducted to perceive the best fit classifier among the chosen classifiers, including the Bayesian-Logistic-Regression, SPegasos, IB1, Bagging, Part, and J48 defined within the WEKA tool. From the experimental results, the overall best performance for all the selected top N-Grams such as 200, 400, and 600 goes to SPegasos with the highest accuracy, highest True Positive Rate (TPR), and lowest False Positive Rate (FPR).

Full Text