Abstract

New and varied malware programs are released every day to attack computer systems without their users' knowledge. While some authors of these programs intend to steal secret information, others quietly try to prove their competence and aptitude. Anti-malware programs primarily rely on the traditional signature-based static technique to counter this malicious code. Although this technique excels at blocking known malware, it cannot intercept new malware. A number of anti-malware programs also use the dynamic technique, which is often based on running the executable in a virtual environment. The major drawbacks of this technique are its long scanning time and high resource consumption. Recent programs may use a third technique: the heuristic technique based on machine learning, which has proven successful in several areas that involve processing huge amounts of data. In this paper we survey the existing research that uses this technique to counter cyber-attacks. We explore the different training phases of machine learning classifiers for malware detection. The first phase is the extraction of features from the input files according to previously chosen feature types. The second phase is the rejection of the less important features and the selection of the most important ones, which better represent the data contained in the input files. The last phase is the injection of the selected features into a chosen machine learning classifier, so that it can learn to distinguish between benign and malicious files and give accurate predictions when confronted with previously unseen files. The paper ends with a critical comparison of the studied approaches according to their performance in malware detection.
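The three training phases described above can be illustrated with a minimal sketch. It is not the method of any surveyed paper: it assumes byte bigrams as the feature type, information gain as the selection criterion, and swaps in a small Bernoulli naive Bayes classifier in place of whatever classifier a given study uses. The function names, the toy byte strings, and the labels are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def extract_features(data: bytes, n: int = 2) -> set:
    """Phase 1 (assumed feature type): the set of byte n-grams in a file."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def entropy(labels) -> float:
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature, samples, labels) -> float:
    """Entropy reduction obtained by splitting on presence of one feature."""
    present = [y for x, y in zip(samples, labels) if feature in x]
    absent = [y for x, y in zip(samples, labels) if feature not in x]
    total = len(labels)
    remainder = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - remainder

def select_features(samples, labels, k: int) -> list:
    """Phase 2: keep the k features with the highest information gain."""
    vocabulary = set().union(*samples)
    ranked = sorted(vocabulary,
                    key=lambda f: (-information_gain(f, samples, labels), f))
    return ranked[:k]

def train(samples, labels, features) -> dict:
    """Phase 3 (swapped-in classifier): Bernoulli naive Bayes, Laplace smoothing."""
    model = {}
    for cls in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        prior = len(rows) / len(samples)
        probs = {f: (sum(f in x for x in rows) + 1) / (len(rows) + 2)
                 for f in features}
        model[cls] = (prior, probs)
    return model

def predict(model, sample) -> str:
    """Return the class with the highest log-posterior for one feature set."""
    def log_score(cls):
        prior, probs = model[cls]
        return math.log(prior) + sum(
            math.log(p if f in sample else 1 - p) for f, p in probs.items())
    return max(model, key=log_score)

# Toy corpus: made-up byte strings, not real executables.
corpus = [
    (b"\x4d\x5a\x90\x00\xde\xad\xbe\xef", "malicious"),
    (b"\x4d\x5a\x90\x00\xde\xad\x00\x01", "malicious"),
    (b"\x4d\x5a\x90\x00\x10\x20\x30\x40", "benign"),
    (b"\x4d\x5a\x90\x00\x10\x20\x55\x66", "benign"),
]
samples = [extract_features(data) for data, _ in corpus]
labels = [label for _, label in corpus]
selected = select_features(samples, labels, k=4)
model = train(samples, labels, selected)

unseen = extract_features(b"\x4d\x5a\x90\x00\xde\xad\x77\x88")
print(predict(model, unseen))
```

In this sketch the bigrams shared by every file carry no information gain and are discarded in the second phase, so the classifier only sees the bigrams that separate the two toy classes. A real study would use far larger datasets and may choose different feature types, selection methods, and classifiers.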

Highlights

  • Every day, the AV-TEST institute registers over 250,000 new malware samples

  • We focus on studies that detect malware designed for Windows operating systems

  • Transforming the input dataset into another, more exploitable space yields substantial gains in both data processing time and performance measures

Summary

A Survey of Malware Detection Techniques based on Machine Learning

Anti-malware programs primarily rely on the traditional signature-based static technique to counter malicious code. Although this technique excels at blocking known malware, it cannot intercept new malware. Recent programs may instead use the heuristic technique based on machine learning, which has proven successful in several areas that involve processing huge amounts of data. We explore the different training phases of machine learning classifiers for malware detection. The last phase is the injection of the selected features into a chosen machine learning classifier, so that it can learn to distinguish between benign and malicious files and give accurate predictions when confronted with previously unseen files.

INTRODUCTION
FEATURE EXTRACTION
Signatures Extraction
DLL Function Calls Extraction
Binary Sequences Extraction
PE File Header Fields Extraction
Entropy Signals Extraction
FEATURE SELECTION TECHNIQUES
Information Gain
Random Forest
Calculation of Accuracy by Considering Each Attribute Separately
Wavelet Transform
MACHINE LEARNING CLASSIFIERS
Neural Network
CLASSIFICATION OF THE STUDIED RESEARCHES
Findings
CONCLUSIONS