Abstract
Diverse malware programs are released daily to attack computer systems without the knowledge of their users. While some authors of these programs intend to steal secret information, others quietly try to prove their competence and aptitude. Anti-malware programs primarily use the traditional signature-based static technique to counter this malicious code. Although this technique excels at blocking known malware, it cannot intercept new malware. A number of anti-malware programs also employ the dynamic technique, which is often based on running the executable in a virtual environment. The major drawbacks of this technique are its long scanning time and high resource consumption. Recent programs may use a third technique: the heuristic technique based on machine learning, which has proven successful in several areas that involve processing huge amounts of data. In this paper we provide a survey of existing research that uses this latter technique to counter cyber-attacks. We explore the different training phases of machine learning classifiers for malware detection. The first phase is the extraction of features from the input files according to previously chosen feature types. The second phase is the rejection of less important features and the selection of the most important ones, which better represent the data contained in the input files. The last phase is feeding the selected features into a chosen machine learning classifier, so that it can learn to distinguish between benign and malicious files and give accurate predictions when confronted with previously unseen files. The paper ends with a critical comparison of the studied approaches according to their performance in malware detection.
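The three training phases described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the surveyed work: the feature type (a normalized byte histogram), the selection criterion (largest difference between class means), the classifier (nearest centroid), and the tiny synthetic byte strings standing in for benign and malicious files. The function and class names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def extract_features(data: bytes) -> List[float]:
    """Phase 1 (assumed feature type): normalized 256-bin byte histogram."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero on empty input
    return [counts.get(b, 0) / total for b in range(256)]


def select_features(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Phase 2 (assumed criterion): keep the k features whose mean value
    differs most between the malicious (1) and benign (0) classes."""
    def class_mean(label: int, j: int) -> float:
        values = [x[j] for x, lab in zip(X, y) if lab == label]
        return sum(values) / len(values)

    n_features = len(X[0])
    scores = [abs(class_mean(1, j) - class_mean(0, j)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(x: Sequence[float], selected: Sequence[int]) -> List[float]:
    """Keep only the selected feature columns of one sample."""
    return [x[j] for j in selected]


class NearestCentroid:
    """Phase 3 (assumed classifier): predict the class with the closest mean vector."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "NearestCentroid":
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x: Sequence[float]) -> int:
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))


# Synthetic stand-ins for real files (label 0 = benign, 1 = malicious).
benign = [b"hello world, plain text " * 4, b"readme: install and run " * 4]
malicious = [bytes([0x90] * 60 + [0xCC] * 20), bytes([0x90] * 50 + [0xEB, 0xFE] * 10)]
files = benign + malicious
labels = [0, 0, 1, 1]

X = [extract_features(f) for f in files]                  # phase 1: extraction
selected = select_features(X, labels, k=8)                # phase 2: selection
clf = NearestCentroid().fit([project(x, selected) for x in X], labels)  # phase 3: training


def classify(data: bytes) -> int:
    """Run a previously unseen input through the same three steps."""
    return clf.predict(project(extract_features(data), selected))
```

Calling `classify` on new bytes applies the same extraction and the same selected feature indices before prediction, which is the point of fixing the selection during training. A real detector would use far richer features and a much larger labeled dataset than this toy setup.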
Highlights
Every day, the AV-TEST Institute registers over 250,000 new malware programs
We focus on research on detecting malware designed for Windows operating systems
Transforming the input dataset into another exploitable space yields a large gain in both data processing time and performance measures
Summary
Anti-malware programs primarily use the traditional signature-based static technique to counter malicious code. Although this technique excels at blocking known malware, it cannot intercept new malware. Recent programs may use a third technique: the heuristic technique based on machine learning, which has proven successful in several areas that involve processing huge amounts of data. We explore the different training phases of machine learning classifiers for malware detection. The last phase is feeding the selected features into a chosen machine learning classifier, so that it can learn to distinguish between benign and malicious files and give accurate predictions when confronted with previously unseen files.