Abstract

Executable files downloaded from the internet expose computer systems to many potential hazards and vulnerabilities in the form of malware. These executables can take the form of raw binaries, mnemonics, libraries, and function calls/APIs, and they can mislead many conventional malware detection techniques. This paper explores the potential of machine learning-based methods for the malware detection problem. The scope of the work is currently limited to static analysis of executable files. Various feature selection techniques are implemented to reduce the size of the training data. Machine learning algorithms such as K-Nearest Neighbors and the Random Forest Classifier were trained on the curated feature sets. The Random Forest Classifier gave the best experimental result, with an accuracy of 99.5%. We have developed a framework as a two-step module: in the first step, a list of features is extracted from a given executable file; in the second step, the trained algorithm, which is integrated into the framework, classifies the given executable file as malicious or not. This framework is demonstrated in the form of a web app developed in Python. Furthermore, the framework is evaluated on a small dataset of 35 portable executable (.exe) files, and it is observed to retain the accuracy of the trained algorithm.
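
The two-step structure described above (static feature extraction, then classification) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three byte-level features, the function names `extract_features` and `knn_predict`, and the toy training data are hypothetical and are not the paper's feature set, code, or dataset. A hand-written K-Nearest Neighbors vote stands in for the trained model so that the example needs only the Python standard library.

```python
import math
from collections import Counter


def extract_features(data):
    """Step 1: map raw file bytes to a small numeric feature vector.

    The three features are illustrative assumptions, not the paper's features.
    """
    if not data:
        return [0.0, 0.0, 0.0]
    n = len(data)
    counts = Counter(data)
    # Shannon entropy of the byte distribution, scaled from [0, 8] bits to [0, 1].
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) / 8.0
    # Fraction of printable ASCII bytes.
    printable = sum(1 for b in data if 32 <= b < 127) / n
    # Fraction of zero bytes (padding).
    zeros = counts.get(0, 0) / n
    return [entropy, printable, zeros]


def knn_predict(train_X, train_y, x, k=3):
    """Step 2: majority vote among the k nearest training vectors (Euclidean)."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set. The labels are arbitrary and only make the demo runnable;
# they say nothing about how real malicious or benign files look.
TRAIN_BYTES = [
    (bytes(range(256)) * 4, "malicious"),
    (bytes((i * 7 + 3) % 256 for i in range(1024)), "malicious"),
    (b"hello world " * 50, "benign"),
    (b"The quick brown fox jumps over the lazy dog. " * 20, "benign"),
]
TRAIN_X = [extract_features(blob) for blob, _ in TRAIN_BYTES]
TRAIN_Y = [label for _, label in TRAIN_BYTES]


def classify(data, k=3):
    """Run both steps on the raw bytes of one file."""
    return knn_predict(TRAIN_X, TRAIN_Y, extract_features(data), k)
```

In a real setting, `classify` would be called on bytes read from disk, for example `classify(open(path, "rb").read())`, and the feature extractor and model would be replaced by the curated feature set and the trained classifier.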
