A PE header-based method for malware detection using clustering and deep embedding techniques

Tina Rezaei, Farnoush Manavi, Ali Hamzeh

doi:10.1016/j.jisa.2021.102876

Abstract

Recent years have witnessed dramatic growth in malware with a wide range of malicious intents, following the expansion of computer systems. Hence, highly effective malware detection systems are in great demand. Most recent approaches apply machine learning techniques to features extracted from files, such as byte sequences, API calls, opcode sequences, and hardware events. Extracting features from the executable file header is a widespread approach in this field, since the header contains compact and informative content for distinguishing malware from benign programs. In this paper, a novel deep learning method is proposed to learn different embedding representations for malware and benign programs. To this end, the deep neural network uses a clustering algorithm in its training process. During training, samples are embedded by the neural network, and the network output is fed into the k-means clustering algorithm, which partitions the samples into two clusters, malware and benign. The network parameters are then updated based on the clustering result. By repeating this process, the network representations and the cluster assignments are refined iteratively until the network learns different representations for malware and benign programs. The proposed method uses the raw bytes of the PE file header. Because the network is lightweight and raw bytes are fast to extract, the proposed method has considerably low computational overhead, and a set of experiments showed that it is fast enough to serve as a real-time malware detection method while maintaining high performance.
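The alternating training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the one-layer `embed` network, the `kmeans2` helper, the update rule (a gradient step that pulls each embedding toward its assigned centroid), and the synthetic "header bytes" are all assumptions made for the example, since the abstract does not specify the architecture or the loss.

```python
import numpy as np


def embed(x, W):
    """One-layer stand-in for the embedding network (hypothetical)."""
    return np.tanh(x @ W)


def kmeans2(z, iters=20):
    """Plain two-cluster k-means on embeddings z; returns (labels, centroids)."""
    # Deterministic init: the first sample and the sample farthest from it.
    c0 = z[0]
    c1 = z[np.argmax(((z - c0) ** 2).sum(axis=1))]
    centroids = np.stack([c0, c1])
    for _ in range(iters):
        dist = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centroids[k] = z[labels == k].mean(axis=0)
    return labels, centroids


def train(x, dim=8, epochs=10, lr=0.1, seed=0):
    """Alternate between clustering the embeddings and updating the network."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(x.shape[1], dim))
    for _ in range(epochs):
        z = embed(x, W)
        labels, centroids = kmeans2(z)
        # Assumed update rule: one gradient step on the mean squared distance
        # between each embedding and its assigned centroid (held fixed).
        diff = z - centroids[labels]
        grad_pre = (2.0 / len(x)) * diff * (1.0 - z ** 2)  # through tanh
        W -= lr * (x.T @ grad_pre)
    labels, _ = kmeans2(embed(x, W))
    return W, labels


# Synthetic stand-in for raw header bytes, scaled to [0, 1].
rng = np.random.default_rng(1)
low = rng.integers(0, 64, size=(20, 32))
high = rng.integers(192, 256, size=(20, 32))
x = np.concatenate([low, high]) / 255.0

W, labels = train(x)
print(np.bincount(labels, minlength=2))  # cluster sizes
```

A pure centroid-pull objective like this one can collapse all embeddings to a single point if it is trained for long enough, so a practical variant would need some safeguard against that. The sketch is only meant to show the alternation between clustering and parameter updates.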

Full Text