Abstract

Every day, hundreds of thousands of new malware programs are developed and spread worldwide in cyberspace. Most of these malware programs are malware variants such as polymorphic and metamorphic malware, which are created from older versions of malware and able to change their structures and function flows to circumvent security solutions. The accuracy of malware variant detection is a crucial challenge. Many existing malware variant detections use static features extracted from the physical structure of malware file, such as opcodes and function flows. Unfortunately, the static features are subject to obfuscation and code shelling using simple obfuscation techniques. Although a malware variant can change its structure and function flows, it is widely believed that the malware variant cannot hide its malicious behavioral patterns during the runtime. Accordingly, dynamic, or behavioral analysis-based features were suggested by many studies to detect malware variants accurately. However, most of these studies are solely dependent on application-programmable interface calls (or API calls), which is not enough to accurately distinguish between malware and benign due to API-based obfuscation techniques. Therefore, a malware variant detection model that combines different behavioral activities can improve detection accuracy while reducing the false-negative rate. To this end, this study proposed a Deep-Ensemble and Multifaceted Behavioral Malware Variant Detection Model using Sequential Deep Learning and Extreme Gradient Boosting Techniques. Different behavioral features were extracted from the dynamic analysis environment. Then, a feature extraction algorithm that can automatically extract effective representative patterns has been designed and developed to extract the hidden representative features of the malware variants using a sequential deep learning model. These features have been fed into a developed extreme gradient boosting-based classifier for decision making. Extensive experiments have been carried out to validate the proposed scheme. The results were compared to the other related techniques in the field. The results show that the proposed model is reliable, as it improves the detection rate while reducing the false-negative rate.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call