Abstract

Malware is a rapidly increasing menace to modern computing. Malware authors continually incorporate sophisticated features such as code obfuscation to create malware variants and elude detection by existing malware detection systems. Classifying unseen malware variants with similar characteristics into their respective families is a significant challenge, even if the classifier is trained with known variants belonging to the same family. Identifying and extracting distinct features for each malware is another obstacle to generalizing a malware detection system. Features that contribute to the generalization capability of the classifier are difficult to engineer because each malware variant is modified. Conventional malware detection systems employ static signature-based methods and dynamic behavior-based methods, which are inefficient in analyzing and detecting advanced and zero-day malware. To address these issues, this work employs a visualization approach in which malware is represented as 2D images and proposes a robust machine learning-based anti-malware solution. The proposed system is based on a layered ensemble approach that mimics the key characteristics of deep learning techniques but performs better than the latter. The proposed system does not require hyperparameter tuning or backpropagation and works with reduced model complexity. The proposed model outperformed other state-of-the-art techniques with detection rates of 98.65%, 97.2%, and 97.43% for the Malimg, BIG 2015, and MaleVis malware datasets, respectively. The results demonstrate that the proposed solution is effective in identifying new and advanced malware due to its diverse features.
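The representation of malware as a 2D image can be illustrated with a short sketch. It assumes the common convention of reading each byte of a binary as one 8-bit grayscale pixel and wrapping rows at a fixed width. The function name `bytes_to_image`, the default width of 256, and the zero padding of the last row are assumptions made for illustration; they are not details taken from this work.

```python
import math


def bytes_to_image(data, width=256):
    """Reshape raw bytes into rows of `width` grayscale pixels (values 0-255).

    Hypothetical helper: the row width and the zero padding are assumptions.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Zero-pad the tail so that the last row is complete.
    padded = data + bytes(height * width - len(data))
    # Each row is a list of integer pixel intensities.
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Toy input standing in for the contents of a binary file.
sample = bytes(range(10))
image = bytes_to_image(sample, width=4)
```

With this layout, no disassembly or execution of the binary is needed: the image is built from the raw bytes alone, and any image classifier can then be applied to the resulting pixel grid.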

Highlights

  • The internet has become a key aspect of our daily lives. While making our lives convenient, the internet has also made innocent users vulnerable to attacks.

  • Experiments were conducted for malware classification using machine learning classifiers such as Logistic Regression (LR) [40], Naïve Bayes (NB) (Vinayakumar et al., 2019), Decision Tree (DT) [40], Random Forest (RF) [40, 15], K-Nearest Neighbor (KNN) [31], and Support Vector Machine (SVM) [40, 31]. [26] proposed an approach that extracts Local Binary Patterns (LBP) and dense Scale-Invariant Feature Transform (SIFT) features from malware images. [40] presented a scalable, hybrid deep learning model called ScaleMalNet, based on malware image processing, to detect new malware.

  • The four benchmark malware datasets used to analyze the performance of the proposed deep forest-based malware detection system are the Malimg dataset [31], the BIG 2015 dataset [2], the MaleVis dataset [6], and the Malware dataset [38].


Summary

INTRODUCTION

The internet has become a key aspect of our daily lives. While making our lives convenient, the internet has also made innocent users vulnerable to attacks. These texture patterns in the images are found to exhibit significant visual similarities for malware belonging to the same class. Another advantage is that vision-based analysis does not need static disassembly or dynamic execution of binaries, unlike other traditional malware analysis techniques. With the flow of zero-day and unlabeled malware, the detection performance using deep learning is low. These deep models are very intricate and require high computational overhead. Considering all these factors, this research work proposes to employ an ensemble deep forest algorithm, which has many advantages over the existing machine learning and deep learning models. It has high generalization ability, improved detection accuracy and precision, and low computational overhead.
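The layered structure of a deep forest can be sketched in a few lines. The sketch assumes the standard cascade design, in which every layer holds several estimators, each estimator outputs a class-probability vector, and those vectors are concatenated with the original features to form the input of the next layer. The names `cascade_predict` and `make_toy_estimator` are hypothetical, and the threshold-based estimators are toy stand-ins for trained forests, not the models used in this work.

```python
def average_class_vectors(vectors):
    """Average a list of equal-length class-probability vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cascade_predict(features, layers):
    """Pass `features` through the cascade and return averaged class probabilities.

    `layers` is a list of layers; each layer is a list of callables that map a
    feature list to a class-probability vector (assumed interface).
    """
    current = list(features)
    class_vectors = []
    for estimators in layers:
        class_vectors = [estimate(current) for estimate in estimators]
        # The next layer sees the original features plus this layer's outputs.
        current = list(features) + [p for vec in class_vectors for p in vec]
    return average_class_vectors(class_vectors)


def make_toy_estimator(threshold):
    """Hypothetical stand-in for a trained forest: votes for class 1 when the
    mean of the inputs exceeds `threshold`."""
    def predict_proba(x):
        p1 = 1.0 if sum(x) / len(x) > threshold else 0.0
        return [1.0 - p1, p1]
    return predict_proba


layers = [
    [make_toy_estimator(0.3), make_toy_estimator(0.7)],  # first cascade layer
    [make_toy_estimator(0.5)],                           # second cascade layer
]
probs = cascade_predict([0.6, 0.6], layers)
```

Because each layer is only a forward pass through already-fitted estimators, a cascade of this kind involves no backpropagation, in line with the property claimed for the proposed system.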

Related Work
Static Analysis
Dynamic Analysis
Vision-based Analysis
Hybrid Analysis
Overview
Preprocessing phase
Sliding Window Scanning Stage
Cascade Layering Stage
Mathematical proof of the proposed malware detection system
Generalization error of ensemble estimator
Diversity measure
Relationship between diversity and generalization
Datasets and Experiment Setup
Results and Discussion
Methods
Comparison Results
Resilience to Obfuscation
Conclusion
