Intelligent Vision-Based Malware Detection and Classification Using Deep Random Forest Paradigm

S Abijah Roseline, Yunyoung Nam, S Geetha, Seifedine Kadry

doi:10.1109/access.2020.3036491

Open Access

https://doi.org/10.1109/access.2020.3036491

Abstract

Malware is a rapidly growing menace to modern computing. Malware authors continually incorporate sophisticated features such as code obfuscation to create variants that elude existing malware detection systems. Classifying unseen malware variants with similar characteristics into their respective families is a significant challenge, even when the classifier is trained with known variants of the same family. Identifying and extracting distinct features for each malware sample is another obstacle to generalizing a malware detection system: features that contribute to the classifier's generalization capability are difficult to engineer because each malware variant is modified. Conventional malware detection systems employ static signature-based methods and dynamic behavior-based methods, which are inefficient at analyzing and detecting advanced and zero-day malware. To address these issues, this work employs a visualization approach in which malware is represented as 2D images and proposes a robust machine learning-based anti-malware solution. The proposed system is based on a layered ensemble approach that mimics the key characteristics of deep learning techniques but performs better than them. It does not require hyperparameter tuning or backpropagation and works with reduced model complexity. The proposed model outperformed other state-of-the-art techniques, with detection rates of 98.65%, 97.2%, and 97.43% on the Malimg, BIG 2015, and MaleVis malware datasets, respectively. The results demonstrate that the proposed solution is effective in identifying new and advanced malware due to its diverse features.

Highlights

The internet has become a key aspect of our daily lives. While making our lives convenient, the internet has also made innocent users vulnerable to attacks.
Experiments were conducted for malware classification using machine learning classifiers such as Logistic Regression (LR) [40], Naïve Bayes (NB) (Vinayakumar et al., 2019), Decision Tree (DT) [40], Random Forest (RF) [40, 15], K-Nearest Neighbor (KNN) [31], and Support Vector Machine (SVM) [40, 31]. The work in [26] proposed an approach that extracts Local Binary Patterns (LBP) and dense Scale-Invariant Feature Transform (SIFT) features from malware images. The work in [40] presented a scalable, hybrid deep learning model called ScaleMalNet, based on malware image processing, to detect new malware.
The four benchmark malware datasets used to analyze the performance of the proposed deep forest-based malware detection system are the Malimg dataset [31], the BIG 2015 dataset [2], the MaleVis dataset [6], and the Malware dataset [38].

Summary

Introduction

The internet has become a key aspect of our daily lives. While making our lives convenient, the internet has also made innocent users vulnerable to attacks. These texture patterns in the images are found to exhibit significant visual similarities among malware belonging to the same class. Another advantage is that vision-based analysis does not need static disassembly or dynamic execution of binaries, unlike traditional malware analysis techniques. With the influx of zero-day and unlabeled malware, the detection performance of deep learning is low. Deep models are also very intricate and require high computational overhead. Considering all these factors, this research work proposes an ensemble deep forest algorithm, which has several advantages over existing machine learning and deep learning models: high generalization ability, improved detection accuracy and precision, and low computational overhead.
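As a rough illustration of the vision-based representation described here, the sketch below turns a binary's raw bytes into a 2D grayscale grid, reading each byte as one pixel intensity in the range 0-255. This page does not describe the paper's actual preprocessing, so the function name `bytes_to_grayscale`, the fixed row width, and the zero-padding of the last row are all assumptions made for the example.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Reshape raw bytes into rows of `width` pixels (each 0-255).

    The fixed width and the zero-padding of a short final row are
    assumptions for this sketch, not details taken from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        # Iterating over a bytes slice yields integers in 0..255.
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # zero-pad a short final row
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = bytes(range(10))  # stand-in for the contents of a binary file
    for row in bytes_to_grayscale(sample, width=4):
        print(row)
```

No disassembly or execution of the binary is involved, which matches the advantage of vision-based analysis noted above. In practice the grid would be passed to an image library or directly to a classifier, and the width would likely be chosen based on file size.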

Related Work

Static Analysis

Dynamic Analysis

Vision-based Analysis

Hybrid Analysis

Overview

Preprocessing phase

Sliding Window Scanning Stage

Cascade Layering Stage

Mathematical proof of the proposed malware detection system

Generalization error of ensemble estimator

Diversity measure

Relationship between diversity and generalization

Datasets and Experiment Setup

Results and Discussion

Methods

Comparison Results

Resilience to Obfuscation

Conclusion

Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 116	License type: CC BY 4.0
