Malware Detection Using Ensemble N-gram Opcode Sequences

Paul Ntim Yeboah, Stephen Kweku Amuquandoh, Haruna Balle Baz Musah

doi:10.3991/ijim.v15i24.25401

Abstract

Conventional approaches to tackling malware attacks have proven futile at detecting never-before-seen (zero-day) malware. Research has shown, however, that zero-day malicious files are mostly semantics-preserving variants of existing malware, generated via obfuscation methods. In this paper we propose and evaluate a machine-learning-based malware detection model using an ensemble approach. Our ensemble strategy trains a single classifier on multiple feature sets generated from different n-gram sizes of opcode sequences. The predictions of the models trained on these feature sets are weighted and averaged to reach a final verdict on whether a binary file is malicious or benign. To obtain the optimal weight combination for the ensemble feature sets, we applied a grid search over a set of pre-defined weights in the range 0 to 1. On a balanced dataset of 2000 samples, an ensemble of n-gram opcode sequences of sizes 1 and 2, with respective weights 0.3 and 0.7, yielded the best detection accuracy of 98.1% using a random forest (RF) classifier. The ensemble of n-gram sizes 2 and 3 obtained the best precision of 99.7% using a weight of 0.5 for both models.
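The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the 0.5 decision threshold, the normalisation by the sum of the weights, and the 0.1 grid step are all assumptions made for the example, and the per-model probabilities are taken as given rather than produced by a trained classifier.

```python
from collections import Counter
from itertools import product


def opcode_ngrams(opcodes, n):
    """Count contiguous n-grams in an opcode sequence (one feature set per n)."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def weighted_average(prob_sets, weights):
    """Combine per-model 'malicious' probabilities by a weighted average.

    prob_sets[k][j] is model k's probability that sample j is malicious.
    Dividing by the weight total is an assumption of this sketch.
    """
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, sample_probs)) / total
        for sample_probs in zip(*prob_sets)
    ]


def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


def grid_search_weights(prob_sets, labels, grid):
    """Try every weight combination on the grid and keep the most accurate one."""
    best = None
    for weights in product(grid, repeat=len(prob_sets)):
        if sum(weights) == 0:
            continue  # an all-zero combination carries no information
        score = accuracy(weighted_average(prob_sets, weights), labels)
        if best is None or score > best[1]:
            best = (weights, score)
    return best


# Hypothetical inputs: an opcode trace and two models' validation probabilities.
trace = ["push", "mov", "push", "mov", "call"]
unigrams = opcode_ngrams(trace, 1)
bigrams = opcode_ngrams(trace, 2)

labels = [1, 0, 1, 0]                # 1 = malicious, 0 = benign
probs_1gram = [0.9, 0.6, 0.4, 0.1]   # made-up outputs of the 1-gram model
probs_2gram = [0.6, 0.2, 0.8, 0.3]   # made-up outputs of the 2-gram model
grid = [i / 10 for i in range(11)]   # pre-defined weights from 0 to 1
best_weights, best_accuracy = grid_search_weights(
    [probs_1gram, probs_2gram], labels, grid
)
```

In practice the probabilities would come from the same classifier type (for example RF) trained separately on each n-gram feature set, and the weights would be selected on held-out data.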

Highlights

The surge in malware attacks has become a major threat to internet security
A support vector machine (SVM) with an RBF kernel achieved a best accuracy of 97% using ensemble n-gram sizes 1 and 3 with respective weights 0.6 and 0.4, and a best precision of 96.7% using n-gram sizes 1 and 2 with respective weights 0.6 and 0.4
Ensemble models trained with k-nearest neighbours (KNN, k = 5) recorded a best accuracy of 98% and a best precision of 98.4% using n-gram sizes 1 and 2 with respective weights 0.4 and 0.6

Summary

Introduction

The surge in malware attacks has become a major threat to internet security. The proliferation of malware attacks can be attributed to the high profit incentives derived from these illicit breaches [1, 2]. A cyber threat report by SonicWall [3] shows that the millions of detection engines deployed worldwide recorded a total of 9.9 billion malware attacks in 2019, with over 440,000 malware variants. In 2020 SonicWall reported a total of 5.6 billion malware attacks, a decline from the previous year. Even so, the threat calls for a more sophisticated solution. The signature-based method has been the conventional approach to malware detection. With this approach, malware footprints, including byte sequences, hashes or anomalies, are precomputed and stored in a repository that is later queried when suspicious files are examined.
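As a rough illustration of the signature-based approach described above, the sketch below stores hashes of known samples and queries them for a suspicious file. The byte strings and function names are hypothetical and chosen only for the example; real signature engines also match byte sequences and other footprints, which this sketch does not cover.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a file's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_signature_db(known_malware):
    """Precompute the hash of every known malicious sample."""
    return {sha256_hex(sample) for sample in known_malware}


def is_known_malware(data: bytes, signature_db) -> bool:
    """Query the repository: flag the file only if its hash was seen before."""
    return sha256_hex(data) in signature_db


# Made-up byte strings standing in for binaries.
known_sample = b"\x55\x8b\xec\x90\xc3"
signature_db = build_signature_db([known_sample])

# A variant with one extra no-op byte inserted, a simple form of obfuscation.
variant = b"\x55\x8b\xec\x90\x90\xc3"

seen_before = is_known_malware(known_sample, signature_db)
variant_flagged = is_known_malware(variant, signature_db)
```

Here the original sample is found in the repository while the one-byte variant is not, which is the weakness against obfuscated variants that motivates learning-based detection.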

Journal: International Journal of Interactive Mobile Technologies (iJIM)
Publication Date: Dec 21, 2021
Citations: 5
License type: CC BY 4.0
