NLP Technique for Malware Detection Using 1D CNN Fusion Model

Paul Ntim Yeboah,Haruna Balle Baz Musah,Mohammad Ayoub Khan

doi:10.1155/2022/2957203

Paul Ntim Yeboah, Haruna Balle Baz Musah + Show 1 more

Open Access

https://doi.org/10.1155/2022/2957203

Copy DOI

Abstract

With the record of the highest market share of mobile operating systems, the Android operating system has become a prime target for cyber perpetrators as malicious applications are leveraged as attack vectors to exploit Android systems. Machine learning detection solutions that have become a resort mostly rely on handcrafted features, a process deemed to be laborious and time-consuming. In this article, we employ a deep learning-based model consisting of 1-dimensional convolutional neural network (1D CNN) to automate the detection of Android malware. Our choice of 1D CNN was motivated by the computational advantage of 1D convolution operations over 2D CNN. The proposed model automatically extracts features from semantically embedded n-grams of raw static operation code (opcodes) sequences to determine the maliciousness of a binary file. Predictions of the 1D CNN model trained on multiple feature sets of n-gram opcode sequences are combined using a weighted average ensemble. Optimal prediction weights were obtained using a grid search on values in the range 0 to 1. With an Android dataset comprising 4951 malware and 2477 benign samples, our model yielded a positive predictive value of 98% and sensitivity of 97% using a weight parity of 0.5 for ensemble unigram and bigram opcode sequences.

Full Text