IMCNN: Intelligent Malware Classification using Deep Convolution Neural Networks as Transfer learning and ensemble learning in honeypot enabled organizational network

B. Janet, Sanjeev Kumar, Subramanian Neelakantan

doi:10.1016/j.comcom.2023.12.036

Abstract

Traditional signature-based malware detection systems cannot detect new and previously unseen malware. Moreover, conventional machine learning methods for malware detection rely on features extracted through static or dynamic program analysis, which requires code debugging and execution, mostly through offline processing, and is therefore not scalable. This paper proposes IMCNN, an intelligent malware classification method using deep convolutional neural networks, for organizational networks equipped with honeypots. Pre-trained convolutional neural networks (CNNs) are systematically customized for transfer learning, and ensemble learning is used for classification, in order to detect intelligent modern-day malware. Real-world malware samples are systematically labeled and visualized as grayscale images. Four state-of-the-art deep CNN models (VGG16, VGG19, InceptionV3, and ResNet50), pre-trained on the ImageNet database (at least 1 million images), are fine-tuned as feature extractors alongside a basic CNN model. Three strategies are designed for feature extraction and selection: a fully connected layer with rectified linear unit (ReLU) activation embedded in the deep CNN model, principal component analysis (PCA), and singular value decomposition (SVD). The reduced feature sets are stacked and used to train k-nearest neighbor (k-NN), support vector machine (SVM), and random forest (RF) classifiers. The predicted probabilities of these machine-learned models are then combined by soft voting for the final classification. The proposed method is evaluated on the MalImg dataset (9339 malware samples from 25 families) and a real-world modern malware dataset (690 samples from 22 families). The experimental results show that, despite using a reduced feature set, IMCNN detects malware with 99.36% test accuracy on unseen data for MalImg and 92.11% for real-world malware.
In addition, the proposed method is compared with several existing state-of-the-art malware detection models in terms of accuracy and is found to perform best. Experiments also demonstrate that the proposed method is resilient to the polymorphic code obfuscation used by malware authors.
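The abstract says malware samples are "visualized into grayscale images" without giving the mapping. A common approach, and the one used to build MalImg-style images, reads the binary as a stream of bytes and treats each byte (0 to 255) as one grayscale pixel, filling rows of a fixed width. The sketch below assumes that approach. The width table keyed on file size, the zero-padding of the last row, and the helper names `choose_width` and `bytes_to_grayscale` are illustrative assumptions, not settings confirmed for IMCNN. It also omits the resizing step that a CNN input layer would need.

```python
import math
from typing import List, Optional

# Assumed width table: (upper file-size limit in KB, image width in pixels).
# It follows a convention often used for MalImg-style images; it is NOT
# confirmed to be the setting used by IMCNN.
WIDTH_TABLE = [
    (10, 32), (30, 64), (60, 128), (100, 256),
    (200, 384), (500, 512), (1000, 768),
]


def choose_width(num_bytes: int) -> int:
    """Pick an image width from the file size (hypothetical helper)."""
    size_kb = num_bytes / 1024
    for limit_kb, width in WIDTH_TABLE:
        if size_kb < limit_kb:
            return width
    return 1024  # files of 1000 KB or more


def bytes_to_grayscale(data: bytes, width: Optional[int] = None) -> List[List[int]]:
    """Map each byte (0-255) to one grayscale pixel, filling rows left to right.

    The final row is zero-padded to the full width. This is an assumption;
    some pipelines drop the trailing bytes instead.
    """
    if width is None:
        width = choose_width(len(data))
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Tiny demo: 10 bytes rendered at width 4 gives 3 rows; the last row is padded.
demo = bytes_to_grayscale(bytes(range(10)), width=4)
print(demo)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 0]]
```

For a real sample you would read the file in binary mode and pass its contents to `bytes_to_grayscale`. The resulting 2-D array could then be resized to the input size of a pre-trained network such as VGG16 or ResNet50. Because the pixel layout follows the byte layout, structurally similar binaries in the same family tend to produce visually similar textures.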
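Two of the three feature-reduction strategies named in the abstract, PCA and SVD, can be written directly in terms of a singular value decomposition. PCA centres the features and projects them onto the top-k right singular vectors. Truncated SVD skips the centring step. The reduced sets from different backbones are then concatenated ("stacked") column-wise. The following NumPy sketch assumes random stand-in embeddings in place of real VGG16 and ResNet50 outputs. The choice of `k` and the helper names `pca_reduce` and `svd_reduce` are hypothetical. In practice, `sklearn.decomposition.PCA` and `TruncatedSVD` provide equivalent, tested implementations.

```python
import numpy as np


def pca_reduce(features: np.ndarray, k: int) -> np.ndarray:
    """Project mean-centred features onto their top-k principal components."""
    centred = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:k].T


def svd_reduce(features: np.ndarray, k: int) -> np.ndarray:
    """Truncated SVD: keep the top-k singular directions, without centring.

    U_k * S_k equals X @ V_k, i.e. the projection onto the top-k right
    singular vectors.
    """
    u, s, _ = np.linalg.svd(features, full_matrices=False)
    return u[:, :k] * s[:k]


# Hypothetical CNN embeddings for 6 images from two backbones
# (random stand-ins for VGG16 and ResNet50 feature vectors).
rng = np.random.default_rng(0)
vgg_feats = rng.normal(size=(6, 512))
resnet_feats = rng.normal(size=(6, 2048))

# Reduce each backbone's features, then stack them column-wise.
stacked = np.hstack([pca_reduce(vgg_feats, 4), svd_reduce(resnet_feats, 4)])
print(stacked.shape)  # (6, 8)
```

The stacked matrix, with one row per image, is what a downstream k-NN, SVM, or random-forest classifier would be trained on. Reducing each backbone separately before stacking keeps a large embedding, such as a 2048-dimensional one, from dominating a smaller one.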
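The final step in the abstract combines the "predictive probabilities of different machine-learned models" by soft voting. Each classifier outputs a probability per class. These are averaged, optionally with per-model weights, and the class with the highest mean probability wins. The sketch below shows that rule on hand-made probability matrices standing in for k-NN, SVM, and RF `predict_proba` outputs. The helper name `soft_vote` and the numbers are illustrative only. scikit-learn's `VotingClassifier(voting="soft")` implements the same idea for fitted estimators.

```python
import numpy as np


def soft_vote(prob_list, weights=None):
    """Average class-probability matrices from several models, then take argmax.

    prob_list: sequence of arrays, each shaped (n_samples, n_classes).
    weights:   optional per-model weights (uniform when None).
    Returns (predicted labels, averaged probabilities).
    """
    probs = np.stack(prob_list)  # shape (n_models, n_samples, n_classes)
    avg = np.average(probs, axis=0, weights=weights)
    return avg.argmax(axis=1), avg


# Hypothetical probabilities for 2 samples over 3 malware families.
knn = np.array([[0.45, 0.55, 0.0], [0.2, 0.5, 0.3]])
svm = np.array([[0.45, 0.55, 0.0], [0.1, 0.3, 0.6]])
rf = np.array([[0.90, 0.10, 0.0], [0.3, 0.3, 0.4]])

labels, avg = soft_vote([knn, svm, rf])
print(labels)  # [0 2]
```

For the first sample, a hard (majority) vote would pick class 1, because two of the three models lean slightly toward it. Soft voting picks class 0 instead, because the random forest is very confident (mean probability 0.6 versus 0.4). Letting confident models outweigh marginal ones is the usual motivation for soft over hard voting.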

Full Text