In today’s cybersecurity landscape, software security companies face a significant challenge in detecting new and unknown malware. Although various machine learning and deep learning tools have been introduced to identify malicious software from static and dynamic features, the desired level of accuracy remains elusive. The challenge is exacerbated by encryption, packing, limited distribution, and the uneven allocation of malware samples across families. Moreover, deep learning techniques demand substantial time, computational resources (specifically GPUs), and data-science expertise for practical malware analysis. In response, we propose a novel GPU-free approach called Image-based Malware Classification using Broad Learning (IMCBL). Our method integrates visualization, feature decomposition, and a broad learning architecture to enhance malware detection and classification. We convert raw malware binaries into images, reducing the need for extensive feature engineering. These images are then transformed using truncated Singular Value Decomposition (SVD) to reduce the feature vector size, which speeds up training while mitigating model overfitting. The transformed feature vector is input into our proposed Broad Learning (BL) system, which performs malware detection and classification. The BL architecture is a flat network that maps the original inputs to feature nodes and expands the structure through enhancement nodes, enabling efficient and effective classification without the need for retraining. This dynamic and incremental learning capability sets IMCBL apart from existing deep learning architectures.
To validate our approach, we conducted extensive experiments on five benchmark malware datasets: the Microsoft Windows malware challenge dataset, the Malimg Windows malware dataset, the IoT-Android mobile malware dataset, the Big Windows malware dataset, and an obfuscated Windows malware dataset. The results show that IMCBL correctly classifies most malware samples, even under obfuscation attacks, performing comparably to or better than current methods evaluated on similar benchmarks. Specifically, IMCBL achieved 95.58% accuracy on the Microsoft Windows malware dataset, 97.64% on the Malimg Windows malware dataset, 96.51% on the IoT-Android mobile malware dataset, and 96.19% on the Big Windows malware dataset. IMCBL also achieved 93.04% accuracy on the obfuscated Windows malware dataset, which contains both packed and unpacked malware samples. Notably, IMCBL incurs substantially lower computational overhead, in both training and prediction time, than traditional machine learning methods and state-of-the-art deep learning architectures such as VGG16, ResNet50, and InceptionV3.
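
The pipeline described above (binary to image, truncated SVD, then a flat network of feature and enhancement nodes) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The image size, node counts, ridge parameter, random (untuned) node weights, and the names `bytes_to_image`, `truncated_svd_basis`, and `BroadLearner` are all assumptions made for illustration, and the incremental update step is omitted.

```python
import numpy as np


def bytes_to_image(raw: bytes, width: int = 32, height: int = 32) -> np.ndarray:
    """Read raw bytes as 8-bit grayscale pixels, zero-padded or truncated
    to a fixed size (the 32x32 size is an assumption for this sketch)."""
    buf = np.frombuffer(raw, dtype=np.uint8)
    n = width * height
    pixels = np.zeros(n, dtype=np.uint8)
    pixels[: min(n, buf.size)] = buf[:n]
    return pixels.reshape(height, width)


def truncated_svd_basis(X: np.ndarray, k: int) -> np.ndarray:
    """Return the top-k right singular vectors of X as a (features, k) matrix.
    Projecting with X @ V reduces each sample to k features."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T


class BroadLearner:
    """Hypothetical flat network: random feature nodes, random enhancement
    nodes, and output weights solved in closed form by ridge regression."""

    def __init__(self, n_feature=40, n_enhance=80, ridge=1e-3, seed=0):
        self.n_feature, self.n_enhance = n_feature, n_enhance
        self.ridge, self.seed = ridge, seed

    def _nodes(self, X):
        Z = X @ self.Wf + self.bf           # feature nodes (linear mapping)
        H = np.tanh(Z @ self.Wh + self.bh)  # enhancement nodes (nonlinear)
        return np.hstack([Z, H])            # flat, concatenated layer

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        d = X.shape[1]
        # Random weights, scaled to keep activations in a moderate range.
        self.Wf = rng.standard_normal((d, self.n_feature)) / np.sqrt(d)
        self.bf = rng.standard_normal(self.n_feature)
        self.Wh = rng.standard_normal((self.n_feature, self.n_enhance))
        self.Wh /= np.sqrt(self.n_feature)
        self.bh = rng.standard_normal(self.n_enhance)
        A = self._nodes(X)
        self.classes_ = np.unique(y)
        Y = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot
        # Ridge solution: W = (A^T A + lambda I)^-1 A^T Y, no backpropagation.
        gram = A.T @ A + self.ridge * np.eye(A.shape[1])
        self.W = np.linalg.solve(gram, A.T @ Y)
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._nodes(X) @ self.W, axis=1)]
```

In use, one would flatten and scale each image (for example `bytes_to_image(raw).ravel() / 255.0`), stack the rows into a matrix `X`, compute `V = truncated_svd_basis(X, k)`, and train with `BroadLearner().fit(X @ V, y)`. Because the output weights come from a single linear solve rather than gradient descent, a sketch like this runs on a CPU.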