Malicious file attacks seriously threaten network and data security, and recognizing malicious files and their variants is crucial for preventing network attacks. Because traditional methods struggle to recognize malicious files or variants quickly and accurately, visualization-based feature representation methods have attracted attention and shown promising results. In practice, however, these methods suffer from loss of crucial information, high time and space overhead, and limited model performance. This paper therefore introduces a novel recognition framework that addresses both feature representation and model performance. The framework uses the proposed visualization-based comprehensive feature representation method (VCFR) to extract file information into a Gray-Level Co-occurrence Matrix (GLCM), a 2-gram frequency matrix, and an interval 2-gram frequency matrix, and then fuses these features into three-channel RGB images. A proposed lightweight model then classifies the images; it draws on group convolution, channel shuffle, and attention mechanisms to improve performance while significantly reducing the number of parameters, the model size, and the FLOPs. In a series of experiments on a manually collected malicious file dataset (MFD) and the public MMCC dataset, the proposed framework significantly outperformed other state-of-the-art methods, achieving F1-scores of 94.10% and 98.58%, respectively, which supports its effectiveness and efficiency.
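The feature-fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pair_counts`, `to_channel`, `file_to_rgb`), the image width of 256, the vertical-neighbour offset used for the GLCM, the gap of one skipped byte for the interval 2-gram, the log scaling, and the channel order are all assumptions chosen for illustration.

```python
import numpy as np


def pair_counts(first, second):
    """Count how often byte value first[i] co-occurs with second[i] (256x256)."""
    counts = np.zeros((256, 256), dtype=np.int64)
    np.add.at(counts, (first.astype(np.intp), second.astype(np.intp)), 1)
    return counts


def to_channel(counts):
    """Log-scale a count matrix into a uint8 image channel (assumed scaling)."""
    scaled = np.log1p(counts.astype(np.float64))
    peak = scaled.max()
    if peak == 0:
        return np.zeros(counts.shape, dtype=np.uint8)
    return np.round(255.0 * scaled / peak).astype(np.uint8)


def file_to_rgb(data, width=256):
    """Fuse three byte-level co-occurrence matrices into one 256x256x3 image."""
    buf = np.frombuffer(data, dtype=np.uint8)

    # GLCM: view the bytes as a grayscale image and pair each pixel with the
    # pixel directly below it (assumed offset).
    rows = len(buf) // width
    img = buf[: rows * width].reshape(rows, width)
    glcm = pair_counts(img[:-1].ravel(), img[1:].ravel())

    # 2-gram: adjacent byte pairs in file order.
    bigram = pair_counts(buf[:-1], buf[1:])

    # Interval 2-gram: byte pairs separated by one skipped byte (assumed gap).
    interval = pair_counts(buf[:-2], buf[2:])

    return np.dstack([to_channel(glcm), to_channel(bigram), to_channel(interval)])


if __name__ == "__main__":
    sample = bytes([1, 2, 3, 1, 2, 3, 1, 2])
    image = file_to_rgb(sample, width=4)
    print(image.shape, image.dtype)
```

Each channel has a fixed 256 x 256 size regardless of file length, so images built this way could be fed to a convolutional classifier without resizing.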