Channel Features and API Frequency-Based Transformer Model for Malware Identification.

Liping Qian,Lin Cong

doi:10.3390/s24020580

Abstract

Malicious software (malware), in various forms and variants, continues to pose significant threats to user information security. Researchers have identified the effectiveness of utilizing API call sequences to identify malware. However, the evasion techniques employed by malware, such as obfuscation and complex API call sequences, challenge existing detection methods. This research addresses this issue by introducing CAFTrans, a novel transformer-based model for malware detection. We enhance the traditional transformer encoder with a one-dimensional channel attention module (1D-CAM) to improve the correlation between API call vector features, thereby enhancing feature embedding. A word frequency reinforcement module is also implemented to refine API features by preserving low-frequency API features. To capture subtle relationships between APIs and achieve more accurate identification of features for different types of malware, we leverage convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. Experimental results demonstrate the effectiveness of CAFTrans, achieving state-of-the-art performance on the mal-api-2019 dataset with an F1 score of 0.65252 and an AUC of 0.8913. The findings suggest that CAFTrans improves accuracy in distinguishing between various types of malware and exhibits enhanced recognition capabilities for unknown samples and adversarial attacks.

Full Text