TTDAT: Two-Step Training Dual Attention Transformer for Malware Classification Based on API Call Sequences

Peng Wang,Jiacheng Zhu,Tongcan Lin,Junfeng Wang,Di Wu

doi:10.3390/app14010092

Abstract

The surge in malware threats propelled by the rapid evolution of the internet and smart device technology necessitates effective automatic malware classification for robust system security. While existing research has primarily relied on some feature extraction techniques, issues such as information loss and computational overhead persist, especially in instruction-level tracking. To address these issues, this paper focuses on the nuanced analysis of API (Application Programming Interface) call sequences between the malware and system and introduces TTDAT (Two-step Training Dual Attention Transformer) for malware classification. TTDAT utilizes Transformer architecture with original multi-head attention and an integrated local attention module, streamlining the encoding of API sequences and extracting both global and local patterns. To expedite detection, we introduce a two-step training strategy: ensemble Transformer models to generate class representation vectors, thereby bolstering efficiency and adaptability. Our extensive experiments demonstrate TTDAT’s effectiveness, showcasing state-of-the-art results with an average F1 score of 0.90 and an accuracy of 0.96.

Full Text