Imbalanced Malware Family Classification Using Multimodal Fusion and Weight Self-Learning

Shudong Li, Yuan Li, Sattam Al Otaibi, Zhihong Tian, Xiaobo Wu

doi:10.1109/tits.2022.3208891

Abstract

In recent years, the increasing prevalence of Intelligent Transportation Systems with advanced technologies has led to the emergence of many targeted forms of malware, such as ransomware, Trojans, viruses, and malicious mining programs. Malware authors use tactics such as category disguise and family obfuscation in malware components to evade detection, which poses a serious security threat to enterprises, government agencies, and Internet users. In this paper, we propose a malware family classification approach based on multimodal fusion and weight self-learning. First, multiple modalities of malware, such as byte, format, statistical, and semantic features, are fused in various ways to generate effective features. Then, we add a weight self-learning mechanism for malware families to the classification model, which works by continuously calculating the log-loss from the family label and the probabilities predicted by each feature. The approach achieves excellent classification performance on highly imbalanced malware family datasets with high efficiency and small resource overhead, which helps to identify and classify malware families and improves the efficiency of large-scale malware analysis in Intelligent Transportation Systems.
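As a rough illustration of the weight self-learning idea, the sketch below computes a log-loss per family for each feature's predicted probabilities and turns those losses into per-family fusion weights. The abstract does not give the update rule, so the softmax over negative log-loss used here is an assumption, as are all function names, the uniform-guess fallback for families with no samples, and the toy probabilities for two hypothetical features ("byte" and "semantic").

```python
import math

def per_family_log_loss(labels, probs, n_families, eps=1e-15):
    """Mean negative log-probability given to the true family, per family."""
    totals = [0.0] * n_families
    counts = [0] * n_families
    for y, p in zip(labels, probs):
        # Clip to avoid log(0) on overconfident wrong predictions.
        p_true = min(max(p[y], eps), 1.0 - eps)
        totals[y] -= math.log(p_true)
        counts[y] += 1
    # Assumption: a family with no samples gets the loss of a uniform guess.
    uniform = math.log(n_families)
    return [totals[k] / counts[k] if counts[k] else uniform
            for k in range(n_families)]

def learn_family_weights(labels, probs_by_feature, n_families):
    """Per-family weights for each feature: lower log-loss, higher weight.

    Assumption: weights are a softmax over negative log-loss; the paper's
    actual rule may differ.
    """
    losses = [per_family_log_loss(labels, probs, n_families)
              for probs in probs_by_feature]
    weights = [[0.0] * n_families for _ in probs_by_feature]
    for k in range(n_families):
        scores = [math.exp(-loss[k]) for loss in losses]
        total = sum(scores)
        for m, s in enumerate(scores):
            weights[m][k] = s / total
    return weights

def fuse(weights, probs_for_sample):
    """Weighted sum of each feature's probabilities, renormalized to sum to 1."""
    n_families = len(weights[0])
    fused = [sum(w[k] * p[k] for w, p in zip(weights, probs_for_sample))
             for k in range(n_families)]
    total = sum(fused)
    return [f / total for f in fused]

# Toy validation set with two families (made-up numbers, not from the paper).
labels = [0, 0, 1, 1]
byte_probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.5, 0.5]]
semantic_probs = [[0.6, 0.4], [0.5, 0.5], [0.1, 0.9], [0.2, 0.8]]

weights = learn_family_weights(labels, [byte_probs, semantic_probs], 2)
fused = fuse(weights, [[0.55, 0.45], [0.2, 0.8]])
```

In this toy setup the hypothetical byte feature predicts family 0 well and the semantic feature predicts family 1 well, so each receives the larger weight for the family it handles better. Weighting per family rather than per feature is what lets a feature that is only reliable on a rare family still contribute there.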

Full Text