Multi-label Classification for Android Malware Based on Active Learning

Qijing Qiao, Xiaohong Li, Fei Zhang, Ruitao Feng, Sen Chen

doi:10.1109/tdsc.2022.3213689

Abstract

The outputs of existing malware classification approaches (i.e., binary and family classification) offer little help to subsequent analysis. Even family classification approaches suffer from the lack of a formal naming standard and from incomplete definitions of malicious behaviors. More importantly, existing approaches cannot handle a single malware sample that exhibits multiple malicious behaviors, although this is very common for Android malware in the wild. As a result, neither kind of approach gives researchers a direct and sufficiently comprehensive understanding of malware. In this paper, we propose MLCDroid, an ML-based multi-label classification approach that directly indicates the existence of pre-defined malicious behaviors. Through an in-depth analysis of real-world malware with security reports, we summarize 6 basic malicious behaviors and construct a labeled dataset. We compare the results of 70 algorithm combinations to evaluate effectiveness (best at 73.3%). To address the high cost of data annotation, we further propose an active learning approach based on data augmentation, which improves the overall accuracy to 86.7% by augmenting the data with 5,000+ high-quality samples from an unlabeled malware dataset. This is the first multi-label Android malware classification approach intended to provide more information on fine-grained malicious behaviors.
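To make the multi-label setting concrete, the following is a minimal Python sketch, not MLCDroid's implementation. It shows how one sample can carry several behavior labels at once as a 0/1 indicator vector, how per-behavior probabilities can be thresholded independently, and how confidently predicted unlabeled samples might be picked to enlarge a training set. Everything named here is an assumption for illustration: the abstract does not list the six behaviors (so `B1` to `B6` are placeholders), does not define its accuracy metric (exact-match accuracy is used here), and does not state its sample selection rule (a simple confidence threshold with pseudo-labels is swapped in).

```python
from typing import Dict, List, Sequence

# Hypothetical placeholder names: the abstract does not name the six behaviors.
BEHAVIORS = ["B1", "B2", "B3", "B4", "B5", "B6"]


def to_indicator(labels: Sequence[str]) -> List[int]:
    """Encode a set of behavior names as a 0/1 vector, one slot per behavior."""
    present = set(labels)
    return [1 if b in present else 0 for b in BEHAVIORS]


def predict_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Threshold each per-behavior probability independently."""
    return [1 if p >= threshold else 0 for p in probs]


def subset_accuracy(y_true: Sequence[Sequence[int]],
                    y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose entire label vector is predicted exactly.

    Assumed metric: the abstract does not say how accuracy is computed.
    """
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def confidence(probs: Sequence[float]) -> float:
    """Distance from 0.5 of the least certain behavior, scaled to [0, 1]."""
    return min(abs(p - 0.5) for p in probs) * 2


def select_confident(unlabeled: Dict[str, Sequence[float]],
                     min_conf: float = 0.8) -> Dict[str, List[int]]:
    """Keep unlabeled samples whose every behavior is predicted confidently.

    Returns pseudo-labels for the kept samples. This selection rule is an
    assumption of the sketch, not the rule described in the paper.
    """
    return {
        name: predict_labels(probs)
        for name, probs in unlabeled.items()
        if confidence(probs) >= min_conf
    }


if __name__ == "__main__":
    # One sample exhibiting two behaviors at once.
    print(to_indicator(["B1", "B4"]))
    # Hypothetical model outputs for two unlabeled samples.
    pool = {
        "sample_a": [0.99, 0.01, 0.97, 0.02, 0.98, 0.03],
        "sample_b": [0.99, 0.55, 0.97, 0.02, 0.98, 0.03],
    }
    print(select_confident(pool))
```

In this sketch `sample_b` is rejected because one behavior sits near 0.5, so its pseudo-label would be unreliable. A classical active learning loop would instead route such uncertain samples to a human annotator; which of the two the paper does is not stated in the abstract.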

Journal: IEEE Transactions on Dependable and Secure Computing
Publication Date: Jan 1, 2024
Citations: 5

Similar Papers

EnML: Multi-label Ensemble Learning for Urdu Text Classification
Faiza Mehmood ... Muhammad Nabeel Asim
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 22
22 Sep 2023

Multi-label sub-pixel classification of red and black soil over sparse vegetative areas using AVIRIS-NG airborne hyperspectral image
Anand S Sahadevan ... Touseef Ahmad
Remote Sensing Applications: Society and Environment | VOL. 29
17 Nov 2022

InstructNet: A novel approach for multi-label instruction classification through advanced deep learning
Tanjim Taharat Aurpa ... Md Golam Moazzam
PloS one | VOL. 19
01 Jan 2024

Multi-label EMG Classification of Isotonic Hand Movements: A Suitable Method for Robotic Prosthesis Control
José Jair Alves Mendes Junior ... Daniel Prado Campos
01 Jan 2021
