A3CM: Automatic Capability Annotation for Android Malware

Junyang Qiu,Yu Wang,Lei Pan,Surya Nepal,Wei Luo,Jun Zhang,Yang Xiang

doi:10.1109/access.2019.2946392

Abstract

Android malware poses serious security and privacy threats to mobile users. Traditional malware detection and family classification technologies are becoming less effective due to the rapid evolution of the malware landscape, in particular the emergence of so-called zero-day-family malware (malware that does not belong to any existing family). To address this issue, our paper presents a novel research problem on automatically identifying the security/privacy-related capabilities of any detected malware, which we refer to as Malware Capability Annotation (MCA). Motivated by the observation that known and zero-day-family malware share security/privacy-related capabilities, MCA opens an alternative way to effectively analyze zero-day-family malware by exploring the related information and knowledge from known malware families. To address the MCA problem, we design a new MCA solution, Automatic Capability Annotation for Android Malware (A3CM). A3CM works in the following four steps: 1) A3CM automatically extracts a set of semantic features, such as permissions, API calls, and network addresses, from raw binary APKs to characterize malware samples; 2) A3CM applies a statistical embedding method to map the features into a joint feature space, so that malware samples can be represented as numerical vectors; 3) A3CM infers the malicious capabilities by training a multi-label classification model; 4) the trained multi-label model is used to annotate the malicious capabilities of the candidate malware samples. To facilitate the new research on MCA, we create a new ground truth dataset that consists of 6,899 annotated Android malware samples from 72 families. We carry out a large number of experiments based on four representative security/privacy-related capabilities to evaluate the effectiveness of A3CM. Our results show that A3CM can achieve promising accuracy of 1.00, 0.98 and 0.63 in inferring multiple capabilities of known Android malware, malware from small families, and zero-day-family Android malware, respectively.
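Step 2 of the pipeline, mapping extracted features into a joint vector space, can be illustrated with a minimal sketch. The abstract does not specify the statistical embedding that A3CM uses, so the code below substitutes the simplest stand-in, a binary presence/absence vector over a shared vocabulary. The sample names, feature strings, and the helper functions `build_vocabulary` and `embed` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical feature sets for three samples (permissions, API calls,
# network addresses). The strings are illustrative, not taken from the paper.
samples: Dict[str, Set[str]] = {
    "sample_a": {"perm:SEND_SMS", "api:SmsManager.sendTextMessage"},
    "sample_b": {"perm:READ_CONTACTS", "url:example.com"},
    "sample_c": {"perm:SEND_SMS", "perm:READ_CONTACTS"},
}


def build_vocabulary(feature_sets: Iterable[Set[str]]) -> Dict[str, int]:
    """Give every distinct feature a fixed column index (sorted for determinism)."""
    vocab = sorted(set().union(*feature_sets))
    return {feature: idx for idx, feature in enumerate(vocab)}


def embed(features: Set[str], vocab: Dict[str, int]) -> List[int]:
    """Return a 0/1 vector with a 1 in each column whose feature is present."""
    vector = [0] * len(vocab)
    for feature in features:
        idx = vocab.get(feature)
        if idx is not None:  # features outside the vocabulary are dropped
            vector[idx] = 1
    return vector


vocab = build_vocabulary(samples.values())
vectors = {name: embed(feats, vocab) for name, feats in samples.items()}
```

Because all feature types share one vocabulary, every sample ends up in the same vector space regardless of which feature types it contains, which is the property the multi-label classifier in step 3 relies on.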

Highlights

Android has become the most popular mobile operating system, with 74.82% market share in February 2018 [1]
To address the limitations faced by malware family classification, we propose a new research problem called Android Malware Capability Annotation (MCA)
To perform the classification task of A3CM, we employ Decision Tree (DT) and Support Vector Machine (SVM) classifiers implemented in scikit-learn [52], [56]
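As a rough illustration of how Decision Tree and SVM classifiers from scikit-learn can be used for multi-label capability annotation, the sketch below trains both on a toy dataset. The feature matrix, the two capability columns, and the choice of a one-vs-rest wrapper with a linear kernel for the SVM are assumptions made for this example; the summary does not state how A3CM configures its classifiers.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy binary feature matrix: one row per sample, one column per
# hypothetical feature (for example a permission or an API call).
X = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
])

# Multi-label targets: one column per hypothetical capability.
# A sample may have zero, one, or several capabilities at once.
Y = np.array([
    [1, 0],
    [0, 1],
    [1, 1],
    [0, 1],
    [0, 0],
    [0, 0],
])

# DecisionTreeClassifier accepts a 2-D label matrix directly (multi-output).
tree = DecisionTreeClassifier(random_state=0).fit(X, Y)
tree_pred = tree.predict(X)

# SVC handles one binary problem at a time, so this sketch wraps it in
# OneVsRestClassifier, which fits one SVM per capability column.
svm = OneVsRestClassifier(SVC(kernel="linear")).fit(X, Y)
svm_pred = svm.predict(X)
```

Both `tree_pred` and `svm_pred` are matrices with one row per sample and one 0/1 column per capability, so annotating a new sample amounts to reading off the columns predicted as 1.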

Summary

INTRODUCTION

Android has become the most popular mobile operating system, with 74.82% market share in February 2018 [1]. To preserve a healthy ecosystem for Android users, the research communities and security vendors have proposed various techniques to analyze malware [5]–[10]. To address the limitations faced by malware family classification, we propose a new research problem called Android Malware Capability Annotation (MCA): automatically identifying the security/privacy-related capabilities of any detected malware. To solve the MCA research problem, we design a solution that employs a multi-label classification model to annotate the capabilities of Android malware. To facilitate research on MCA, we first create a dataset annotated with security/privacy-related capabilities, based on existing open-source Android malware datasets and ground truth.

RELATED WORK

THE PROPOSED ACAAM TECHNIQUE

ANNOTATION PERFORMANCE ON ZERO-DAY-FAMILY MALWARE SAMPLES

LIMITATIONS

Findings

CONCLUSION AND FUTURE WORK