Android Malware Family Labeling: Perspectives from the Industry

  • Abstract
  • Literature Map
  • Similar Papers
Abstract
Translate article icon Translate Article Star icon

Labeling and classifying Android malware is important for identifying new threats, triaging security incidents, and demystifying evasion techniques. To automate the malware classification pipeline, state-of-the-art tools such as AVClass and Euphony unify raw labels from commercial antivirus vendors (i.e., VirusTotal) to produce family labels. These tools are widely used for automatic malware classification in both academic research and industry practice. However, they face significant limitations in real-world industrial scenarios with numerous and dynamically changing samples. For example, our industrial practices revealed that VirusTotal's results change over time, leading to temporal inconsistencies in family labeling results that rely on label unification, which can severely impact a company's security posture. Despite this, such issues and challenges remain understudied. In this paper, we present the first systematic measurement study of existing automatic Android malware family labeling systems from various aspects, including label dynamics, consistency, reliability, and etc. Based on a large-scale dataset, we validate that the labeling results of these systems do evolve with time, and such evolution can introduce bias into many previous studies on performance assessments. We also reveal substantial divergence in labeling decisions across different systems when given the same input. Besides, we identify a disclosure priority among families in these systems' labeling processes, which could threaten the industry by allowing malicious actors to exploit these discrepancies. Our findings could benefit both researchers and industry practitioners for further refinement of automatic malware family labeling systems, contributing to their practical applications.

Similar Papers
  • Research Article
  • Cite Count Icon 20
  • 10.1109/tc.2022.3143439
Lightweight, Effective Detection and Characterization of Mobile Malware Families
  • Nov 1, 2022
  • IEEE Transactions on Computers
  • Karim O Elish + 2 more

Android malware is an ongoing threat to billions of smart devices’ security, ranging from mobile phones to car infotainment systems. Despite numerous approaches and previous studies to develop solutions for detecting and preventing Android malware, the rapid continuous development of new malware variants requires a careful reconsideration and the development of effective methods to identify malware families given a meager number of malware instances. In this paper, we present DroidMalVet, a novel Android malware family classification and detection approach that does not require to perform complex program analyses or utilize large feature sets. DroidMalVet is the first to use a promising, diverse, and small set of software metrics as features in a supervised learning platform to classify and detect various Android malware families. Our extensive empirical evaluations on two large public malware datasets show that DroidMalVet accurately detects both small and large malware families with F-Score accuracy of 94.4% and 96%, and AUC equal to 99.5% and 99.7% on the malware families in Drebin and AMD datasets, respectively. Moreover, our results demonstrate the superior performance of DroidMalVet in detecting small families (i.e., families with few samples). DroidMalVet complements existing approaches and presents an early warning tool for detecting known and emerging malware families.

  • Research Article
  • Cite Count Icon 120
  • 10.1109/tdsc.2017.2739145
EC2: Ensemble Clustering and Classification for Predicting Android Malware Families
  • Oct 23, 2019
  • IEEE Transactions on Dependable and Secure Computing
  • Tanmoy Chakraborty + 2 more

As the most widely used mobile platform, Android is also the biggest target for mobile malware. Given the increasing number of Android malware variants, detecting malware families is crucial so that security analysts can identify situations where signatures of a known malware family can be adapted as opposed to manually inspecting behavior of all samples. We present EC2 ( E nsemble C lustering and C lassification), a novel algorithm for discovering Android malware families of varying sizes —ranging from very large to very small families (even if previously unseen). We present a performance comparison of several traditional classification and clustering algorithms for Android malware family identification on DREBIN, the largest public Android malware dataset with labeled families. We use the output of both supervised classifiers and unsupervised clustering to design EC2 . Experimental results on both the DREBIN and the more recent Koodous malware datasets show that EC2 accurately detects both small and large families, outperforming several comparative baselines. Furthermore, we show how to automatically characterize and explain unique behaviors of specific malware families, such as FakeInstaller , MobileTx , Geinimi . In short, EC2 presents an early warning system for emerging new malware families, as well as a robust predictor of the family (when it is not new) to which a new malware sample belongs, and the design of novel strategies for data-driven understanding of malware behaviors.

  • Conference Article
  • Cite Count Icon 22
  • 10.1109/bigdata47090.2019.9005669
Identifying Android Malware Families Using Android-Oriented Metrics
  • Dec 1, 2019
  • William Blanc + 3 more

Android malware (malicious apps) families share common attributes and behavior through sharing core malicious code. However, as the number of new malware increases, the task of identifying the correct family becomes more challenging. Two prominent approaches tackle this problem, either using dynamic analysis that captures the runtime behavior of the malware or using static analysis methods that can reveal malicious behavior by analyzing the underlying logic and code patterns. A third emerging way is to use the various sources of identification features to analyze the architectural and external attributes of a malicious app. For example, two malicious apps can have different behavioral patterns but share common attributes. We hypothesize that this malware can belong to the same family but attempt to mislead dynamic and code-level static analysis tools by randomizing their behavior. In this work, we utilize a promising set of Android-oriented code metrics that guide a supervised classification learning process for identifying malware families in Android. Our empirical results on 2,869 malware apps, across 35 different malware families, show that these metrics are very effective to identify malware families. In particular, we achieve low false positive rate (1.2%) and AUC score of 0.984 for family identification by using Random Forest (RF) classifier.

  • Book Chapter
  • Cite Count Icon 25
  • 10.1007/978-3-319-24018-3_12
How Current Android Malware Seeks to Evade Automated Code Analysis
  • Jan 1, 2015
  • Siegfried Rasthofer + 3 more

First we report on a new threat campaign, underway in Korea, which infected around 20,000 Android users within two months. The campaign attacked mobile users with malicious applications spread via different channels, such as email attachments or SMS spam. A detailed investigation of the Android malware resulted in the identification of a new Android malware family Android/BadAccents. The family represents current state-of-the-art in mobile malware development for banking trojans. Second, we describe in detail the techniques this malware family uses and confront them with current state-of-the-art static and dynamic code-analysis techniques for Android applications. We highlight various challenges for automatic malware analysis frameworks that significantly hinder the fully automatic detection of malicious components in current Android malware. Furthermore, the malware exploits a previously unknown tapjacking vulnerability in the Android operating system, which we describe. As a result of this work, the vulnerability, affecting all Android versions, will be patched in one of the next releases of the Android Open Source Project.

  • Research Article
  • Cite Count Icon 44
  • 10.17485/ijst/2016/v9i21/90273
System Call Analysis of Android Malware Families
  • Jun 20, 2016
  • Indian Journal of Science and Technology
  • Sapna Malik + 1 more

Background/Objectives: Now a days, Android Malware is coded so wisely that it has become very difficult to detect them. The static analysis of malicious code is not enough for detection of malware as this malware hides its method call in encrypted form or it can install the method at runtime. The system call tracing is an effective dynamic analysis technique for detecting malware as it can analyze the malware at the run time. Moreover, this technique does not require the application code for malware detection. Thus, this can detect that android malware also which are difficult to detect with static analysis of code. As Android was launched in 2008, so there were fewer studies available regarding the behavior of Android Malware Families and their characteristics. The aim of this work is to explore the behavior of 10 popular Android Malware Families focused on System Call Pattern of these families. Methods/Statistical Analysis: For this purpose, the authors have extracted the system call trace of 345 malicious applications from 10 Android Malware Families named FakeInstaller, Opfake, Plankton, DroidKungFu, BaseBridge, Iconosys, Kmin, Adrd and Gappusin using strace android tool and compared it with the system calls pattern of 300 Benign Applications to justify the behavior of malicious application. Findings: During the experiment, it is observed that the malicious applications invoke some system calls more frequently than benign applications. Different Android malware invokes the different set of system calls with different frequency. Applications/Improvements: This analysis can prove helpful in designing intrusion-detection systems for an android mobile device with more accuracy. Keywords: Android Kernal, Android Malware Installation Methods, Malware Families, System Call Analysis

  • PDF Download Icon
  • Research Article
  • Cite Count Icon 59
  • 10.3390/electronics9060942
Android Malware Family Classification and Analysis: Current Status and Future Directions
  • Jun 5, 2020
  • Electronics
  • Fahad Alswaina + 1 more

Android receives major attention from security practitioners and researchers due to the influx number of malicious applications. For the past twelve years, Android malicious applications have been grouped into families. In the research community, detecting new malware families is a challenge. As we investigate, most of the literature reviews focus on surveying malware detection. Characterizing the malware families can improve the detection process and understand the malware patterns. For this reason, we conduct a comprehensive survey on the state-of-the-art Android malware familial detection, identification, and categorization techniques. We categorize the literature based on three dimensions: type of analysis, features, and methodologies and techniques. Furthermore, we report the datasets that are commonly used. Finally, we highlight the limitations that we identify in the literature, challenges, and future research directions regarding the Android malware family.

  • Conference Article
  • Cite Count Icon 23
  • 10.1109/dsc.2017.77
Analysis of Android Malware Family Characteristic Based on Isomorphism of Sensitive API Call Graph
  • Jun 1, 2017
  • Hao Zhou + 3 more

The analysis of multiple Android malware families indicates malware instances within a common malware family always have similar call graph structures. Based on the isomorphism of sensitive API call graph, we propose a method which is used to construct malware family features via combining static analysis approach with graph similarity metric. The experiment is performed on a malware dataset which contains 1326 malware samples from 16 different malware families. The result shows that the method can differentiate distinct malware family features and divide suspect malware samples into corresponding families with a high accuracy of 96.77% overall and even defend a certain extent of obfuscation.

  • Conference Article
  • 10.1109/prdc.2018.00047
ANTSdroid: Using RasMMA Algorithm to Generate Malware Behavior Characteristics of Android Malware Family
  • Dec 1, 2018
  • Shun-Chieh Chang + 5 more

Malware developers often use various obfuscation techniques to generate polymorphic and metamorphic versions of malicious programs. As a result, variants of a malware family generally exhibit resembling behavior, and most importantly, they possess certain common essential codes so to achieve the same designed purpose. Meantime, keeping up with new variants and generating signatures for each individual in a timely fashion has been costly and inefficient for anti-virus software companies. It motivates us the idea of no more dancing with variants. In this paper, we aim to find a malware family's main characteristic operations or activities directly related to its intent. We propose a novel automatic dynamic Android profiling system and malware family runtime behavior signature generation method called Runtime API sequence Motif Mining Algorithm (RasMMA) based on the analysis of the sensitive and permission-related execution traces of the threads and processes of a set of variant APKs of a malware family. We show the effectiveness of using the generated family signature to detect new variants using real-world dataset. Moreover, current anti-malware tools usually treat detection models as a black box for classification and offer little explanations on how malwares behave and how they proceed step by step to infiltrate targeted system and achieve the goal. We take malware family DroidKungFu as a case study to illustrate that the generated family signature indeed captures key malicious activities of the family.

  • Research Article
  • Cite Count Icon 109
  • 10.1016/j.cose.2021.102399
KronoDroid: Time-based Hybrid-featured Dataset for Effective Android Malware Detection and Characterization
  • Jul 9, 2021
  • Computers & Security
  • Alejandro Guerra-Manzanares + 2 more

KronoDroid: Time-based Hybrid-featured Dataset for Effective Android Malware Detection and Characterization

  • Conference Article
  • Cite Count Icon 26
  • 10.1109/iscc47284.2019.8969656
Android Malware Family Classification Based on Sensitive Opcode Sequence
  • Jun 1, 2019
  • Jianguo Jiang + 7 more

Android malware family classification is an advanced task in Android malware analysis, detection and forensics. Existing methods and models have achieved a certain success for Android malware detection, but the accuracy and the efficiency are still not up to the expectation, especially in the context of multiple class classification with imbalanced training data. To address those challenges, we propose an Android malware family classification model by analyzing the code's specific semantic information based on sensitive opcode sequence. In this work, we construct a sensitive semantic feature-sensitive opcode sequence using opcodes, sensitive APIs, STRs and actions, and propose to analyze the code's specific semantic information, generate a semantic related vector for Android malware family classification based on this feature. Besides, aiming at the families with minority, we adopt an oversampling technique based on the sensitive opcode sequence. Finally, we evaluate our method on Drebin dataset, and select the top 40 malware families for experiments. The experimental results show that the Total Accuracy and Average AUC (Area Under Curve, AUC) reach 99.50% and 98.86% with 45. 17s per Android malware, and even if the number of malware families increases, these results remain good.

  • Book Chapter
  • Cite Count Icon 2
  • 10.1007/978-3-030-80216-5_12
PEDAM: Priority Execution Based Approach for Detecting Android Malware
  • Jan 1, 2021
  • Olorunjube James Falana + 3 more

With the openness and growing popularity of Android Operating system all over the world, it has become a target of attack for Malware authors who are determined to take advantage of over 2.5 billion monthly active users of Android devices. Despite Google’s various protection measures, android malware continues to grow in complexity and scope. In recent time, many research efforts have focused on detecting malware on the Android operating system using both static and dynamic approaches. Most of the existing techniques are still not perfect because of the problems of false positive, false negative and high detection time. In this work, a Priority Execution-based Approach for Detecting Android Malware (PEDAM) is proposed to solve some of these problems. In PEDAM, a two-phase dynamic analysis scheme is used for malware analysis. The first phase involves the use of a time-based filter for prioritizing the android application that will execute based on permissions and intents. Any suspected samples not captured in the first phase are further analysed in the second phase, which does behavioural analysis using Support Vector Machine classifier to analyse permissions, intent filters and Activity features set for effective detection. The evaluation of the proposed model on different Android malware families’ shows that PEDAM outperformed another android-based malware detection system known as Iterative Classifier Fusion System (ICFS) with improved accuracy of 1.04%. These results indicated that the approach could be deployed for detection of android malware.KeywordsMalware detectionAndroidDynamic analysisPermissionIntent filter

  • Research Article
  • Cite Count Icon 46
  • 10.1007/s11704-017-6493-y
Fingerprinting Android malware families
  • Jun 30, 2018
  • Frontiers of Computer Science
  • Nannan Xie + 3 more

The domination of the Android operating system in the market share of smart terminals has engendered increasing threats of malicious applications (apps). Research on Android malware detection has received considerable attention in academia and the industry. In particular, studies on malware families have been beneficial to malware detection and behavior analysis. However, identifying the characteristics of malware families and the features that can describe a particular family have been less frequently discussed in existing work. In this paper, we are motivated to explore the key features that can classify and describe the behaviors of Android malware families to enable fingerprinting the malware families with these features. We present a framework for signature-based key feature construction. In addition, we propose a frequency-based feature elimination algorithm to select the key features. Finally, we construct the fingerprints of ten malware families, including twenty key features in three categories. Results of extensive experiments using Support Vector Machine demonstrate that the malware family classification achieves an accuracy of 92% to 99%. The typical behaviors of malware families are analyzed based on the selected key features. The results demonstrate the feasibility and effectiveness of the presented algorithm and fingerprinting method.

  • Book Chapter
  • 10.1007/978-3-030-74664-3_4
Robust Android Malicious Community Fingerprinting
  • Jan 1, 2021
  • Elmouatez Billah Karbab + 3 more

Security practitioners can combat large-scale Android malware by decreasing the analysis window size of newly detected malware. The window starts from the first detection until signature generation by anti-malware vendors. The larger the window is, the more time the malicious apps are given to spread over the users’ devices. Current state-of-the-art techniques have a large analysis window due to the significant number of Android malware appearing daily. Besides, these techniques use manual analysis in some cases to investigate malware. Therefore, decreasing the need for manual detection could significantly reduce the analysis window. To address the aforementioned issue, we elaborate systematic techniques and tools for the detection of both known family apps and new malware family apps (i.e., variants of existing families or unseen malware). To do so, we rely on the assumption that any pair of Android apps, with distinct authors and certificates, are most likely to be malicious if they are highly similar. Because the adversary usually repackages multiple app packages with the same malicious payload to hide it from anti-malware and vetting systems. Consequently, it is difficult to detect such malicious payloads from benign functionalities of a given Android package. Accordingly, a pair of Android apps should not be very similar in their components, excluding popular libraries. This observation, as mentioned earlier, could be used to design and develop a security framework to detect Android malware apps.In this chapter, we propose a novel Android app fingerprinting technique, APK-DNA, inspired by fuzzy hashing. We specifically target fingerprinting Android malicious apps. Computing the APK-DNA of a suspicious app requires a low computation time. Afterward, we leverage the previously mentioned assumption (i.e., very similar apps might be malware from the same malware family) to propose a cyber-security framework, namely Cypider (Cyber-Spider for Android malware detection), to detect and cluster Android malware without prior- knowledge of Android malware apps. Cypider consists of a novel combination of a set of techniques to address the problem of Android malware, clustering, and fingerprinting. First, Cypider can detect repackaged malware (malware families), which constitute the vast majority of Android malware apps (Zhou and Jiang (Dissecting android malware: Characterization and evolution, in IEEE Symposium on Security and Privacy, SP 2012, 21–23 May 2012, San Francisco (2012), pp. 95–109)). Second, it can detect new malware apps, and more importantly, Cypider performs the detection automatically and in an unsupervised way (i.e., no prior-knowledge about the apps). The fundamental idea of Cypider relies on building a similarity network between the targeted apps static content in terms of fuzzy fingerprints. Actually, Cypider extracts, from this similarity network, sub-graphs with high connectivity, called communities, which are most likely to be malicious communities.

  • Research Article
  • Cite Count Icon 51
  • 10.1109/access.2019.2946392
A3CM: Automatic Capability Annotation for Android Malware
  • Jan 1, 2019
  • IEEE Access
  • Junyang Qiu + 6 more

Android malware poses serious security and privacy threats to the mobile users. Traditional malware detection and family classification technologies are becoming less effective due to the rapid evolution of the malware landscape, with the emerging of so-called zero-day-family malware families. To address this issue, our paper presents a novel research problem on automatically identifying the security/privacy-related capabilities of any detected malware, which we refer to as Malware Capability Annotation (MCA). Motivated by the observation that known and zero-day-family malware families share the security/privacy-related capabilities, MCA opens a new alternative way to effectively analyze zero-day-family malware (the malware that do not belong to any existing families) through exploring the related information and knowledge from known malware families. To address the MCA problem, we design a new MCA hunger solution, Automatic Capability Annotation for Android Malware (A3CM). A3CM works in the following four steps: 1) A3CM automatically extracts a set of semantic features such as permissions, API calls, network addresses from raw binary APKs to characterize malware samples; 2) A3CM applies a statistical embedding method to map the features into a joint feature space, so that malware samples can be represented as numerical vectors; 3) A3CM infers the malicious capabilities by using the multi-label classification model; 4) The trained multi-label model is used to annotate the malicious capabilities of the candidate malware samples. To facilitate the new research of MCA, we create a new ground truth dataset that consists of 6,899 annotated Android malware samples from 72 families. We carry out a large number of experiments based on the four representative security/privacy-related capabilities to evaluate the effectiveness of A3CM. Our results show that A3CM can achieve promising accuracy of 1.00, 0.98 and 0.63 in inferring multiple capabilities of known Android malware, small size-families' malware and zero-day-families' Android malware, respectively.

  • Research Article
  • Cite Count Icon 59
  • 10.1109/access.2020.2965646
Android Malware Familial Classification Based on DEX File Section Features
  • Jan 1, 2020
  • IEEE Access
  • Yong Fang + 3 more

The rapid proliferation of Android malware is challenging the classification of the Android malware family. The traditional static method for classification is easily affected by the confusion and reinforcement, while the dynamic method is expensive in computation. To solve these problems, this paper proposes an Android malware familial classification method based on Dalvik Executable (DEX) file section features. First, the DEX file is converted into RGB (Red/Green/Blue) image and plain text respectively, and then, the color and texture of image and text are extracted as features. Finally, a feature fusion algorithm based on multiple kernel learning is used for classification. In this experiment, the Android Malware Dataset (AMD) was selected as the sample set. Two different comparative experiments were set up, and the method in this paper was compared with the common visualization method and feature fusion method. The results show that our method has a better classification effect with precision, recall and F1 score reaching 0.96. Besides, the time of feature extraction in this paper is reduced by 2.999 seconds compared with the method of frequent subsequence. In conclusion, the method proposed in this paper is efficient and precise in the classification of the Android malware family.

Save Icon
Up Arrow
Open/Close