ANTSdroid: Using RasMMA Algorithm to Generate Malware Behavior Characteristics of Android Malware Family
Malware developers often use various obfuscation techniques to generate polymorphic and metamorphic versions of malicious programs. As a result, variants of a malware family generally exhibit resembling behavior, and most importantly, they possess certain common essential codes so to achieve the same designed purpose. Meantime, keeping up with new variants and generating signatures for each individual in a timely fashion has been costly and inefficient for anti-virus software companies. It motivates us the idea of no more dancing with variants. In this paper, we aim to find a malware family's main characteristic operations or activities directly related to its intent. We propose a novel automatic dynamic Android profiling system and malware family runtime behavior signature generation method called Runtime API sequence Motif Mining Algorithm (RasMMA) based on the analysis of the sensitive and permission-related execution traces of the threads and processes of a set of variant APKs of a malware family. We show the effectiveness of using the generated family signature to detect new variants using real-world dataset. Moreover, current anti-malware tools usually treat detection models as a black box for classification and offer little explanations on how malwares behave and how they proceed step by step to infiltrate targeted system and achieve the goal. We take malware family DroidKungFu as a case study to illustrate that the generated family signature indeed captures key malicious activities of the family.
- Research Article
120
- 10.1109/tdsc.2017.2739145
- Oct 23, 2019
- IEEE Transactions on Dependable and Secure Computing
As the most widely used mobile platform, Android is also the biggest target for mobile malware. Given the increasing number of Android malware variants, detecting malware families is crucial so that security analysts can identify situations where signatures of a known malware family can be adapted as opposed to manually inspecting behavior of all samples. We present EC2 (Ensemble Clustering and Classification), a novel algorithm for discovering Android malware families of varying sizes-ranging from very large to very small families (even if previously unseen). We present a performance comparison of several traditional classification and clustering algorithms for Android malware family identification on DREBIN, the largest public Android malware dataset with labeled families. We use the output of both supervised classifiers and unsupervised clustering to design EC2. Experimental results on both the DREBIN and the more recent Koodous malware datasets show that EC2 accurately detects both small and large families, outperforming several comparative baselines. Furthermore, we show how to automatically characterize and explain unique behaviors of specific malware families, such as FakeInstaller, MobileTx, Geinimi. In short, EC2 presents an early warning system for emerging new malware families, as well as a robust predictor of the family (when it is not new) to which a new malware sample belongs, and the design of novel strategies for data-driven understanding of malware behaviors.
- Research Article
21
- 10.1109/tc.2022.3143439
- Nov 1, 2022
- IEEE Transactions on Computers
Android malware is an ongoing threat to billions of smart devices’ security, ranging from mobile phones to car infotainment systems. Despite numerous approaches and previous studies to develop solutions for detecting and preventing Android malware, the rapid continuous development of new malware variants requires a careful reconsideration and the development of effective methods to identify malware families given a meager number of malware instances. In this paper, we present DroidMalVet, a novel Android malware family classification and detection approach that does not require to perform complex program analyses or utilize large feature sets. DroidMalVet is the first to use a promising, diverse, and small set of software metrics as features in a supervised learning platform to classify and detect various Android malware families. Our extensive empirical evaluations on two large public malware datasets show that DroidMalVet accurately detects both small and large malware families with F-Score accuracy of 94.4% and 96%, and AUC equal to 99.5% and 99.7% on the malware families in Drebin and AMD datasets, respectively. Moreover, our results demonstrate the superior performance of DroidMalVet in detecting small families (i.e., families with few samples). DroidMalVet complements existing approaches and presents an early warning tool for detecting known and emerging malware families.
- Conference Article
23
- 10.1109/bigdata47090.2019.9005669
- Dec 1, 2019
Android malware (malicious apps) families share common attributes and behavior through sharing core malicious code. However, as the number of new malware increases, the task of identifying the correct family becomes more challenging. Two prominent approaches tackle this problem, either using dynamic analysis that captures the runtime behavior of the malware or using static analysis methods that can reveal malicious behavior by analyzing the underlying logic and code patterns. A third emerging way is to use the various sources of identification features to analyze the architectural and external attributes of a malicious app. For example, two malicious apps can have different behavioral patterns but share common attributes. We hypothesize that this malware can belong to the same family but attempt to mislead dynamic and code-level static analysis tools by randomizing their behavior. In this work, we utilize a promising set of Android-oriented code metrics that guide a supervised classification learning process for identifying malware families in Android. Our empirical results on 2,869 malware apps, across 35 different malware families, show that these metrics are very effective to identify malware families. In particular, we achieve low false positive rate (1.2%) and AUC score of 0.984 for family identification by using Random Forest (RF) classifier.
- Book Chapter
25
- 10.1007/978-3-319-24018-3_12
- Jan 1, 2015
First we report on a new threat campaign, underway in Korea, which infected around 20,000 Android users within two months. The campaign attacked mobile users with malicious applications spread via different channels, such as email attachments or SMS spam. A detailed investigation of the Android malware resulted in the identification of a new Android malware family Android/BadAccents. The family represents current state-of-the-art in mobile malware development for banking trojans. Second, we describe in detail the techniques this malware family uses and confront them with current state-of-the-art static and dynamic code-analysis techniques for Android applications. We highlight various challenges for automatic malware analysis frameworks that significantly hinder the fully automatic detection of malicious components in current Android malware. Furthermore, the malware exploits a previously unknown tapjacking vulnerability in the Android operating system, which we describe. As a result of this work, the vulnerability, affecting all Android versions, will be patched in one of the next releases of the Android Open Source Project.
- Conference Article
23
- 10.1109/dsc.2017.77
- Jun 1, 2017
The analysis of multiple Android malware families indicates malware instances within a common malware family always have similar call graph structures. Based on the isomorphism of sensitive API call graph, we propose a method which is used to construct malware family features via combining static analysis approach with graph similarity metric. The experiment is performed on a malware dataset which contains 1326 malware samples from 16 different malware families. The result shows that the method can differentiate distinct malware family features and divide suspect malware samples into corresponding families with a high accuracy of 96.77% overall and even defend a certain extent of obfuscation.
- Research Article
46
- 10.1007/s11704-017-6493-y
- Jun 30, 2018
- Frontiers of Computer Science
The domination of the Android operating system in the market share of smart terminals has engendered increasing threats of malicious applications (apps). Research on Android malware detection has received considerable attention in academia and the industry. In particular, studies on malware families have been beneficial to malware detection and behavior analysis. However, identifying the characteristics of malware families and the features that can describe a particular family have been less frequently discussed in existing work. In this paper, we are motivated to explore the key features that can classify and describe the behaviors of Android malware families to enable fingerprinting the malware families with these features. We present a framework for signature-based key feature construction. In addition, we propose a frequency-based feature elimination algorithm to select the key features. Finally, we construct the fingerprints of ten malware families, including twenty key features in three categories. Results of extensive experiments using Support Vector Machine demonstrate that the malware family classification achieves an accuracy of 92% to 99%. The typical behaviors of malware families are analyzed based on the selected key features. The results demonstrate the feasibility and effectiveness of the presented algorithm and fingerprinting method.
- Research Article
59
- 10.3390/electronics9060942
- Jun 5, 2020
- Electronics
Android receives major attention from security practitioners and researchers due to the influx number of malicious applications. For the past twelve years, Android malicious applications have been grouped into families. In the research community, detecting new malware families is a challenge. As we investigate, most of the literature reviews focus on surveying malware detection. Characterizing the malware families can improve the detection process and understand the malware patterns. For this reason, we conduct a comprehensive survey on the state-of-the-art Android malware familial detection, identification, and categorization techniques. We categorize the literature based on three dimensions: type of analysis, features, and methodologies and techniques. Furthermore, we report the datasets that are commonly used. Finally, we highlight the limitations that we identify in the literature, challenges, and future research directions regarding the Android malware family.
- Research Article
56
- 10.1109/tcyb.2022.3164625
- Jan 1, 2023
- IEEE Transactions on Cybernetics
Evolving Android malware poses a severe security threat to mobile users, and machine-learning (ML)-based defense techniques attract active research. Due to the lack of knowledge, many zero-day families' malware may remain undetected until the classifier gains specialized knowledge. The most existing ML-based methods will take a long time to learn new malware families in the latest malware family landscape. Existing ML-based Android malware detection and classification methods struggle with the fast evolution of the malware landscape, particularly in terms of the emergence of zero-day malware families and limited representation of single-view features. In this article, a new multiview feature intelligence (MFI) framework is developed to learn the representation of a targeted capability from known malware families for recognizing unknown and evolving malware with the same capability. The new framework performs reverse engineering to extract multiview heterogeneous features, including semantic string features, API call graph features, and smali opcode sequential features. It can learn the representation of a targeted capability from known malware families through a series of processes of feature analysis, selection, aggregation, and encoding, to detect unknown Android malware with shared target capability. We create a new dataset with ground-truth information regarding capability. Many experiments are conducted on the new dataset to evaluate the performance and effectiveness of the new method. The results demonstrate that the new method outperforms three state-of-the-art methods, including: 1) Drebin; 2) MaMaDroid; and 3) N -opcode, when detecting unknown Android malware with targeted capabilities.
- Conference Article
26
- 10.1109/iscc47284.2019.8969656
- Jun 1, 2019
Android malware family classification is an advanced task in Android malware analysis, detection and forensics. Existing methods and models have achieved a certain success for Android malware detection, but the accuracy and the efficiency are still not up to the expectation, especially in the context of multiple class classification with imbalanced training data. To address those challenges, we propose an Android malware family classification model by analyzing the code's specific semantic information based on sensitive opcode sequence. In this work, we construct a sensitive semantic feature-sensitive opcode sequence using opcodes, sensitive APIs, STRs and actions, and propose to analyze the code's specific semantic information, generate a semantic related vector for Android malware family classification based on this feature. Besides, aiming at the families with minority, we adopt an oversampling technique based on the sensitive opcode sequence. Finally, we evaluate our method on Drebin dataset, and select the top 40 malware families for experiments. The experimental results show that the Total Accuracy and Average AUC (Area Under Curve, AUC) reach 99.50% and 98.86% with 45. 17s per Android malware, and even if the number of malware families increases, these results remain good.
- Research Article
113
- 10.1016/j.cose.2021.102399
- Jul 9, 2021
- Computers & Security
KronoDroid: Time-based Hybrid-featured Dataset for Effective Android Malware Detection and Characterization
- Research Article
52
- 10.1109/access.2019.2946392
- Jan 1, 2019
- IEEE Access
Android malware poses serious security and privacy threats to the mobile users. Traditional malware detection and family classification technologies are becoming less effective due to the rapid evolution of the malware landscape, with the emerging of so-called zero-day-family malware families. To address this issue, our paper presents a novel research problem on automatically identifying the security/privacy-related capabilities of any detected malware, which we refer to as Malware Capability Annotation (MCA). Motivated by the observation that known and zero-day-family malware families share the security/privacy-related capabilities, MCA opens a new alternative way to effectively analyze zero-day-family malware (the malware that do not belong to any existing families) through exploring the related information and knowledge from known malware families. To address the MCA problem, we design a new MCA hunger solution, Automatic Capability Annotation for Android Malware (A3CM). A3CM works in the following four steps: 1) A3CM automatically extracts a set of semantic features such as permissions, API calls, network addresses from raw binary APKs to characterize malware samples; 2) A3CM applies a statistical embedding method to map the features into a joint feature space, so that malware samples can be represented as numerical vectors; 3) A3CM infers the malicious capabilities by using the multi-label classification model; 4) The trained multi-label model is used to annotate the malicious capabilities of the candidate malware samples. To facilitate the new research of MCA, we create a new ground truth dataset that consists of 6,899 annotated Android malware samples from 72 families. We carry out a large number of experiments based on the four representative security/privacy-related capabilities to evaluate the effectiveness of A3CM. Our results show that A3CM can achieve promising accuracy of 1.00, 0.98 and 0.63 in inferring multiple capabilities of known Android malware, small size-families' malware and zero-day-families' Android malware, respectively.
- Book Chapter
- 10.1007/978-3-030-74664-3_4
- Jan 1, 2021
Security practitioners can combat large-scale Android malware by decreasing the analysis window size of newly detected malware. The window starts from the first detection until signature generation by anti-malware vendors. The larger the window is, the more time the malicious apps are given to spread over the users’ devices. Current state-of-the-art techniques have a large analysis window due to the significant number of Android malware appearing daily. Besides, these techniques use manual analysis in some cases to investigate malware. Therefore, decreasing the need for manual detection could significantly reduce the analysis window. To address the aforementioned issue, we elaborate systematic techniques and tools for the detection of both known family apps and new malware family apps (i.e., variants of existing families or unseen malware). To do so, we rely on the assumption that any pair of Android apps, with distinct authors and certificates, are most likely to be malicious if they are highly similar. Because the adversary usually repackages multiple app packages with the same malicious payload to hide it from anti-malware and vetting systems. Consequently, it is difficult to detect such malicious payloads from benign functionalities of a given Android package. Accordingly, a pair of Android apps should not be very similar in their components, excluding popular libraries. This observation, as mentioned earlier, could be used to design and develop a security framework to detect Android malware apps.In this chapter, we propose a novel Android app fingerprinting technique, APK-DNA, inspired by fuzzy hashing. We specifically target fingerprinting Android malicious apps. Computing the APK-DNA of a suspicious app requires a low computation time. Afterward, we leverage the previously mentioned assumption (i.e., very similar apps might be malware from the same malware family) to propose a cyber-security framework, namely Cypider (Cyber-Spider for Android malware detection), to detect and cluster Android malware without prior- knowledge of Android malware apps. Cypider consists of a novel combination of a set of techniques to address the problem of Android malware, clustering, and fingerprinting. First, Cypider can detect repackaged malware (malware families), which constitute the vast majority of Android malware apps (Zhou and Jiang (Dissecting android malware: Characterization and evolution, in IEEE Symposium on Security and Privacy, SP 2012, 21–23 May 2012, San Francisco (2012), pp. 95–109)). Second, it can detect new malware apps, and more importantly, Cypider performs the detection automatically and in an unsupervised way (i.e., no prior-knowledge about the apps). The fundamental idea of Cypider relies on building a similarity network between the targeted apps static content in terms of fuzzy fingerprints. Actually, Cypider extracts, from this similarity network, sub-graphs with high connectivity, called communities, which are most likely to be malicious communities.
- Research Article
246
- 10.1016/j.eswa.2013.07.106
- Aug 13, 2013
- Expert Systems with Applications
Dendroid: A text mining approach to analyzing and classifying code structures in Android malware families
- Research Article
28
- 10.1109/access.2021.3139334
- Jan 1, 2022
- IEEE Access
It is important to effectively detect, mitigate, and defend against Android malware attacks, because Android malware has long represented a major threat to Android app security. Characterizing and classifying similar malicious apps into groups plays a particularly crucial role in building a secure Android app ecosystem. The classification of malware families can efficiently enhance the malware detection process and systematically elucidate malware patterns. In this paper, we propose a novel efficient deep learning network with multi-streams for Android malware family classification. We first obtain the input data for a convolutional neural network (CNN) in string format from some main files or sections contained in each Android malicious app. We then classify malware families by applying a 1-dimensional convolution filter-based network for the files or sections. Further, by using gradient analysis to visualize the important files and sections in malicious apps, we attempt to intuitively grasp which files or sections are the most significant for malware family classification. To validate the effectiveness of our approach, we conduct extensive experiments with the well-known DREBIN and AMD malware datasets, and we compare our approach with existing methods. Our experimental results show that the 1D CNN model is more accurate than the 2D CNN model, and that the <monospace xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">code_item</monospace> part in the classes.dex is the most relevant feature for malware classification, as it is more relevant than other parts such as AndroidManifest.xml and certificate. The proposed method achieves the best accuracy of 93.2% by using 1D convolution filters with multi-streams for the main files and sections of the malware samples.
- Dissertation
3
- 10.32657/10356/72122
- Jan 1, 2017
Android has grown to be the most popular mobile operating system since its release in 2008.Due to its openness and ease of use, it attracts thousands of vendors and developers working on Android application development.Millions of apps provide a variety of functionalities to Android users, such as online shopping, instant messaging, gaming and map service.However, Android becomes a hot attack target of cybercriminals due to its prevalence.According to the security report of Symantec in 2016, the number of Android malware has reached 13 million in 2015.Android malware is uploaded into either Google official market or unofficial markets everyday by cybercriminals which put users under a high risk.The malware may steal users' sensitive information, elevate the privilege, remote control devices, and encrypt users' files for ransom.It is non-trivial to understand the risks and develop effective mitigation against them.Malware is the critical and non-trivial issue in Android security.In order to prevent malware from attacking the users, we need a better understanding of Android malware and its behaviors, which can facilitate the extraction of representative features from malware, and thereby enhance malware detection.The malware and anti-malware tools are keeping evolving during the process of competition.Therefore, it is valuable to learn the characteristics of evolving malware, and weakness of existing anti-malware tools.Moreover, a sustaining malware analysis and security assessment is lacking for the Android world.In order to address these problems, we propose a semantic based malware analysis on these topics with the following achievements in this thesis:1. We propose a precise semantic model of Android malware based on Deterministic Symbolic Automaton (DSA) for the purpose of malware comprehension, detection and classification.Based on DSA, we develop an automatic analysis framework, named SMART, which learns DSA by detecting and summarizing semantic clones from malware families, and then extracts semantic features from the learned DSA to classify malware according to the attack patterns.We conduct the experiments in both malware benchmark and 223,170 real-world apps.The results show that SMART builds meaningful semantic models and outperforms both state-of-the-art approaches and anti-virus tools in malware detection.SMART identifies 4583 new malware in real-world apps that are missed by most anti-virus tools.The classification step further identifies new malware variants and unknown families.iv 2. We first propose a meta model for Android malware to capture the common attack features and evasion features in the malware.Based on this model, we develop a framework, MYSTIQUE, to automatically generate malware covering four attack features and two evasion features, by adopting the software product line engineering approach.With the help of MYSTIQUE, we conduct experiments to 1) understand Android malware and the associated attack features as well as evasion techniques; 2) evaluate and compare the 57 off-the-shelf anti-malware tools, 9 academic solutions and 4 Android market vetting processes in terms of accuracy in detecting attack features and capability in addressing evasion.Last but not least, we provide a benchmark of Android malware with proper labeling of contained attack and evasion features.Moreover, we extend this work to MYSTIQUE-S to explore the capabilities of anti-malware tools detecting malware with dynamic code loading.MYSTIQUE-S automatically selects attack features under various user scenarios and delivers the corresponding malicious payloads at runtime.Relying on dynamic code binding (via service) and loading (via reflection) techniques, MYSTIQUE-S enables the dynamic execution of payloads on user devices at runtime.Experimental results on real-world devices show that existing Anti-Malware Tools (AMTs) are incapable of detecting most of our generated malware.Last, we propose some enhancements for existing anti-malware tools.3. We propose a systematic approach to study Android malware, unveil security issues, obtain insightful conclusions and highlights, and predict the future trend for research.We have collected 4,267,178 Android apps from a variety of Android marketplaces, where 1,004,550 malware variants are identified and analyzed.Different from previous works, this work focuses on the differences and evolution of apps' characteristics, and identifies multiple security-related issues concerned by both academia and industry.In order to provide a comprehensive view for these issues, we propose four analyses on individual app, malware family, malware author, and market, to conduct our study and guide the analysis.Furthermore, we propose six dimensions to cluster apps for different analysis tasks to achieve efficiency and accuracy in the large-scale analysis.Some of the key findings reflect the characteristics of attacks, and the weaknesses in protection, which can benefit all stakeholders.x