EC2: Ensemble Clustering and Classification for Predicting Android Malware Families

  • Abstract
  • Literature Map
  • Similar Papers
Abstract
Translate article icon Translate Article Star icon

As the most widely used mobile platform, Android is also the biggest target for mobile malware. Given the increasing number of Android malware variants, detecting malware families is crucial so that security analysts can identify situations where signatures of a known malware family can be adapted as opposed to manually inspecting behavior of all samples. We present EC2 (Ensemble Clustering and Classification), a novel algorithm for discovering Android malware families of varying sizes-ranging from very large to very small families (even if previously unseen). We present a performance comparison of several traditional classification and clustering algorithms for Android malware family identification on DREBIN, the largest public Android malware dataset with labeled families. We use the output of both supervised classifiers and unsupervised clustering to design EC2. Experimental results on both the DREBIN and the more recent Koodous malware datasets show that EC2 accurately detects both small and large families, outperforming several comparative baselines. Furthermore, we show how to automatically characterize and explain unique behaviors of specific malware families, such as FakeInstaller, MobileTx, Geinimi. In short, EC2 presents an early warning system for emerging new malware families, as well as a robust predictor of the family (when it is not new) to which a new malware sample belongs, and the design of novel strategies for data-driven understanding of malware behaviors.

Similar Papers
  • Research Article
  • Cite Count Icon 21
  • 10.1109/tc.2022.3143439
Lightweight, Effective Detection and Characterization of Mobile Malware Families
  • Nov 1, 2022
  • IEEE Transactions on Computers
  • Karim O Elish + 2 more

Android malware is an ongoing threat to billions of smart devices’ security, ranging from mobile phones to car infotainment systems. Despite numerous approaches and previous studies to develop solutions for detecting and preventing Android malware, the rapid continuous development of new malware variants requires a careful reconsideration and the development of effective methods to identify malware families given a meager number of malware instances. In this paper, we present DroidMalVet, a novel Android malware family classification and detection approach that does not require to perform complex program analyses or utilize large feature sets. DroidMalVet is the first to use a promising, diverse, and small set of software metrics as features in a supervised learning platform to classify and detect various Android malware families. Our extensive empirical evaluations on two large public malware datasets show that DroidMalVet accurately detects both small and large malware families with F-Score accuracy of 94.4% and 96%, and AUC equal to 99.5% and 99.7% on the malware families in Drebin and AMD datasets, respectively. Moreover, our results demonstrate the superior performance of DroidMalVet in detecting small families (i.e., families with few samples). DroidMalVet complements existing approaches and presents an early warning tool for detecting known and emerging malware families.

  • Conference Article
  • Cite Count Icon 23
  • 10.1109/dsc.2017.77
Analysis of Android Malware Family Characteristic Based on Isomorphism of Sensitive API Call Graph
  • Jun 1, 2017
  • Hao Zhou + 3 more

The analysis of multiple Android malware families indicates malware instances within a common malware family always have similar call graph structures. Based on the isomorphism of sensitive API call graph, we propose a method which is used to construct malware family features via combining static analysis approach with graph similarity metric. The experiment is performed on a malware dataset which contains 1326 malware samples from 16 different malware families. The result shows that the method can differentiate distinct malware family features and divide suspect malware samples into corresponding families with a high accuracy of 96.77% overall and even defend a certain extent of obfuscation.

  • Conference Article
  • Cite Count Icon 23
  • 10.1109/bigdata47090.2019.9005669
Identifying Android Malware Families Using Android-Oriented Metrics
  • Dec 1, 2019
  • William Blanc + 3 more

Android malware (malicious apps) families share common attributes and behavior through sharing core malicious code. However, as the number of new malware increases, the task of identifying the correct family becomes more challenging. Two prominent approaches tackle this problem, either using dynamic analysis that captures the runtime behavior of the malware or using static analysis methods that can reveal malicious behavior by analyzing the underlying logic and code patterns. A third emerging way is to use the various sources of identification features to analyze the architectural and external attributes of a malicious app. For example, two malicious apps can have different behavioral patterns but share common attributes. We hypothesize that this malware can belong to the same family but attempt to mislead dynamic and code-level static analysis tools by randomizing their behavior. In this work, we utilize a promising set of Android-oriented code metrics that guide a supervised classification learning process for identifying malware families in Android. Our empirical results on 2,869 malware apps, across 35 different malware families, show that these metrics are very effective to identify malware families. In particular, we achieve low false positive rate (1.2%) and AUC score of 0.984 for family identification by using Random Forest (RF) classifier.

  • Research Article
  • Cite Count Icon 23
  • 10.1016/j.comnet.2020.107639
Comparative analysis of feature representations and machine learning methods in Android family classification
  • Oct 28, 2020
  • Computer Networks
  • Yude Bai + 4 more

Comparative analysis of feature representations and machine learning methods in Android family classification

  • Book Chapter
  • Cite Count Icon 25
  • 10.1007/978-3-319-24018-3_12
How Current Android Malware Seeks to Evade Automated Code Analysis
  • Jan 1, 2015
  • Siegfried Rasthofer + 3 more

First we report on a new threat campaign, underway in Korea, which infected around 20,000 Android users within two months. The campaign attacked mobile users with malicious applications spread via different channels, such as email attachments or SMS spam. A detailed investigation of the Android malware resulted in the identification of a new Android malware family Android/BadAccents. The family represents current state-of-the-art in mobile malware development for banking trojans. Second, we describe in detail the techniques this malware family uses and confront them with current state-of-the-art static and dynamic code-analysis techniques for Android applications. We highlight various challenges for automatic malware analysis frameworks that significantly hinder the fully automatic detection of malicious components in current Android malware. Furthermore, the malware exploits a previously unknown tapjacking vulnerability in the Android operating system, which we describe. As a result of this work, the vulnerability, affecting all Android versions, will be patched in one of the next releases of the Android Open Source Project.

  • PDF Download Icon
  • Research Article
  • Cite Count Icon 59
  • 10.3390/electronics9060942
Android Malware Family Classification and Analysis: Current Status and Future Directions
  • Jun 5, 2020
  • Electronics
  • Fahad Alswaina + 1 more

Android receives major attention from security practitioners and researchers due to the influx number of malicious applications. For the past twelve years, Android malicious applications have been grouped into families. In the research community, detecting new malware families is a challenge. As we investigate, most of the literature reviews focus on surveying malware detection. Characterizing the malware families can improve the detection process and understand the malware patterns. For this reason, we conduct a comprehensive survey on the state-of-the-art Android malware familial detection, identification, and categorization techniques. We categorize the literature based on three dimensions: type of analysis, features, and methodologies and techniques. Furthermore, we report the datasets that are commonly used. Finally, we highlight the limitations that we identify in the literature, challenges, and future research directions regarding the Android malware family.

  • PDF Download Icon
  • Research Article
  • Cite Count Icon 52
  • 10.1109/access.2019.2946392
A3CM: Automatic Capability Annotation for Android Malware
  • Jan 1, 2019
  • IEEE Access
  • Junyang Qiu + 6 more

Android malware poses serious security and privacy threats to the mobile users. Traditional malware detection and family classification technologies are becoming less effective due to the rapid evolution of the malware landscape, with the emerging of so-called zero-day-family malware families. To address this issue, our paper presents a novel research problem on automatically identifying the security/privacy-related capabilities of any detected malware, which we refer to as Malware Capability Annotation (MCA). Motivated by the observation that known and zero-day-family malware families share the security/privacy-related capabilities, MCA opens a new alternative way to effectively analyze zero-day-family malware (the malware that do not belong to any existing families) through exploring the related information and knowledge from known malware families. To address the MCA problem, we design a new MCA hunger solution, Automatic Capability Annotation for Android Malware (A3CM). A3CM works in the following four steps: 1) A3CM automatically extracts a set of semantic features such as permissions, API calls, network addresses from raw binary APKs to characterize malware samples; 2) A3CM applies a statistical embedding method to map the features into a joint feature space, so that malware samples can be represented as numerical vectors; 3) A3CM infers the malicious capabilities by using the multi-label classification model; 4) The trained multi-label model is used to annotate the malicious capabilities of the candidate malware samples. To facilitate the new research of MCA, we create a new ground truth dataset that consists of 6,899 annotated Android malware samples from 72 families. We carry out a large number of experiments based on the four representative security/privacy-related capabilities to evaluate the effectiveness of A3CM. Our results show that A3CM can achieve promising accuracy of 1.00, 0.98 and 0.63 in inferring multiple capabilities of known Android malware, small size-families' malware and zero-day-families' Android malware, respectively.

  • PDF Download Icon
  • Research Article
  • Cite Count Icon 59
  • 10.1109/access.2020.2965646
Android Malware Familial Classification Based on DEX File Section Features
  • Jan 1, 2020
  • IEEE Access
  • Yong Fang + 3 more

The rapid proliferation of Android malware is challenging the classification of the Android malware family. The traditional static method for classification is easily affected by the confusion and reinforcement, while the dynamic method is expensive in computation. To solve these problems, this paper proposes an Android malware familial classification method based on Dalvik Executable (DEX) file section features. First, the DEX file is converted into RGB (Red/Green/Blue) image and plain text respectively, and then, the color and texture of image and text are extracted as features. Finally, a feature fusion algorithm based on multiple kernel learning is used for classification. In this experiment, the Android Malware Dataset (AMD) was selected as the sample set. Two different comparative experiments were set up, and the method in this paper was compared with the common visualization method and feature fusion method. The results show that our method has a better classification effect with precision, recall and F1 score reaching 0.96. Besides, the time of feature extraction in this paper is reduced by 2.999 seconds compared with the method of frequent subsequence. In conclusion, the method proposed in this paper is efficient and precise in the classification of the Android malware family.

  • Research Article
  • Cite Count Icon 44
  • 10.17485/ijst/2016/v9i21/90273
System Call Analysis of Android Malware Families
  • Jun 20, 2016
  • Indian Journal of Science and Technology
  • Sapna Malik + 1 more

Background/Objectives: Now a days, Android Malware is coded so wisely that it has become very difficult to detect them. The static analysis of malicious code is not enough for detection of malware as this malware hides its method call in encrypted form or it can install the method at runtime. The system call tracing is an effective dynamic analysis technique for detecting malware as it can analyze the malware at the run time. Moreover, this technique does not require the application code for malware detection. Thus, this can detect that android malware also which are difficult to detect with static analysis of code. As Android was launched in 2008, so there were fewer studies available regarding the behavior of Android Malware Families and their characteristics. The aim of this work is to explore the behavior of 10 popular Android Malware Families focused on System Call Pattern of these families. Methods/Statistical Analysis: For this purpose, the authors have extracted the system call trace of 345 malicious applications from 10 Android Malware Families named FakeInstaller, Opfake, Plankton, DroidKungFu, BaseBridge, Iconosys, Kmin, Adrd and Gappusin using strace android tool and compared it with the system calls pattern of 300 Benign Applications to justify the behavior of malicious application. Findings: During the experiment, it is observed that the malicious applications invoke some system calls more frequently than benign applications. Different Android malware invokes the different set of system calls with different frequency. Applications/Improvements: This analysis can prove helpful in designing intrusion-detection systems for an android mobile device with more accuracy. Keywords: Android Kernal, Android Malware Installation Methods, Malware Families, System Call Analysis

  • Conference Article
  • 10.1109/prdc.2018.00047
ANTSdroid: Using RasMMA Algorithm to Generate Malware Behavior Characteristics of Android Malware Family
  • Dec 1, 2018
  • Shun-Chieh Chang + 5 more

Malware developers often use various obfuscation techniques to generate polymorphic and metamorphic versions of malicious programs. As a result, variants of a malware family generally exhibit resembling behavior, and most importantly, they possess certain common essential codes so to achieve the same designed purpose. Meantime, keeping up with new variants and generating signatures for each individual in a timely fashion has been costly and inefficient for anti-virus software companies. It motivates us the idea of no more dancing with variants. In this paper, we aim to find a malware family's main characteristic operations or activities directly related to its intent. We propose a novel automatic dynamic Android profiling system and malware family runtime behavior signature generation method called Runtime API sequence Motif Mining Algorithm (RasMMA) based on the analysis of the sensitive and permission-related execution traces of the threads and processes of a set of variant APKs of a malware family. We show the effectiveness of using the generated family signature to detect new variants using real-world dataset. Moreover, current anti-malware tools usually treat detection models as a black box for classification and offer little explanations on how malwares behave and how they proceed step by step to infiltrate targeted system and achieve the goal. We take malware family DroidKungFu as a case study to illustrate that the generated family signature indeed captures key malicious activities of the family.

  • Conference Article
  • Cite Count Icon 5
  • 10.1109/sin54109.2021.9699322
Android Malware Family Classification: What Works – API Calls, Permissions or API Packages?
  • Dec 15, 2021
  • Saurabh Kumar + 2 more

With the increased popularity and wide adoption of Android as a mobile OS platform, it has been a major target for malware authors. Due to unprecedented rapid growth in the number, variants, and diversity of malware, detecting malware on the Android platform has become challenging. Beyond the detection of a malware, classifying the family the malware belongs to, helps security analysts to reuse malware removal techniques that is known to work for that family of malware. It takes manual analysis if a malware belongs to an unknown family. Therefore, classifying malware into exact family is important. This paper presents a technique and tool named MAPFam that applies machine learning on static features from the Manifest file and API packages to classify an Android malware into its family. This work is premised on a starting hypothesis that features extracted from API packages rather than on API calls lead to more precise classification. Our experiments indeed shows that API package based models provides ∼1.63X more accurate classification compared to an API call based method. Our machine learning based malware family classification system uses API packages, requested permissions, and other features from the Manifest files. The proposed family classification system achieves accuracy and average precision above 97% for the top 60 malware families by using only 81 features with 97.55% of model reliability rate (Kappa score). The experimental results also shows that MAPFam can perfectly identity 36 malware families.

  • Research Article
  • Cite Count Icon 46
  • 10.1007/s11704-017-6493-y
Fingerprinting Android malware families
  • Jun 30, 2018
  • Frontiers of Computer Science
  • Nannan Xie + 3 more

The domination of the Android operating system in the market share of smart terminals has engendered increasing threats of malicious applications (apps). Research on Android malware detection has received considerable attention in academia and the industry. In particular, studies on malware families have been beneficial to malware detection and behavior analysis. However, identifying the characteristics of malware families and the features that can describe a particular family have been less frequently discussed in existing work. In this paper, we are motivated to explore the key features that can classify and describe the behaviors of Android malware families to enable fingerprinting the malware families with these features. We present a framework for signature-based key feature construction. In addition, we propose a frequency-based feature elimination algorithm to select the key features. Finally, we construct the fingerprints of ten malware families, including twenty key features in three categories. Results of extensive experiments using Support Vector Machine demonstrate that the malware family classification achieves an accuracy of 92% to 99%. The typical behaviors of malware families are analyzed based on the selected key features. The results demonstrate the feasibility and effectiveness of the presented algorithm and fingerprinting method.

  • PDF Download Icon
  • Research Article
  • 10.31470/2308-5126-2020-44-1-59-72
Experimental study of the influence of family structure on the development of creativity of children of preschool age
  • Nov 23, 2021
  • HUMANITARIUM
  • Iryna Zozulia + 1 more

The article presents theoretical substantiation and empirical research of the problem of the influence of the family structure on the development of creativity of children of preschool age. The relationship between creativity and family type, number of children in the family, birth order, and intervals between births is analyzed. Peculiarities of influence of family structure (by the number of children) for the development of creativity of children of preschool age are researched. The absence of significant differences in partial (productivity, flexibility, originality) and a general indicator of verbal creativity of children with one child, small and large families has been experimentally established. Research of figurative creativity allowed to identify significant differences in partial indicators of productivity and originality, and the general indicator of figurative creativity: the highest arithmetic mean values are determined in the group of children from small families, and the lowest - in the group of children from large families. In children brought up in single children families, the highest arithmetic mean value is revealed by the partial indicator of the name, and the lowest - in children from large families. In children of preschool age from single, small, and large families no significant differences by partial indicators of the development and resistance to the closure were found. The heterogeneity of verbal and figurative creativity structure is determined in children of preschool age in all types of families. Significant differences were found in the general indicators of creativity: the highest arithmetic mean value was determined in the group of children from small families, and the lowest - in children from large families. Conclusions are made that children from small families are the most creative, and children from large families - the least.

  • PDF Download Icon
  • Research Article
  • Cite Count Icon 28
  • 10.1109/access.2021.3139334
Efficient Deep Learning Network With Multi-Streams for Android Malware Family Classification
  • Jan 1, 2022
  • IEEE Access
  • Hyun-Il Kim + 3 more

It is important to effectively detect, mitigate, and defend against Android malware attacks, because Android malware has long represented a major threat to Android app security. Characterizing and classifying similar malicious apps into groups plays a particularly crucial role in building a secure Android app ecosystem. The classification of malware families can efficiently enhance the malware detection process and systematically elucidate malware patterns. In this paper, we propose a novel efficient deep learning network with multi-streams for Android malware family classification. We first obtain the input data for a convolutional neural network (CNN) in string format from some main files or sections contained in each Android malicious app. We then classify malware families by applying a 1-dimensional convolution filter-based network for the files or sections. Further, by using gradient analysis to visualize the important files and sections in malicious apps, we attempt to intuitively grasp which files or sections are the most significant for malware family classification. To validate the effectiveness of our approach, we conduct extensive experiments with the well-known DREBIN and AMD malware datasets, and we compare our approach with existing methods. Our experimental results show that the 1D CNN model is more accurate than the 2D CNN model, and that the <monospace xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">code_item</monospace> part in the classes.dex is the most relevant feature for malware classification, as it is more relevant than other parts such as AndroidManifest.xml and certificate. The proposed method achieves the best accuracy of 93.2% by using 1D convolution filters with multi-streams for the main files and sections of the malware samples.

  • Conference Article
  • Cite Count Icon 26
  • 10.1109/iscc47284.2019.8969656
Android Malware Family Classification Based on Sensitive Opcode Sequence
  • Jun 1, 2019
  • Jianguo Jiang + 7 more

Android malware family classification is an advanced task in Android malware analysis, detection and forensics. Existing methods and models have achieved a certain success for Android malware detection, but the accuracy and the efficiency are still not up to the expectation, especially in the context of multiple class classification with imbalanced training data. To address those challenges, we propose an Android malware family classification model by analyzing the code's specific semantic information based on sensitive opcode sequence. In this work, we construct a sensitive semantic feature-sensitive opcode sequence using opcodes, sensitive APIs, STRs and actions, and propose to analyze the code's specific semantic information, generate a semantic related vector for Android malware family classification based on this feature. Besides, aiming at the families with minority, we adopt an oversampling technique based on the sensitive opcode sequence. Finally, we evaluate our method on Drebin dataset, and select the top 40 malware families for experiments. The experimental results show that the Total Accuracy and Average AUC (Area Under Curve, AUC) reach 99.50% and 98.86% with 45. 17s per Android malware, and even if the number of malware families increases, these results remain good.

Save Icon
Up Arrow
Open/Close
Notes

Save Important notes in documents

Highlight text to save as a note, or write notes directly

You can also access these Documents in Paperpal, our AI writing tool

Powered by our AI Writing Assistant