Identifying Android Malware Families Using Android-Oriented Metrics

  • Abstract
  • Literature Map
  • Similar Papers
Abstract
Translate article icon Translate Article Star icon

Android malware (malicious apps) families share common attributes and behavior through sharing core malicious code. However, as the number of new malware increases, the task of identifying the correct family becomes more challenging. Two prominent approaches tackle this problem, either using dynamic analysis that captures the runtime behavior of the malware or using static analysis methods that can reveal malicious behavior by analyzing the underlying logic and code patterns. A third emerging way is to use the various sources of identification features to analyze the architectural and external attributes of a malicious app. For example, two malicious apps can have different behavioral patterns but share common attributes. We hypothesize that this malware can belong to the same family but attempt to mislead dynamic and code-level static analysis tools by randomizing their behavior. In this work, we utilize a promising set of Android-oriented code metrics that guide a supervised classification learning process for identifying malware families in Android. Our empirical results on 2,869 malware apps, across 35 different malware families, show that these metrics are very effective to identify malware families. In particular, we achieve low false positive rate (1.2%) and AUC score of 0.984 for family identification by using Random Forest (RF) classifier.

Similar Papers
  • Book Chapter
  • 10.1007/978-3-030-74664-3_4
Robust Android Malicious Community Fingerprinting
  • Jan 1, 2021
  • Elmouatez Billah Karbab + 3 more

Security practitioners can combat large-scale Android malware by decreasing the analysis window size of newly detected malware. The window starts from the first detection until signature generation by anti-malware vendors. The larger the window is, the more time the malicious apps are given to spread over the users’ devices. Current state-of-the-art techniques have a large analysis window due to the significant number of Android malware appearing daily. Besides, these techniques use manual analysis in some cases to investigate malware. Therefore, decreasing the need for manual detection could significantly reduce the analysis window. To address the aforementioned issue, we elaborate systematic techniques and tools for the detection of both known family apps and new malware family apps (i.e., variants of existing families or unseen malware). To do so, we rely on the assumption that any pair of Android apps, with distinct authors and certificates, are most likely to be malicious if they are highly similar. Because the adversary usually repackages multiple app packages with the same malicious payload to hide it from anti-malware and vetting systems. Consequently, it is difficult to detect such malicious payloads from benign functionalities of a given Android package. Accordingly, a pair of Android apps should not be very similar in their components, excluding popular libraries. This observation, as mentioned earlier, could be used to design and develop a security framework to detect Android malware apps.In this chapter, we propose a novel Android app fingerprinting technique, APK-DNA, inspired by fuzzy hashing. We specifically target fingerprinting Android malicious apps. Computing the APK-DNA of a suspicious app requires a low computation time. Afterward, we leverage the previously mentioned assumption (i.e., very similar apps might be malware from the same malware family) to propose a cyber-security framework, namely Cypider (Cyber-Spider for Android malware detection), to detect and cluster Android malware without prior- knowledge of Android malware apps. Cypider consists of a novel combination of a set of techniques to address the problem of Android malware, clustering, and fingerprinting. First, Cypider can detect repackaged malware (malware families), which constitute the vast majority of Android malware apps (Zhou and Jiang (Dissecting android malware: Characterization and evolution, in IEEE Symposium on Security and Privacy, SP 2012, 21–23 May 2012, San Francisco (2012), pp. 95–109)). Second, it can detect new malware apps, and more importantly, Cypider performs the detection automatically and in an unsupervised way (i.e., no prior-knowledge about the apps). The fundamental idea of Cypider relies on building a similarity network between the targeted apps static content in terms of fuzzy fingerprints. Actually, Cypider extracts, from this similarity network, sub-graphs with high connectivity, called communities, which are most likely to be malicious communities.

  • PDF Download Icon
  • Research Article
  • Cite Count Icon 28
  • 10.1109/access.2021.3139334
Efficient Deep Learning Network With Multi-Streams for Android Malware Family Classification
  • Jan 1, 2022
  • IEEE Access
  • Hyun-Il Kim + 3 more

It is important to effectively detect, mitigate, and defend against Android malware attacks, because Android malware has long represented a major threat to Android app security. Characterizing and classifying similar malicious apps into groups plays a particularly crucial role in building a secure Android app ecosystem. The classification of malware families can efficiently enhance the malware detection process and systematically elucidate malware patterns. In this paper, we propose a novel efficient deep learning network with multi-streams for Android malware family classification. We first obtain the input data for a convolutional neural network (CNN) in string format from some main files or sections contained in each Android malicious app. We then classify malware families by applying a 1-dimensional convolution filter-based network for the files or sections. Further, by using gradient analysis to visualize the important files and sections in malicious apps, we attempt to intuitively grasp which files or sections are the most significant for malware family classification. To validate the effectiveness of our approach, we conduct extensive experiments with the well-known DREBIN and AMD malware datasets, and we compare our approach with existing methods. Our experimental results show that the 1D CNN model is more accurate than the 2D CNN model, and that the <monospace xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">code_item</monospace> part in the classes.dex is the most relevant feature for malware classification, as it is more relevant than other parts such as AndroidManifest.xml and certificate. The proposed method achieves the best accuracy of 93.2% by using 1D convolution filters with multi-streams for the main files and sections of the malware samples.

  • Research Article
  • Cite Count Icon 120
  • 10.1109/tdsc.2017.2739145
EC2: Ensemble Clustering and Classification for Predicting Android Malware Families
  • Oct 23, 2019
  • IEEE Transactions on Dependable and Secure Computing
  • Tanmoy Chakraborty + 2 more

As the most widely used mobile platform, Android is also the biggest target for mobile malware. Given the increasing number of Android malware variants, detecting malware families is crucial so that security analysts can identify situations where signatures of a known malware family can be adapted as opposed to manually inspecting behavior of all samples. We present EC2 (Ensemble Clustering and Classification), a novel algorithm for discovering Android malware families of varying sizes-ranging from very large to very small families (even if previously unseen). We present a performance comparison of several traditional classification and clustering algorithms for Android malware family identification on DREBIN, the largest public Android malware dataset with labeled families. We use the output of both supervised classifiers and unsupervised clustering to design EC2. Experimental results on both the DREBIN and the more recent Koodous malware datasets show that EC2 accurately detects both small and large families, outperforming several comparative baselines. Furthermore, we show how to automatically characterize and explain unique behaviors of specific malware families, such as FakeInstaller, MobileTx, Geinimi. In short, EC2 presents an early warning system for emerging new malware families, as well as a robust predictor of the family (when it is not new) to which a new malware sample belongs, and the design of novel strategies for data-driven understanding of malware behaviors.

  • Book Chapter
  • Cite Count Icon 1
  • 10.1007/978-3-319-98776-7_41
Two-Phases Detection Scheme: Detecting Android Malware in Android Markets
  • Nov 5, 2018
  • Xin Su + 3 more

Recently, Android application becomes popular and important in human’s daily work, life, entertainment. However, because of open source of Android application, more and more malware aim to this platform and launch various malicious attacks to threaten Android users’ security. Previous research works focus on using static behavioral analysis to detect Android malware, which cannot capture dynamic behaviors and in-efficiency to detect Android malware. In this paper, we present a Android application two-stage detection scheme that using two kinds of dynamic behavioral characteristics to detect Android malware. This framework first uses system call statistics to identify potential malicious apps. After verification, if the software is clean, the application will then be released to the relevant markets. To mitigate against false negative cases, users who run new apps can invoke our network traffic monitoring (NTM) tool which triggers network traffic capture upon detecting some suspicious behaviors e.g. detecting sensitive data being sent to output stream of an open socket. The network traffic will be analyzed to see if it matches network characteristics observed from malware apps. If suspicious network traffic is observed, the relevant Android markets will be notified to remove the application from the repository. We trained our system call and network traffic classifiers using 32 families of known Android malware families and some typical normal apps. Later, we evaluated our framework using other malware and normal apps that used in the training set. Our experimental results using 120 test apps (which consist of 50 malware and 70 normal apps) indicate that we can achieve a 94.2% and 99.2% accuracy with J.48 and Random forest classifier respectively using our framework.

  • Conference Article
  • Cite Count Icon 20
  • 10.1109/cns48642.2020.9162341
Hybrid Analysis of Android Apps for Security Vetting using Deep Learning
  • Jun 1, 2020
  • Dewan Chaulagain + 6 more

The phenomenal growth in use of android devices in the recent years has also been accompanied by the rise of android malware. This reality warrants development of tools and techniques to analyze android apps in large scale for security vetting. Most of the state-of-the-art vetting tools are either based on static analysis or on dynamic analysis. Static analysis has limited success if the malware app utilizes sophisticated evading tricks. Dynamic analysis on the other hand may not find all the code execution paths, which let some malware apps remain undetected. Moreover, the existing static and dynamic analysis vetting techniques require extensive human interaction. To address the above issues, we design a deep learning based hybrid analysis technique, which combines the complementary strengths of each analysis paradigm to attain better accuracy. Moreover, automated feature engineering capability of the deep learning framework addresses the human interaction issue. In particular, using lightweight static and dynamic analysis procedure, we obtain multiple artifacts, and with these artifacts we train the deep learner to create independent models, and then combine them to build a hybrid classifier to obtain the final vetting decision (malicious apps vs. benign apps). The experiments show that our best deep learning model with hybrid analysis achieves an area under the precision-recall curve (AUC) of 0.9998. In this paper, we also present a comparative study of performance measures of the various variants of the deep learning framework. Additional experiments indicate that our vetting system is fairly robust against imbalanced data and is scalable.

  • Book Chapter
  • Cite Count Icon 16
  • 10.1016/bs.adcom.2020.03.002
Effectiveness of state-of-the-art dynamic analysis techniques in identifying diverse Android malware and future enhancements
  • Jan 1, 2020
  • Jyoti Gajrani + 6 more

Effectiveness of state-of-the-art dynamic analysis techniques in identifying diverse Android malware and future enhancements

  • Research Article
  • Cite Count Icon 21
  • 10.1109/tc.2022.3143439
Lightweight, Effective Detection and Characterization of Mobile Malware Families
  • Nov 1, 2022
  • IEEE Transactions on Computers
  • Karim O Elish + 2 more

Android malware is an ongoing threat to billions of smart devices’ security, ranging from mobile phones to car infotainment systems. Despite numerous approaches and previous studies to develop solutions for detecting and preventing Android malware, the rapid continuous development of new malware variants requires a careful reconsideration and the development of effective methods to identify malware families given a meager number of malware instances. In this paper, we present DroidMalVet, a novel Android malware family classification and detection approach that does not require to perform complex program analyses or utilize large feature sets. DroidMalVet is the first to use a promising, diverse, and small set of software metrics as features in a supervised learning platform to classify and detect various Android malware families. Our extensive empirical evaluations on two large public malware datasets show that DroidMalVet accurately detects both small and large malware families with F-Score accuracy of 94.4% and 96%, and AUC equal to 99.5% and 99.7% on the malware families in Drebin and AMD datasets, respectively. Moreover, our results demonstrate the superior performance of DroidMalVet in detecting small families (i.e., families with few samples). DroidMalVet complements existing approaches and presents an early warning tool for detecting known and emerging malware families.

  • Research Article
  • Cite Count Icon 113
  • 10.1016/j.cose.2021.102399
KronoDroid: Time-based Hybrid-featured Dataset for Effective Android Malware Detection and Characterization
  • Jul 9, 2021
  • Computers &amp; Security
  • Alejandro Guerra-Manzanares + 2 more

KronoDroid: Time-based Hybrid-featured Dataset for Effective Android Malware Detection and Characterization

  • Research Article
  • Cite Count Icon 49
  • 10.3390/e23081009
A Hybrid Analysis-Based Approach to Android Malware Family Classification.
  • Aug 3, 2021
  • Entropy
  • Chao Ding + 3 more

With the popularity of Android, malware detection and family classification have also become a research focus. Many excellent methods have been proposed by previous authors, but static and dynamic analyses inevitably require complex processes. A hybrid analysis method for detecting Android malware and classifying malware families is presented in this paper, and is partially optimized for multiple-feature data. For static analysis, we use permissions and intent as static features and use three feature selection methods to form a subset of three candidate features. Compared with various models, including k-nearest neighbors and random forest, random forest is the best, with a detection rate of 95.04%, while the chi-square test is the best feature selection method. After using feature selection to explore the critical static features contained in this dataset, we analyzed a subset of important features to gain more insight into the malware. In a dynamic analysis based on network traffic, unlike those that focus on a one-way flow of traffic and work on HTTP protocols and transport layer protocols, we focused on sessions and retained protocol layers. The Res7LSTM model is then used to further classify the malicious and partially benign samples detected in the static detection. The experimental results show that our approach can not only work with fewer static features and guarantee sufficient accuracy, but also improve the detection rate of Android malware family classification from 71.48% in previous work to 99% when cutting the traffic in terms of the sessions and protocols of all layers.

  • Conference Article
  • Cite Count Icon 50
  • 10.1109/pst.2018.8514191
A Family of Droids-Android Malware Detection via Behavioral Modeling: Static vs Dynamic Analysis
  • Aug 1, 2018
  • Lucky Onwuzurike + 5 more

Following the increasing popularity of the mobile ecosystem, cybercriminals have increasingly targeted mobile ecosystems, designing and distributing malicious apps that steal information or cause harm to the device's owner. Aiming to counter them, detection techniques based on either static or dynamic analysis that model Android malware, have been proposed. While the pros and cons of these analysis techniques are known, they are usually compared in the context of their limitations e.g., static analysis is not able to capture runtime behaviors, full code coverage is usually not achieved during dynamic analysis, etc. Whereas, in this paper, we analyze the performance of static and dynamic analysis methods in the detection of Android malware and attempt to compare them in terms of their detection performance, using the same modeling approach.To this end, we build on MAMADROID, a state-of-the-art detection system that relies on static analysis to create a behavioral model from the sequences of abstracted API calls. Then, aiming to apply the same technique in a dynamic analysis setting, we modify CHIMP, a platform recently proposed to crowdsource human inputs for app testing, in order to extract API calls' sequences from the traces produced while executing the app on a CHIMP virtual device. We call this system AUNTIEDROID and instantiate it by using both automated (Monkey) and usergenerated inputs. We find that combining both static and dynamic analysis yields the best performance, with $F -$measure reaching 0.92. We also show that static analysis is at least as effective as dynamic analysis, depending on how apps are stimulated during execution, and investigate the reasons for inconsistent misclassifications across methods.

  • PDF Download Icon
  • Research Article
  • Cite Count Icon 48
  • 10.3390/app13137720
DroidDetectMW: A Hybrid Intelligent Model for Android Malware Detection
  • Jun 29, 2023
  • Applied Sciences
  • Fatma Taher + 4 more

Malicious apps specifically aimed at the Android platform have increased in tandem with the proliferation of mobile devices. Malware is now so carefully written that it is difficult to detect. Due to the exponential growth in malware, manual methods of malware are increasingly ineffective. Although prior writers have proposed numerous high-quality approaches, static and dynamic assessments inherently necessitate intricate procedures. The obfuscation methods used by modern malware are incredibly complex and clever. As a result, it cannot be detected using only static malware analysis. As a result, this work presents a hybrid analysis approach, partially tailored for multiple-feature data, for identifying Android malware and classifying malware families to improve Android malware detection and classification. This paper offers a hybrid method that combines static and dynamic malware analysis to give a full view of the threat. Three distinct phases make up the framework proposed in this research. Normalization and feature extraction procedures are used in the first phase of pre-processing. Both static and dynamic features undergo feature selection in the second phase. Two feature selection strategies are proposed to choose the best subset of features to use for both static and dynamic features. The third phase involves applying a newly proposed detection model to classify android apps; this model uses a neural network optimized with an improved version of HHO. Application of binary and multi-class classification is used, with binary classification for benign and malware apps and multi-class classification for detecting malware categories and families. By utilizing the features gleaned from static and dynamic malware analysis, several machine-learning methods are used for malware classification. According to the results of the experiments, the hybrid approach improves the accuracy of detection and classification of Android malware compared to the scenario when considering static and dynamic information separately.

  • Conference Article
  • Cite Count Icon 13
  • 10.1109/wimob50308.2020.9253386
An Intelligent Malware Detection and Classification System Using Apps-to-Images Transformations and Convolutional Neural Networks
  • Oct 12, 2020
  • Farid Nait-Abdesselam + 2 more

With the proliferation of Mobile Internet, handheld devices are facing continuous threats from apps that contain malicious intents. These malicious apps, or malware, have the capability of dynamically changing their intended code as they spread. Moreover, the diversity and volume of their variants severely undermine the effectiveness of traditional defenses, which typically use signature-based techniques, and make them unable to detect the previously unknown malware. However, the variants of malware families share typical behavioral patterns reflecting their origin and purpose. The behavioral patterns, obtained either statically or dynamically, can be exploited to detect and classify unknown malware into their known families using machine learning techniques. In this paper, we propose a new approach for detecting and analyzing a malware. Mainly focused on android apps, our approach adopts the two following steps: (1) performs a transformation of an APK file into a lightweight RGB image using a predefined dictionary and intelligent mapping, and (2) trains a convolutional neural network on the obtained images for the purpose of signature detection and malware family classification. The results obtained using the Androzoo dataset show that our system classifies both legacy and new malware apps with high accuracy, low false-negative rate (FNR), and low false-positive rate (FPR).

  • Conference Article
  • Cite Count Icon 34
  • 10.1109/intelcct.2017.8324010
Dynamic behavior analysis of android applications for malware detection
  • Dec 1, 2017
  • Latika Singh + 1 more

Android is most popular operating system for smartphones and small devices with 86.6% market share (Chau 2016). Its open source nature makes it more prone to attacks creating a need for malware analysis. Main approaches for detecting malware intents of mobile applications are based on either static analysis or dynamic analysis. In static analysis, apps are inspected for suspicious patterns of code to identify malicious segments. However, several obfuscation techniques are available to provide a guard against such analysis. The dynamic analysis on the other hand is a behavior-based detection method that involves investigating the run-time behavior of the suspicious app to uncover malware. The present study extracts the system call behavior of 216 malicious apps and 278 normal apps to construct a feature vector for training a classifier. Seven data classification techniques including decision tree, random forest, gradient boosting trees, k-NN, Artificial Neural Network, Support Vector Machine and deep learning were applied on this dataset. Three feature ranking techniques were usedto select appropriate features from the set of 337 attributes (system calls). These techniques of feature ranking included information gain, Chi-square statistic and correlation analysis by determining weights of the features. After discarding select features with low ranks the performances of the classifiers were measured using accuracy and recall. Experiments show that Support Vector Machines (SVM) after selecting features through correlation analysis outperformed other techniques where an accuracy of 97.16% is achieved with recall 99.54% (for malicious apps). The study also contributes by identifying the set of systems calls that are crucial in identifying malicious intent of android apps.

  • Conference Article
  • Cite Count Icon 239
  • 10.1145/2381934.2381950
SmartDroid
  • Oct 19, 2012
  • Cong Zheng + 6 more

User interface (UI) interactions are essential to Android applications, as many Activities require UI interactions to be triggered. This kind of UI interactions could also help malicious apps to hide their sensitive behaviors (e.g., sending SMS or getting the user's device ID) from being detected by dynamic analysis tools such as TaintDroid, because simply running the app, but without proper UI interactions, will not lead to the exposure of sensitive behaviors. In this paper we focus on the challenging task of triggering a certain behavior through automated UI interactions. In particular, we propose a hybrid static and dynamic analysis method to reveal UI-based trigger conditions in Android applications. Our method first uses static analysis to extract expected activity switch paths by analyzing both Activity and Function Call Graphs, and then uses dynamic analysis to traverse each UI elements and explore the UI interaction paths towards the sensitive APIs. We implement a prototype system SmartDroid and show that it can automatically and efficiently detect the UI-based trigger conditions required to expose the sensitive behavior of several Android malwares, which otherwise cannot be detected with existing techniques such as TaintDroid.

  • Conference Article
  • Cite Count Icon 36
  • 10.1145/2799979.2800004
A robust dynamic analysis system preventing SandBox detection by Android malware
  • Sep 8, 2015
  • Jyoti Gajrani + 5 more

Due to an increase in the number of Android malware applications and their diversity, it has become necessary for the security community to develop automated dynamic analysis systems. Static analysis has its limitations that can be overcome by dynamic analysis. Many tools based on dynamic analysis approach have been developed which employ emulated/virtualized environment for analysis. While it has been an effective technique for analysis, it can be espied and evaded by recent sophisticated malware. Malware families such as Pincer, AnserverBot, BgServ, Wroba have incorporated methods to check the presence of emulated or virtualized environment. Once the presence of the sandbox is detected, they do not execute any malicious behavior. In this paper, a robust emulated environment has been proposed and developed that is resilient against most of the detection techniques. We have compared our malware analysis tool DroidAnalyst against 12 publicly available dynamic analysis services and shown that our service is best when considering resilience against anti-emulation techniques. Incorporation of anti anti-detection techniques in the dynamic analysis that are purely based on emulation hinders the detection and evasion of emulated environment by malware.

Save Icon
Up Arrow
Open/Close
Notes

Save Important notes in documents

Highlight text to save as a note, or write notes directly

You can also access these Documents in Paperpal, our AI writing tool

Powered by our AI Writing Assistant