Android Malware Family Classification Based on Sensitive Opcode Sequence

Abstract

Android malware family classification is an advanced task in Android malware analysis, detection, and forensics. Existing methods and models have had some success in Android malware detection, but their accuracy and efficiency still fall short of expectations, especially for multi-class classification with imbalanced training data. To address these challenges, we propose an Android malware family classification model that analyzes the code's specific semantic information based on the sensitive opcode sequence. In this work, we construct a sensitive semantic feature, the sensitive opcode sequence, from opcodes, sensitive APIs, STRs, and actions; we then analyze the code's specific semantic information and generate a semantics-related vector for Android malware family classification based on this feature. In addition, for families with few samples, we adopt an oversampling technique based on the sensitive opcode sequence. Finally, we evaluate our method on the Drebin dataset, selecting the top 40 malware families for experiments. The experimental results show that the Total Accuracy and Average AUC (Area Under Curve) reach 99.50% and 98.86%, at 45.17 s per Android malware sample, and that these results remain good even as the number of malware families increases.
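The abstract does not say how the oversampling step works. As a rough illustration of the general idea only, the sketch below uses plain random oversampling: samples of minority families are duplicated until every family matches the largest one. This is a stand-in technique, not the authors' sensitive-opcode-sequence-based method; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-family samples until every family matches the largest.

    Plain random oversampling, used here as a stand-in; the paper's own
    technique operates on sensitive opcode sequences and is not reproduced.
    """
    rng = random.Random(seed)
    by_family = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_family[label].append(sample)

    target = max(len(group) for group in by_family.values())
    out_samples, out_labels = [], []
    for label, group in by_family.items():
        # Draw extra copies at random from the family's existing samples.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: hypothetical opcode sequences for two families.
X = [["invoke", "move"], ["const", "invoke"], ["goto", "return"]]
y = ["FamilyA", "FamilyA", "FamilyB"]
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
```

Duplicating samples is the simplest way to balance families; it adds no new information, which is presumably why the paper derives its oversampling from the feature itself.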

Similar Papers
  • Book Chapter
  • Cite count: 17
  • 10.1007/978-981-13-9181-1_53
StackDroid: Evaluation of a Multi-level Approach for Detecting the Malware on Android Using Stacked Generalization
  • Jan 1, 2019
  • Sheikh Shah Mohammad Motiur Rahman + 1 more

Attackers and cyber criminals are encouraged to develop Android malware by the rapidly growing number of Android users. To detect Android malware, researchers and security specialists have started to contribute to Android malware analysis and detection tasks using machine learning algorithms. In this paper, Stacked Generalization is used to minimize the error rate, and a multi-level architecture-based approach named StackDroid is presented and evaluated. In this experiment, Extremely Randomized Tree (ET), Random Forest (RF), Multi-Layer Perceptron (MLP) and Stochastic Gradient Descent (SGD) classifiers are used as base classifiers in level 1, and Extreme Gradient Boosting (EGB) is used as the final predictor in level 2. StackDroid achieves an Area Under Curve (AUC) of 99%, a False Positive Rate (FPR) of 1.67% and a detection accuracy of 97% on the DREBIN dataset, which provides a strong basis for the development of an Android malware scanner.

  • Conference Article
  • Cite count: 13
  • 10.1145/3374664.3379530
GRAMAC
  • Mar 16, 2020
  • Devyani Vij + 3 more

Android malware analysis has been an active area of research as the number and types of Android malware have increased dramatically. Most previous works have used permission-based models, behavioral analysis, and code analysis to identify the family of a malware sample. Code analysis is weak against obfuscation and does not include real-time execution of the application. Behavioral analysis captures the runtime behavior but is weak when it comes to obfuscated applications. The permission-based model uses only manifest files for analysing malware. In this paper, we propose a novel graph-signature-based malware classification mechanism. The proposed graph signature uses sensitive API calls to capture the flow of control, which helps to find a caller-callee relationship between the sensitive APIs and the nodes incident on them. A dataset of graph signatures of widely known malware families is then created. A new application's graph signature is compared with the graph signatures in the dataset, and the application is classified into the respective malware family or declared as goodware/unknown. Experiments with 15 malware families from the AMD dataset and a total of 400 applications gave an average accuracy of 0.97 with an error rate of 0.03.

  • Research Article
  • Cite count: 68
  • 10.1016/j.procs.2021.03.118
Towards Explainable CNNs for Android Malware Detection
  • Jan 1, 2021
  • Procedia Computer Science
  • Martin Kinkead + 3 more


  • Dissertation
  • Cite count: 3
  • 10.32657/10356/72122
A semantic-based analysis of Android malware for detection, generation, and trend analysis
  • Jan 1, 2017
  • Guozhu Meng

Android has grown to be the most popular mobile operating system since its release in 2008. Due to its openness and ease of use, it attracts thousands of vendors and developers working on Android application development. Millions of apps provide a variety of functionalities to Android users, such as online shopping, instant messaging, gaming and map service. However, Android becomes a hot attack target of cybercriminals due to its prevalence. According to the security report of Symantec in 2016, the number of Android malware has reached 13 million in 2015. Android malware is uploaded into either Google official market or unofficial markets everyday by cybercriminals which put users under a high risk. The malware may steal users' sensitive information, elevate the privilege, remote control devices, and encrypt users' files for ransom. It is non-trivial to understand the risks and develop effective mitigation against them. Malware is the critical and non-trivial issue in Android security. In order to prevent malware from attacking the users, we need a better understanding of Android malware and its behaviors, which can facilitate the extraction of representative features from malware, and thereby enhance malware detection. The malware and anti-malware tools are keeping evolving during the process of competition. Therefore, it is valuable to learn the characteristics of evolving malware, and weakness of existing anti-malware tools. Moreover, a sustaining malware analysis and security assessment is lacking for the Android world. In order to address these problems, we propose a semantic based malware analysis on these topics with the following achievements in this thesis:

1. We propose a precise semantic model of Android malware based on Deterministic Symbolic Automaton (DSA) for the purpose of malware comprehension, detection and classification. Based on DSA, we develop an automatic analysis framework, named SMART, which learns DSA by detecting and summarizing semantic clones from malware families, and then extracts semantic features from the learned DSA to classify malware according to the attack patterns. We conduct the experiments in both malware benchmark and 223,170 real-world apps. The results show that SMART builds meaningful semantic models and outperforms both state-of-the-art approaches and anti-virus tools in malware detection. SMART identifies 4583 new malware in real-world apps that are missed by most anti-virus tools. The classification step further identifies new malware variants and unknown families.

2. We first propose a meta model for Android malware to capture the common attack features and evasion features in the malware. Based on this model, we develop a framework, MYSTIQUE, to automatically generate malware covering four attack features and two evasion features, by adopting the software product line engineering approach. With the help of MYSTIQUE, we conduct experiments to 1) understand Android malware and the associated attack features as well as evasion techniques; 2) evaluate and compare the 57 off-the-shelf anti-malware tools, 9 academic solutions and 4 Android market vetting processes in terms of accuracy in detecting attack features and capability in addressing evasion. Last but not least, we provide a benchmark of Android malware with proper labeling of contained attack and evasion features. Moreover, we extend this work to MYSTIQUE-S to explore the capabilities of anti-malware tools detecting malware with dynamic code loading. MYSTIQUE-S automatically selects attack features under various user scenarios and delivers the corresponding malicious payloads at runtime. Relying on dynamic code binding (via service) and loading (via reflection) techniques, MYSTIQUE-S enables the dynamic execution of payloads on user devices at runtime. Experimental results on real-world devices show that existing Anti-Malware Tools (AMTs) are incapable of detecting most of our generated malware. Last, we propose some enhancements for existing anti-malware tools.

3. We propose a systematic approach to study Android malware, unveil security issues, obtain insightful conclusions and highlights, and predict the future trend for research. We have collected 4,267,178 Android apps from a variety of Android marketplaces, where 1,004,550 malware variants are identified and analyzed. Different from previous works, this work focuses on the differences and evolution of apps' characteristics, and identifies multiple security-related issues concerned by both academia and industry. In order to provide a comprehensive view for these issues, we propose four analyses on individual app, malware family, malware author, and market, to conduct our study and guide the analysis. Furthermore, we propose six dimensions to cluster apps for different analysis tasks to achieve efficiency and accuracy in the large-scale analysis. Some of the key findings reflect the characteristics of attacks, and the weaknesses in protection, which can benefit all stakeholders.

  • Conference Article
  • Cite count: 47
  • 10.1109/malware.2017.8323954
Android malware family classification based on resource consumption over time
  • Oct 1, 2017
  • Luca Massarelli + 5 more

The vast majority of today's mobile malware targets Android devices. This has pushed the research effort in Android malware analysis in recent years. An important task of malware analysis is the classification of malware samples into known families. Static malware analysis is known to fall short against techniques that change static characteristics of the malware (e.g. code obfuscation), while dynamic analysis has proven effective against such techniques. To the best of our knowledge, the most notable work on Android malware family classification purely based on dynamic analysis is DroidScribe. With respect to DroidScribe, our approach is easier to reproduce. Our methodology only employs publicly available tools, does not require any modification to the emulated environment or Android OS, and can collect data from physical devices. The latter is a key factor, since modern mobile malware can detect the emulated environment and hide its malicious behaviour. Our approach relies on resource consumption metrics available from the proc file system. Features are extracted through detrended fluctuation analysis and correlation. Finally, an SVM is employed to classify malware into families. We provide an experimental evaluation on malware samples from the Drebin dataset, where we obtain a classification accuracy of 82%, proving that our methodology achieves an accuracy comparable to that of DroidScribe. Furthermore, we make the software we developed publicly available, to ease the reproducibility of our results.

  • Conference Article
  • Cite count: 13
  • 10.1109/pst.2017.00036
Automated Static Analysis and Classification of Android Malware using Permission and API Calls Models
  • Jun 1, 2017
  • Anastasia Skovoroda + 1 more

In this paper we propose a heuristic approach to static analysis of Android applications based on matching suspicious applications with the predefined malware models. Static models are built from Android capabilities and Android Framework API call chains used by the application. All of the analysis steps and model construction are fully automated. Therefore, the method can be easily deployed as one of the automated checks provided by mobile application marketplaces or other interested organizations. Using the proposed method we analyzed the Drebin and ISCX malware collections in order to find possible relationships and dependencies between samples in collections, and a large fraction of Google Play apps collected between 2013 and 2016 representing benign data. Analysis results show that a combination of relatively simple static features represented by permissions and API call chains is enough to perform binary classification between malware and benign apps, and even find the corresponding malware family, with an appropriate false positive rate of about 3% (less than 1% in case of filtering adware). Malware collections exploration results show that Android malware rarely uses obfuscation or encryption techniques to make static analysis more difficult, which is quite the opposite of what we see in the case of the 'Wintel' endpoint platform family. We also provide the experiment-based comparison with the previously proposed state-of-the-art Android malware detection method adagio.

  • Research Article
  • Cite count: 76
  • 10.1155/2018/4157156
TinyDroid: A Lightweight and Efficient Model for Android Malware Detection and Classification
  • Oct 17, 2018
  • Mobile Information Systems
  • Tieming Chen + 4 more

With the popularity of Android applications, Android malware shows an exponential growth trend. In order to detect Android malware effectively, this paper proposes a novel lightweight static detection model, TinyDroid, using instruction simplification and machine learning techniques. First, a symbol-based simplification method is proposed to abstract the opcode sequence decompiled from Android Dalvik Executable files. Then, N-gram is employed to extract features from the simplified opcode sequence, and a classifier is trained for the malware detection and classification tasks. To improve the efficiency and scalability of the proposed detection model, a compression procedure is also used to reduce features and select exemplars for the malware sample dataset. TinyDroid is compared against state-of-the-art antivirus tools in the real world using the Drebin dataset. The experimental results show that TinyDroid achieves a higher accuracy rate and a lower false alarm rate with satisfactory efficiency.
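The two feature-extraction steps described above (symbol-based simplification, then N-gram extraction) can be sketched roughly as follows. The symbol table `SYMBOLS` is hypothetical, since the abstract does not give TinyDroid's actual mapping, and the helper names are illustrative only.

```python
from collections import Counter

# Hypothetical mapping from Dalvik opcode prefixes to one-letter symbols.
# TinyDroid's actual symbol table is not given in the abstract.
SYMBOLS = {
    "move": "M",
    "return": "R",
    "goto": "G",
    "if": "I",
    "invoke": "V",
}


def simplify(opcodes):
    """Map each opcode to its symbol by prefix; drop opcodes with no mapping."""
    symbols = []
    for op in opcodes:
        # "invoke-virtual/range" -> "invoke", "goto/16" -> "goto"
        prefix = op.split("-")[0].split("/")[0]
        if prefix in SYMBOLS:
            symbols.append(SYMBOLS[prefix])
    return "".join(symbols)


def ngram_counts(symbol_string, n=3):
    """Count overlapping n-grams of the simplified symbol string."""
    return Counter(
        symbol_string[i:i + n] for i in range(len(symbol_string) - n + 1)
    )


opcodes = ["move-result", "invoke-virtual", "if-eqz",
           "invoke-static", "return-void", "nop"]
simplified = simplify(opcodes)          # "MVIVR"
features = ngram_counts(simplified, n=2)
```

Collapsing opcode variants into a small alphabet keeps the N-gram vocabulary small, which is consistent with the lightweight design the abstract emphasizes.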

  • Book Chapter
  • Cite count: 8
  • 10.1007/978-3-030-63095-9_14
AOMDroid: Detecting Obfuscation Variants of Android Malware Using Transfer Learning
  • Jan 1, 2020
  • Lecture notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
  • Yu Jiang + 4 more

Android, with its large market, attracts malware developers. Malware developers employ obfuscation techniques to bypass malware detection mechanisms. Existing systems cannot effectively detect obfuscated Android malware. In this paper, we propose a novel approach to identify obfuscated Android malware. Our proposed approach is based on the intuition that opcode sequences are more resilient to obfuscation techniques. We first propose an effective approach based on the TF-IDF algorithm to identify distinctive opcode sequences. Then we represent the opcode sequences as images and reduce the problem of identifying obfuscated malware to the problem of transforming one image into another, i.e. the unobfuscated malware representation into the obfuscated one. To achieve this, we resort to transfer learning. We implemented a prototype dubbed AOMDroid based on the proposed approach and extensively evaluated its accuracy and detection time. AOMDroid outperforms the four related works we compared it with, and has an accuracy rate of 92.26% in detecting obfuscated Android malware. In addition, AOMDroid supports the detection of 21 Android malware family types. Its malware family detection accuracy rate is 87.39%. The average time spent by AOMDroid to detect a single Android application is 0.963 s.
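To make the TF-IDF step concrete, here is a minimal sketch that scores opcode sequences so that sequences frequent in one app but rare across apps rank highest. It uses one common TF-IDF variant (relative term frequency times log inverse document frequency); the abstract does not state which variant AOMDroid uses, and the toy opcode 2-grams are hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: score} dict per document.

    tf = term count / document length; idf = log(N / document frequency).
    This is one common TF-IDF variant, chosen for illustration.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Each "document" is one app, tokenized into hypothetical opcode 2-grams.
apps = [
    ["invoke move", "move return", "invoke move"],
    ["invoke move", "const goto"],
]
scores = tfidf(apps)
most_distinctive = max(scores[1], key=scores[1].get)
```

A sequence that appears in every app gets an IDF of zero under this variant, so only sequences that separate apps from each other receive a positive score.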

  • Research Article
  • Cite count: 52
  • 10.1109/access.2019.2946392
A3CM: Automatic Capability Annotation for Android Malware
  • Jan 1, 2019
  • IEEE Access
  • Junyang Qiu + 6 more

Android malware poses serious security and privacy threats to the mobile users. Traditional malware detection and family classification technologies are becoming less effective due to the rapid evolution of the malware landscape, with the emerging of so-called zero-day-family malware families. To address this issue, our paper presents a novel research problem on automatically identifying the security/privacy-related capabilities of any detected malware, which we refer to as Malware Capability Annotation (MCA). Motivated by the observation that known and zero-day-family malware families share the security/privacy-related capabilities, MCA opens a new alternative way to effectively analyze zero-day-family malware (the malware that do not belong to any existing families) through exploring the related information and knowledge from known malware families. To address the MCA problem, we design a new MCA hunger solution, Automatic Capability Annotation for Android Malware (A3CM). A3CM works in the following four steps: 1) A3CM automatically extracts a set of semantic features such as permissions, API calls, network addresses from raw binary APKs to characterize malware samples; 2) A3CM applies a statistical embedding method to map the features into a joint feature space, so that malware samples can be represented as numerical vectors; 3) A3CM infers the malicious capabilities by using the multi-label classification model; 4) The trained multi-label model is used to annotate the malicious capabilities of the candidate malware samples. To facilitate the new research of MCA, we create a new ground truth dataset that consists of 6,899 annotated Android malware samples from 72 families. We carry out a large number of experiments based on the four representative security/privacy-related capabilities to evaluate the effectiveness of A3CM. 
Our results show that A3CM can achieve promising accuracy of 1.00, 0.98 and 0.63 in inferring multiple capabilities of known Android malware, small size-families' malware and zero-day-families' Android malware, respectively.

  • Book Chapter
  • 10.1007/978-3-030-74664-3_4
Robust Android Malicious Community Fingerprinting
  • Jan 1, 2021
  • Elmouatez Billah Karbab + 3 more

Security practitioners can combat large-scale Android malware by decreasing the analysis window size of newly detected malware. The window runs from the first detection until signature generation by anti-malware vendors. The larger the window is, the more time the malicious apps are given to spread over users' devices. Current state-of-the-art techniques have a large analysis window due to the significant number of Android malware samples appearing daily. Besides, these techniques use manual analysis in some cases to investigate malware. Therefore, decreasing the need for manual detection could significantly reduce the analysis window. To address this issue, we elaborate systematic techniques and tools for the detection of both known family apps and new malware family apps (i.e., variants of existing families or unseen malware). To do so, we rely on the assumption that any pair of Android apps, with distinct authors and certificates, are most likely to be malicious if they are highly similar, because the adversary usually repackages multiple app packages with the same malicious payload to hide it from anti-malware and vetting systems. Consequently, it is difficult to distinguish such malicious payloads from the benign functionalities of a given Android package. Accordingly, a pair of Android apps should not be very similar in their components, excluding popular libraries. This observation could be used to design and develop a security framework to detect Android malware apps. In this chapter, we propose a novel Android app fingerprinting technique, APK-DNA, inspired by fuzzy hashing. We specifically target fingerprinting Android malicious apps. Computing the APK-DNA of a suspicious app requires a low computation time. 
Afterward, we leverage the previously mentioned assumption (i.e., very similar apps might be malware from the same malware family) to propose a cyber-security framework, namely Cypider (Cyber-Spider for Android malware detection), to detect and cluster Android malware without prior knowledge of Android malware apps. Cypider consists of a novel combination of a set of techniques to address the problem of Android malware clustering and fingerprinting. First, Cypider can detect repackaged malware (malware families), which constitute the vast majority of Android malware apps (Zhou and Jiang (Dissecting android malware: Characterization and evolution, in IEEE Symposium on Security and Privacy, SP 2012, 21–23 May 2012, San Francisco (2012), pp. 95–109)). Second, it can detect new malware apps, and more importantly, Cypider performs the detection automatically and in an unsupervised way (i.e., no prior knowledge about the apps). The fundamental idea of Cypider relies on building a similarity network between the static content of the targeted apps in terms of fuzzy fingerprints. Cypider then extracts, from this similarity network, sub-graphs with high connectivity, called communities, which are most likely to be malicious communities.
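The similarity-network idea can be illustrated with a small sketch under strong simplifying assumptions: plain feature sets stand in for fuzzy fingerprints, Jaccard similarity stands in for fingerprint comparison, and connected components stand in for the extraction of highly connected communities. None of these choices is taken from Cypider itself, and all names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_communities(fingerprints, threshold=0.5):
    """Link apps whose similarity reaches the threshold, then group linked apps.

    Connected components stand in for the community detection step, and
    plain feature sets stand in for fuzzy fingerprints.
    """
    neighbours = {app: set() for app in fingerprints}
    for a, b in combinations(fingerprints, 2):
        if jaccard(fingerprints[a], fingerprints[b]) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, communities = set(), []
    for app in fingerprints:
        if app in seen:
            continue
        # Depth-first walk over the similarity edges starting at this app.
        stack, component = [app], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) > 1:  # singletons are not communities
            communities.append(component)
    return communities


# Hypothetical apps with toy feature sets.
apps = {
    "app1": {"f1", "f2", "f3"},
    "app2": {"f1", "f2", "f4"},
    "app3": {"f9"},
}
communities = similarity_communities(apps)
```

Because no labels are used at any point, the grouping is unsupervised, which matches the property the abstract highlights.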

  • Research Article
  • Cite count: 21
  • 10.1109/tc.2022.3143439
Lightweight, Effective Detection and Characterization of Mobile Malware Families
  • Nov 1, 2022
  • IEEE Transactions on Computers
  • Karim O Elish + 2 more

Android malware is an ongoing threat to billions of smart devices’ security, ranging from mobile phones to car infotainment systems. Despite numerous approaches and previous studies to develop solutions for detecting and preventing Android malware, the rapid continuous development of new malware variants requires a careful reconsideration and the development of effective methods to identify malware families given a meager number of malware instances. In this paper, we present DroidMalVet, a novel Android malware family classification and detection approach that does not require to perform complex program analyses or utilize large feature sets. DroidMalVet is the first to use a promising, diverse, and small set of software metrics as features in a supervised learning platform to classify and detect various Android malware families. Our extensive empirical evaluations on two large public malware datasets show that DroidMalVet accurately detects both small and large malware families with F-Score accuracy of 94.4% and 96%, and AUC equal to 99.5% and 99.7% on the malware families in Drebin and AMD datasets, respectively. Moreover, our results demonstrate the superior performance of DroidMalVet in detecting small families (i.e., families with few samples). DroidMalVet complements existing approaches and presents an early warning tool for detecting known and emerging malware families.

  • Conference Article
  • Cite count: 659
  • 10.1109/asiajcis.2012.18
DroidMat: Android Malware Detection through Manifest and API Calls Tracing
  • Aug 1, 2012
  • Dong-Jie Wu + 4 more

Recently, the threat of Android malware has been spreading rapidly, especially repackaged Android malware. Although understanding Android malware using dynamic analysis can provide a comprehensive view, it is still subject to high costs in environment deployment and manual investigation effort. In this study, we propose a static feature-based mechanism to provide a static analysis paradigm for detecting Android malware. The mechanism considers static information, including permissions, deployment of components, Intent message passing and API calls, to characterize the behavior of Android applications. In order to recognize the different intentions of Android malware, different kinds of clustering algorithms can be applied to enhance the malware modeling capability. We leverage the proposed mechanism to develop a system called DroidMat. First, DroidMat extracts information (e.g., requested permissions, Intent message passing, etc.) from each application's manifest file, and regards components (Activity, Service, Receiver) as entry points for drilling down to trace API calls related to permissions. Next, it applies the K-means algorithm, which enhances the malware modeling capability. The number of clusters is decided by the Singular Value Decomposition (SVD) method on the low-rank approximation. Finally, it uses the kNN algorithm to classify the application as benign or malicious. The experimental results show that the recall rate of our approach is better than that of a well-known tool, Androguard, published at Black Hat 2011, which focuses on Android malware analysis. In addition, DroidMat is efficient, since it takes only half the time Androguard takes to predict 1738 apps as benign apps or Android malware.

  • Research Article
  • Cite count: 13
  • 10.3390/app12094664
Empirical Analysis of Forest Penalizing Attribute and Its Enhanced Variations for Android Malware Detection
  • May 6, 2022
  • Applied Sciences
  • Abimbola G Akintola + 9 more

As a result of the rapid advancement of mobile and internet technology, a plethora of new mobile security risks has recently emerged. Many techniques have been developed to address the risks associated with Android malware. The most extensively used method for identifying Android malware is signature-based detection. The drawback of this method, however, is that it is unable to detect unknown malware. As a consequence of this problem, machine learning (ML) methods for detecting and classifying malware applications were developed. The goal of conventional ML approaches is to improve classification accuracy. However, owing to imbalanced real-world datasets, the traditional classification algorithms perform poorly in detecting malicious apps. As a result, in this study, we developed a meta-learning approach based on the forest penalizing attribute (FPA) classification algorithm for detecting malware applications. In other words, with this research, we investigated how to improve Android malware detection by applying empirical analysis of FPA and its enhanced variants (Cas_FPA and RoF_FPA). The proposed FPA and its enhanced variants were tested using the Malgenome and Drebin Android malware datasets, which contain features gathered from both static and dynamic Android malware analysis. Furthermore, the findings obtained using the proposed technique were compared with baseline classifiers and existing malware detection methods to validate their effectiveness in detecting malware application families. Based on the findings, FPA outperforms the baseline classifiers and existing ML-based Android malware detection models in dealing with the unbalanced family categorization of Android malware apps, with an accuracy of 98.94% and an area under curve (AUC) value of 0.999. Hence, further development and deployment of FPA-based meta-learners for Android malware detection and other cybersecurity threats is recommended.

  • Research Article
  • Cite count: 64
  • 10.1109/tifs.2023.3287395
Comprehensive Android Malware Detection Based on Federated Learning Architecture
  • Jan 1, 2023
  • IEEE Transactions on Information Forensics and Security
  • Wenbo Fang + 7 more

Android malware and its variants are a major challenge for mobile platforms. However, there are two main problems in the existing detection methods: a) The detection method lacks the ability to evolve with Android malware, which leads to a low detection rate for malware and its variants. b) Traditional detection methods require centralized data for model training; however, the aggregation of training samples is limited by the infectivity of malware and growing data privacy concerns, so centralized detection methods are difficult to apply in actual detection scenarios. In this paper, we propose FEDriod, a comprehensive Android malware detection method based on a federated learning architecture that protects against growing Android malware and emerging Android malware variants. Specifically, we employ a genetic evolution strategy to simulate the evolution of Android malware and develop potential malware variants from typical Android malware. Then, we customize the Android malware detection model based on a residual neural network to achieve high detection accuracy. Finally, to protect sensitive data, we develop a federated learning framework that allows multiple Android malware detection agencies to jointly build a comprehensive Android malware detection model. We comprehensively evaluate the performance of FEDriod on the CIC, Drebin, and Contagio authoritative datasets. Experimental results show that our local model outperforms all baseline classifiers. In the federated scenario, our proposed method is superior to the state-of-the-art detection methods; in particular, in the cross-dataset evaluation, the F1 of FEDriod is 98.53%. More importantly, we performed genetic evolution experiments on the Drebin dataset, and the results showed that our proposed method has the ability to detect Android malware variants.

  • Research Article
  • Cite count: 14
  • 10.1371/journal.pone.0270647
Android malware analysis in a nutshell
  • Jul 5, 2022
  • PLoS ONE
  • Iman Almomani + 2 more

This paper offers a comprehensive analysis model for android malware. The model presents the essential factors affecting the analysis results of android malware that are vision-based. Current android malware analysis and solutions might consider one or some of these factors while building their malware predictive systems. However, this paper comprehensively highlights these factors and their impacts through a deep empirical study. The study comprises 22 CNN (Convolutional Neural Network) algorithms, 21 of them are well-known, and one proposed algorithm. Additionally, several types of files are considered before converting them to images, and two benchmark android malware datasets are utilized. Finally, comprehensive evaluation metrics are measured to assess the produced predictive models from the security and complexity perspectives. Consequently, guiding researchers and developers to plan and build efficient malware analysis systems that meet their requirements and resources. The results reveal that some factors might significantly impact the performance of the malware analysis solution. For example, from a security perspective, the accuracy, F1-score, precision, and recall are improved by 131.29%, 236.44%, 192%, and 131.29%, respectively, when changing one factor and fixing all other factors under study. Similar results are observed in the case of complexity assessment, including testing time, CPU usage, storage size, and pre-processing speed, proving the importance of the proposed android malware analysis model.
