KronoDroid: Time-based Hybrid-featured Dataset for Effective Android Malware Detection and Characterization
- Research Article
29
- 10.1016/j.cose.2022.102757
- May 19, 2022
- Computers & Security
Concept drift and cross-device behavior: Challenges and implications for effective android malware detection
- Research Article
4
- 10.1016/j.mlwa.2022.100357
- Jun 18, 2022
- Machine Learning with Applications
Most of the proposed solutions using dynamic features for Android malware detection collect data and test their systems on a single, specific data collection device, either a real device or an emulator. The results obtained on these devices are then generalized to any Android platform. This broad generalization rests on the assumption that apps behave consistently across devices. This study performs an extensive benchmark of this assumption for system calls, executing Android malware and benign samples under the same conditions on nine different collection devices, including real and virtual devices. The results indicate significant differences between real devices and emulators in system call usage and, consequently, in the behavioral profiles obtained by running the same set of applications on different devices. Furthermore, the impact of these differences on machine learning-based malware detection models is evaluated: detection performance degrades significantly when data collected on different devices are used for the training and testing sets. Therefore, the empirical findings do not support the assumption that Android apps behave consistently across devices when system calls are used as descriptive features.
- Research Article
45
- 10.1016/j.eswa.2022.117200
- Apr 21, 2022
- Expert Systems with Applications
Android malware concept drift using system calls: Detection, characterization and challenges
- Conference Article
146
- 10.1145/2804345.2804349
- Aug 31, 2015
The increasing diffusion of smart devices, along with the dynamism of the mobile application ecosystem, is boosting the production of malware for the Android platform. So far, many different methods have been developed for detecting Android malware, based on either static or dynamic analysis. The main limitations of existing methods include low accuracy, proneness to evasion techniques, and weak validation, often limited to emulators or modified kernels. We propose an Android malware detection method, based on sequences of system calls, that overcomes these limitations. The assumption is that malicious behaviors (e.g., sending SMS to premium-rate numbers, encrypting data for ransom, botnet capabilities, and so on) are implemented by specific system call sequences; yet no a priori knowledge is available about which sequences are associated with which malicious behaviors, particularly in the mobile application ecosystem, where new malware and non-malware applications continuously arise. Hence, we use machine learning to automatically learn these associations (a sort of fingerprint of the malware) and then exploit them to detect malware. Experimentation on 20,000 execution traces of 2,000 applications (1,000 of them malware belonging to different malware families), performed on a real device, shows promising results: we obtain a detection accuracy of 97%. Moreover, we show that the proposed method can cope with the dynamism of the mobile app ecosystem, since it can detect unknown malware.
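The abstract above describes learning from sequences of system calls without stating how a sequence becomes a feature vector. One common encoding, assumed here rather than taken from the paper, counts contiguous n-grams of system-call names in each execution trace and normalizes the counts to relative frequencies. The function names and the toy trace are hypothetical.

```python
from collections import Counter
from typing import List


def syscall_ngrams(trace: List[str], n: int) -> Counter:
    """Count contiguous n-grams of system-call names in one execution trace."""
    # A trace shorter than n yields an empty Counter (range() is empty).
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def to_vector(counts: Counter, vocabulary: List[tuple]) -> List[float]:
    """Relative frequency of each vocabulary n-gram (0.0 when absent)."""
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[g] / total for g in vocabulary]


# Hypothetical trace; a real one would come from tracing an app on a device.
trace = ["open", "read", "read", "write", "close", "open", "read"]
bigrams = syscall_ngrams(trace, 2)
vocab = sorted(bigrams)  # in practice, fixed across all traces in the dataset
vector = to_vector(bigrams, vocab)
```

Vectors built this way over a shared vocabulary can be fed to any standard classifier; the choice of n and of normalization is a design parameter, not something the abstract specifies.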
- Book Chapter
- 10.1007/978-3-030-74664-3_4
- Jan 1, 2021
Security practitioners can combat large-scale Android malware by decreasing the analysis window of newly detected malware. The window runs from the first detection until signature generation by anti-malware vendors. The larger the window, the more time malicious apps have to spread across users' devices. Current state-of-the-art techniques have a large analysis window due to the significant number of Android malware samples appearing daily. Besides, these techniques sometimes rely on manual analysis to investigate malware. Therefore, reducing the need for manual analysis could significantly shrink the analysis window. To address this issue, we elaborate systematic techniques and tools for detecting both apps from known families and apps from new malware families (i.e., variants of existing families or unseen malware). To do so, we rely on the assumption that any pair of Android apps with distinct authors and certificates is most likely malicious if the two apps are highly similar, because adversaries usually repackage multiple app packages with the same malicious payload to hide it from anti-malware and vetting systems. Consequently, it is difficult to distinguish such malicious payloads from the benign functionality of a given Android package. Accordingly, two Android apps should not be very similar in their components, excluding popular libraries. This observation could be used to design and develop a security framework to detect Android malware apps. In this chapter, we propose a novel Android app fingerprinting technique, APK-DNA, inspired by fuzzy hashing. We specifically target fingerprinting Android malicious apps. Computing the APK-DNA of a suspicious app requires little computation time. 
Afterward, we leverage the previously mentioned assumption (i.e., very similar apps might be malware from the same family) to propose a cyber-security framework, Cypider (Cyber-Spider for Android malware detection), that detects and clusters Android malware without prior knowledge of Android malware apps. Cypider combines a set of techniques in a novel way to address Android malware detection, clustering, and fingerprinting. First, Cypider can detect repackaged malware (malware families), which constitutes the vast majority of Android malware apps (Zhou and Jiang (Dissecting android malware: Characterization and evolution, in IEEE Symposium on Security and Privacy, SP 2012, 21–23 May 2012, San Francisco (2012), pp. 95–109)). Second, it can detect new malware apps; more importantly, Cypider performs detection automatically and in an unsupervised way (i.e., without prior knowledge about the apps). The fundamental idea of Cypider is to build a similarity network over the targeted apps' static content in terms of fuzzy fingerprints. Cypider then extracts from this similarity network highly connected sub-graphs, called communities, which are most likely to be malicious.
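Cypider's core idea, linking apps whose static-content fingerprints are highly similar and then extracting densely connected groups, can be sketched as follows. This is a simplified stand-in, not Cypider itself: Jaccard similarity over sets of content hashes replaces the fuzzy-hash comparison, and plain connected components (via union-find) replace the community-detection step. The threshold, hash values, and app names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two feature sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_communities(apps: Dict[str, Set[str]], threshold: float) -> List[Set[str]]:
    """Link apps with similarity >= threshold; return connected groups of size >= 2."""
    parent = {name: name for name in apps}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Add an edge (merge components) for every sufficiently similar pair.
    for a, b in combinations(apps, 2):
        if jaccard(apps[a], apps[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for name in apps:
        groups.setdefault(find(name), set()).add(name)
    # Singletons are not similar to anything, so they form no community.
    return [g for g in groups.values() if len(g) >= 2]


# Hypothetical per-app sets of content hashes (e.g., hashes of dex classes or resources).
apps = {
    "app_a": {"h1", "h2", "h3", "h4"},
    "app_b": {"h1", "h2", "h3", "h5"},
    "app_c": {"h1", "h2", "h3", "h4", "h6"},
    "app_d": {"x1", "x2"},
}
communities = similarity_communities(apps, threshold=0.6)
```

Here `app_b` and `app_c` are only 0.5 similar to each other, but both are linked through `app_a`, so connected components place all three in one group; a real community-detection algorithm may split such loosely connected groups differently.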
- Research Article
13
- 10.1109/access.2024.3357944
- Jan 1, 2024
- IEEE Access
Android malware recognition is the process of identifying and mitigating malicious software (malware) designed to target the Android operating system (OS), which is widely used in smartphones and tablets. As the Android ecosystem continues to grow, so does the risk of malware attacks on these devices. Identifying Android malware is vital for protecting user data, privacy, and device integrity. Deep learning (DL)-based Android malware detection represents a cutting-edge approach to protecting mobile devices. DL approaches such as recurrent neural networks (RNN) and convolutional neural networks (CNN) excel at automatically extracting intricate patterns and behaviors from Android app data. By leveraging features such as application programming interface (API) call sequences, code patterns, and permissions, these approaches can efficiently distinguish benign from malicious apps, even in the face of previously unseen attacks. This study presents an Intelligent Pattern Recognition using an Equilibrium Optimizer with Deep Learning (IPR-EODL) approach for Android malware recognition. The purpose of the IPR-EODL approach is to accurately identify and categorize Android malware so that security can be achieved. In the IPR-EODL technique, a data pre-processing step converts the input data into a compatible format. The technique then applies a channel attention long short-term memory (CA-LSTM) model to recognize Android malware. To improve the CA-LSTM results, IPR-EODL employs the Equilibrium Optimizer (EO) algorithm for hyperparameter tuning. The IPR-EODL model is evaluated experimentally on a benchmark Android malware database, and the extensive results highlight its strong performance in the Android malware detection process.
- Research Article
8
- 10.1007/s11235-022-00983-2
- Feb 20, 2023
- Telecommunication Systems
In recent years, the rapid increase in the number and types of Android malware has brought great challenges and pressure to malware detection systems. As a widely used method in Android malware detection, static detection has been a hot topic in academia and industry. However, to improve detection accuracy, existing static detection methods incur excessively high analysis complexity and time cost. Moreover, correlations between static features lead to a large amount of redundant data. Therefore, this paper proposes a static detection method for Android malware based on sensitive patterns. It uses an improved FP-growth algorithm to mine frequent combinations of sensitive permissions and API calls in malicious and benign apps, which avoids generating redundant information. In addition, this paper adopts a multi-layered gradient boosting decision tree algorithm to train the detection model, and proposes a dual similarity combination method to measure the similarity between different sensitive patterns. The experimental results show that the proposed detection method has high accuracy and strong generalization ability.
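The pattern-mining step above uses an improved FP-growth algorithm. The sketch below swaps in a plain level-wise (Apriori-style) search, which returns the same set of frequent itemsets as FP-growth for a given minimum support but is simpler to read. It does not reproduce the paper's FP-growth improvements, its sensitive-permission filtering, or the gradient-boosting classifier; the permission and API names in the example are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Level-wise (Apriori-style) search for itemsets with support >= min_support."""
    n = len(transactions)
    result: Dict[FrozenSet[str], float] = {}
    # Level-1 candidates: every single item seen in any transaction.
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        frequent: Dict[FrozenSet[str], float] = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                frequent[c] = support
        result.update(frequent)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates, pruning any
        # candidate that has an infrequent (k-1)-subset (Apriori property).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)
    return result


# Hypothetical malicious apps, each a set of permissions and API calls.
malicious = [
    {"SEND_SMS", "READ_PHONE_STATE", "sendTextMessage"},
    {"SEND_SMS", "sendTextMessage", "INTERNET"},
    {"SEND_SMS", "READ_PHONE_STATE", "sendTextMessage", "INTERNET"},
    {"INTERNET", "READ_PHONE_STATE"},
]
patterns = frequent_itemsets(malicious, min_support=0.75)
```

With a support threshold of 3 out of 4 apps, every single item is frequent, but the only frequent pair is `{SEND_SMS, sendTextMessage}`; mining benign apps the same way and comparing the two pattern sets is one way to surface combinations characteristic of malware.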
- Conference Article
2
- 10.1109/saner56733.2023.00065
- Mar 1, 2023
The rapid growth of Android malware calls for anti-malware systems that detect malware automatically. Detecting malware effectively is a non-trivial problem due to the high overlap in behaviors between malware and benign apps. Most existing automated Android malware detection methods use statistical features extracted from apps or graphs generated from method calls to identify malware. However, methods that use only statistical features produce false positives because they ignore program semantics. Existing graph-based approaches suffer from scalability problems due to heavy-weight program analysis and time-consuming graph matching. In addition, graph-based approaches can be evaded by modifying dependencies among method calls so that crafted malicious apps resemble benign ones. In this paper, we propose a novel deep learning-based detection system, named RGDroid, which is capable of detecting malware under graph structural attacks. It combines API information extracted from the Android documentation and learns behavior features from the function call graph with a graph neural network. Specifically, to defend against graph adversarial attacks, RGDroid reduces the connectivity between different functional parts to mitigate the effect of structural modifications on the final graph embedding. To comprehensively evaluate the robustness of RGDroid, we implement four influential graph adversarial attacks that simulate the current capabilities and knowledge of Android malware attackers. The attack success rate (ASR) against two state-of-the-art detection systems (i.e., MaMaDroid, MalScan) is above 70.0%, while the ASR against RGDroid under the four graph attacks is below 6.1%.
- Research Article
287
- 10.1007/s12652-018-0803-6
- Apr 28, 2018
- Journal of Ambient Intelligence and Humanized Computing
Android security incidents have occurred frequently in recent years. To improve the accuracy and efficiency of large-scale Android malware detection, in this work we propose a hybrid model based on a deep autoencoder (DAE) and a convolutional neural network (CNN). First, to improve the accuracy of malware detection, we reconstruct the high-dimensional features of Android applications (apps) and employ multiple CNNs to detect Android malware. In the serial convolutional neural network architecture (CNN-S), we use ReLU, a non-linear function, as the activation function to increase sparseness, and dropout to prevent over-fitting. The convolutional and pooling layers are combined with a fully connected layer to enhance feature extraction capability. Under these conditions, CNN-S shows strong ability in feature extraction and malware detection. Second, to reduce training time, we use the deep autoencoder as a pre-training method for the CNN. With this combination, the deep autoencoder and CNN model (DAE-CNN) can learn more flexible patterns in a short time. We conduct experiments on 10,000 benign apps and 13,000 malicious apps. CNN-S demonstrates a significant improvement over traditional machine learning methods in Android malware detection. In detail, compared with SVM, the CNN-S model improves accuracy by 5%, while the DAE-CNN model reduces training time by 83% compared with the CNN-S model.
- Research Article
6
- 10.3390/informatics10030067
- Aug 18, 2023
- Informatics
There are a variety of reasons why smartphones have become so pervasive in our daily lives. While their benefits are undeniable, Android users must be vigilant against malicious apps. The goal of this study was to develop a broad framework, named DroidMDetection, for detecting Android malware using multiple deep learning classifiers. To provide precise, dynamic Android malware detection and clustering of different malware families, the framework uses methodologies built on deep learning and natural language processing (NLP) techniques. Compared with similar works, DroidMDetection (1) uses API calls and intents in addition to the common permissions to accomplish broad malware analysis, (2) uses feature digests generated by a deep autoencoder to cluster the detected malware samples into malware family groups, and (3) benefits from both feature extraction and feature selection methods. Numerous reference datasets were used to conduct in-depth analyses of the framework. DroidMDetection's detection rate was high, and the created clusters were relatively consistent regardless of the evaluation parameters. DroidMDetection surpasses the state-of-the-art solutions MaMaDroid, DroidMalwareDetector, MalDozer, and DroidAPIMiner across all metrics we used to measure their effectiveness.
- Conference Article
23
- 10.1109/bigdata47090.2019.9005669
- Dec 1, 2019
Android malware (malicious apps) families share common attributes and behavior through shared core malicious code. However, as the number of new malware samples increases, identifying the correct family becomes more challenging. Two prominent approaches tackle this problem: dynamic analysis, which captures the runtime behavior of the malware, and static analysis, which can reveal malicious behavior by analyzing the underlying logic and code patterns. A third, emerging approach uses various sources of identification features to analyze the architectural and external attributes of a malicious app. For example, two malicious apps can have different behavioral patterns but share common attributes. We hypothesize that such malware can belong to the same family but attempt to mislead dynamic and code-level static analysis tools by randomizing their behavior. In this work, we use a promising set of Android-oriented code metrics to guide a supervised classification process for identifying malware families in Android. Our empirical results on 2,869 malware apps across 35 different malware families show that these metrics are very effective at identifying malware families. In particular, we achieve a low false positive rate (1.2%) and an AUC score of 0.984 for family identification using a Random Forest (RF) classifier.
- Conference Article
659
- 10.1109/asiajcis.2012.18
- Aug 1, 2012
Recently, the threat of Android malware, especially repackaged Android malware, has been spreading rapidly. Although understanding Android malware through dynamic analysis can provide a comprehensive view, it still incurs high costs in environment deployment and manual investigation effort. In this study, we propose a static feature-based mechanism that provides a static analysis paradigm for detecting Android malware. The mechanism considers static information, including permissions, deployment of components, Intent message passing, and API calls, to characterize the behavior of Android applications. To recognize the different intentions of Android malware, different kinds of clustering algorithms can be applied to enhance the malware modeling capability. We leverage the proposed mechanism to develop a system called DroidMat. First, DroidMat extracts information (e.g., requested permissions, Intent message passing, etc.) from each application's manifest file and treats components (Activity, Service, Receiver) as entry points from which to trace API calls related to permissions. Next, it applies the K-means algorithm to enhance the malware modeling capability; the number of clusters is decided by the Singular Value Decomposition (SVD) method on the low-rank approximation. Finally, it uses the kNN algorithm to classify each application as benign or malicious. The experimental results show that the recall rate of our approach is better than that of a well-known tool, Androguard, presented at Black Hat 2011, which focuses on Android malware analysis. In addition, DroidMat is efficient: it takes only half the time Androguard needs to classify 1,738 apps as benign apps or Android malware.
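DroidMat's final step classifies an app with kNN. A minimal sketch of that step alone follows, assuming each app is represented as a set of extracted features (permissions, intents, API calls) and that distance is the Hamming distance between the corresponding binary vectors, which equals the size of the symmetric difference of the two sets. The K-means clustering and SVD-based choice of cluster count are omitted, and the training apps are made up for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple


def knn_classify(sample: Set[str], training: List[Tuple[Set[str], str]], k: int = 3) -> str:
    """Label a sample by majority vote of its k nearest training apps."""
    # Hamming distance between binary feature vectors == size of the
    # symmetric difference between the two feature sets.
    ranked = sorted(training, key=lambda item: len(sample ^ item[0]))
    votes = Counter(label for _, label in ranked[:k])
    # An odd k avoids ties in the two-class case.
    return votes.most_common(1)[0][0]


# Hypothetical labeled apps, each described by its requested permissions.
training = [
    ({"SEND_SMS", "RECEIVE_SMS", "READ_PHONE_STATE"}, "malicious"),
    ({"SEND_SMS", "READ_PHONE_STATE", "INTERNET"}, "malicious"),
    ({"INTERNET", "ACCESS_NETWORK_STATE"}, "benign"),
    ({"INTERNET", "CAMERA"}, "benign"),
    ({"INTERNET"}, "benign"),
]
suspicious = {"SEND_SMS", "RECEIVE_SMS", "INTERNET"}
label = knn_classify(suspicious, training)
```

The suspicious app's three nearest neighbors are two malicious apps and one benign app, so it is labeled `"malicious"`.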
- Research Article
46
- 10.1007/s11704-017-6493-y
- Jun 30, 2018
- Frontiers of Computer Science
The domination of the Android operating system in the market share of smart terminals has engendered increasing threats of malicious applications (apps). Research on Android malware detection has received considerable attention in academia and the industry. In particular, studies on malware families have been beneficial to malware detection and behavior analysis. However, identifying the characteristics of malware families and the features that can describe a particular family have been less frequently discussed in existing work. In this paper, we are motivated to explore the key features that can classify and describe the behaviors of Android malware families to enable fingerprinting the malware families with these features. We present a framework for signature-based key feature construction. In addition, we propose a frequency-based feature elimination algorithm to select the key features. Finally, we construct the fingerprints of ten malware families, including twenty key features in three categories. Results of extensive experiments using Support Vector Machine demonstrate that the malware family classification achieves an accuracy of 92% to 99%. The typical behaviors of malware families are analyzed based on the selected key features. The results demonstrate the feasibility and effectiveness of the presented algorithm and fingerprinting method.
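The abstract does not give the exact frequency-based elimination rule. One plausible reading, sketched here purely as an assumption, keeps a feature for a family when it appears in at least a given fraction of that family's samples and in at most a given fraction of samples from all other families. The thresholds, family names, and feature names are hypothetical.

```python
from typing import Dict, List, Set


def family_key_features(
    samples: Dict[str, List[Set[str]]],
    min_in_family: float,
    max_elsewhere: float,
) -> Dict[str, Set[str]]:
    """For each family, keep features common inside it and rare in other families."""
    key: Dict[str, Set[str]] = {}
    for family, members in samples.items():
        # Pool every sample that belongs to a different family.
        others = [s for fam, group in samples.items() if fam != family for s in group]
        candidates = set().union(*members)
        selected = set()
        for feature in candidates:
            inside = sum(feature in s for s in members) / len(members)
            outside = sum(feature in s for s in others) / len(others) if others else 0.0
            if inside >= min_in_family and outside <= max_elsewhere:
                selected.add(feature)
        key[family] = selected
    return key


# Hypothetical families with per-sample feature sets.
samples = {
    "FamA": [{"SEND_SMS", "INTERNET", "boot_receiver"}, {"SEND_SMS", "INTERNET"}, {"SEND_SMS", "boot_receiver"}],
    "FamB": [{"INTERNET", "dex_loading"}, {"INTERNET", "dex_loading", "CAMERA"}],
}
fingerprints = family_key_features(samples, min_in_family=0.6, max_elsewhere=0.5)
```

`INTERNET` is dropped for both families because it is common everywhere, which is the intuition behind eliminating features by frequency: what remains describes a family rather than Android apps in general.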
- Research Article
28
- 10.1109/access.2021.3139334
- Jan 1, 2022
- IEEE Access
It is important to effectively detect, mitigate, and defend against Android malware attacks, because Android malware has long represented a major threat to Android app security. Characterizing and classifying similar malicious apps into groups plays a particularly crucial role in building a secure Android app ecosystem. The classification of malware families can efficiently enhance the malware detection process and systematically elucidate malware patterns. In this paper, we propose a novel efficient deep learning network with multi-streams for Android malware family classification. We first obtain the input data for a convolutional neural network (CNN) in string format from some main files or sections contained in each Android malicious app. We then classify malware families by applying a 1-dimensional convolution filter-based network for the files or sections. Further, by using gradient analysis to visualize the important files and sections in malicious apps, we attempt to intuitively grasp which files or sections are the most significant for malware family classification. To validate the effectiveness of our approach, we conduct extensive experiments with the well-known DREBIN and AMD malware datasets, and we compare our approach with existing methods. Our experimental results show that the 1D CNN model is more accurate than the 2D CNN model, and that the code_item part in the classes.dex is the most relevant feature for malware classification, as it is more relevant than other parts such as AndroidManifest.xml and certificate. The proposed method achieves the best accuracy of 93.2% by using 1D convolution filters with multi-streams for the main files and sections of the malware samples.
- Research Article
185
- 10.1145/3162625
- Jul 31, 2017
- ACM Transactions on Software Engineering and Methodology
The number of malicious Android apps is increasing rapidly. Android malware can damage or alter other files or settings, install additional applications, and so on. To determine such behaviors, a security analyst can significantly benefit from identifying the family to which an Android malware belongs rather than only detecting if an app is malicious. Techniques for detecting Android malware, and determining their families, lack the ability to handle certain obfuscations that aim to thwart detection. Moreover, some prior techniques face scalability issues, preventing them from detecting malware in a timely manner. To address these challenges, we present a novel machine-learning-based Android malware detection and family identification approach, RevealDroid, that operates without the need to perform complex program analyses or to extract large sets of features. Specifically, our selected features leverage categorized Android API usage, reflection-based features, and features from native binaries of apps. We assess RevealDroid for accuracy, efficiency, and obfuscation resilience using a large dataset consisting of more than 54,000 malicious and benign apps. Our experiments show that RevealDroid achieves an accuracy of 98% in detection of malware and an accuracy of 95% in determination of their families. We further demonstrate RevealDroid’s superiority against state-of-the-art approaches.
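RevealDroid's features include categorized Android API usage and reflection-based features. A minimal sketch of turning a list of invoked methods into per-category counts follows. The prefix-to-category table is a hypothetical stand-in for RevealDroid's actual categorization, and native-binary features are not covered.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical mapping from class-name prefixes to coarse API categories;
# RevealDroid's real category set is not reproduced here.
API_CATEGORIES: Dict[str, str] = {
    "android.telephony.SmsManager": "SMS",
    "android.telephony.TelephonyManager": "PHONE_INFO",
    "java.net.URL": "NETWORK",
    "java.lang.reflect.Method": "REFLECTION",
}


def categorized_api_counts(calls: List[str]) -> Counter:
    """Count invoked methods per category; uncategorized calls are ignored."""
    counts: Counter = Counter()
    for call in calls:
        for prefix, category in API_CATEGORIES.items():
            # Match the full class name followed by a method name.
            if call.startswith(prefix + "."):
                counts[category] += 1
                break
    return counts


# Hypothetical method invocations extracted from an app's bytecode.
calls = [
    "android.telephony.SmsManager.sendTextMessage",
    "java.lang.reflect.Method.invoke",
    "java.lang.reflect.Method.invoke",
    "android.util.Log.d",
]
features = categorized_api_counts(calls)
```

Counting at the category level keeps the feature vector small and fixed-size, which is in the spirit of the abstract's claim that RevealDroid avoids extracting large sets of features.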