Android Malware Familial Classification Based on DEX File Section Features
The rapid proliferation of Android malware makes classifying Android malware families increasingly difficult. Traditional static classification methods are easily defeated by code obfuscation and app hardening, while dynamic methods are computationally expensive. To solve these problems, this paper proposes an Android malware familial classification method based on Dalvik Executable (DEX) file section features. First, the DEX file is converted into an RGB (Red/Green/Blue) image and into plain text; then the color and texture of the image and text are extracted as features. Finally, a feature fusion algorithm based on multiple kernel learning is used for classification. In the experiments, the Android Malware Dataset (AMD) was selected as the sample set. Two comparative experiments were set up, comparing the method in this paper with a common visualization method and a feature fusion method. The results show that our method classifies better, with precision, recall and F1 score reaching 0.96. In addition, feature extraction in this paper takes 2.999 seconds less than the frequent-subsequence method. In conclusion, the proposed method is efficient and precise in classifying Android malware families.
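As a rough illustration of the byte-to-image step described in the abstract, the sketch below groups raw bytes into (R, G, B) pixels and computes a per-channel mean as a very simple color feature. The function names, the zero-padding, and the fixed row width are assumptions made for illustration; the abstract does not specify how the paper maps DEX sections to pixels or which color and texture descriptors it uses.

```python
def bytes_to_rgb_rows(data: bytes, width: int = 4):
    """Group bytes into (R, G, B) pixels and arrange them in rows of `width`.

    Hypothetical helper: the padding scheme and width are assumptions.
    """
    # Pad with zero bytes so the length is a multiple of 3 (one pixel = 3 bytes).
    padded = data + b"\x00" * (-len(data) % 3)
    pixels = [tuple(padded[i:i + 3]) for i in range(0, len(padded), 3)]
    # Pad with black pixels so the last row is complete.
    pixels += [(0, 0, 0)] * (-len(pixels) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def channel_means(rows):
    """Mean intensity per color channel over all pixels (a toy color feature)."""
    flat = [pixel for row in rows for pixel in row]
    return tuple(sum(pixel[c] for pixel in flat) / len(flat) for c in range(3))
```

In practice the rows would be written out with an imaging library and fed to a texture descriptor; this sketch only shows the reshaping idea.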
- Research Article
23
- 10.1016/j.comnet.2020.107639
- Oct 28, 2020
- Computer Networks
Comparative analysis of feature representations and machine learning methods in Android family classification
- Research Article
188
- 10.1109/access.2019.2962513
- Jan 1, 2020
- IEEE Access
This study presents a novel method to apply the RGB-D (Red Green Blue-Depth) sensors and fuse aligned RGB and NIR images with deep convolutional neural networks (CNN) for fruit detection. It aims to build a more accurate, faster, and more reliable fruit detection system, which is a vital element for fruit yield estimation and automated harvesting. Recent work in deep neural networks has led to the development of a state-of-the-art object detector termed Faster Region-based CNN (Faster R-CNN). A common Faster R-CNN network VGG16 was adopted through transfer learning, for the task of kiwifruit detection using imagery obtained from two modalities: RGB (red, green, blue) and Near-Infrared (NIR) images. Kinect v2 was used to take a bottom view of the kiwifruit canopy's NIR and RGB images. The NIR (1 channel) and RGB images (3 channels) were aligned and arranged side by side into a 6-channel image. The input layer of the VGG16 was modified to receive the 6-channel image. Two different fusion methods were used to extract features: Image-Fusion (fusion of the RGB and NIR images on input layer) and Feature-Fusion (fusion of feature maps of two VGG16 networks where the RGB and NIR images were input respectively). The improved networks were trained end-to-end using back-propagation and stochastic gradient descent techniques and compared to original VGG16 networks with RGB and NIR image input only. Results showed that the average precision (APs) of the original VGG16 with RGB and NIR image input only were 88.4% and 89.2% respectively, the 6-channel VGG16 using the Feature-Fusion method reached 90.5%, while that using the Image-Fusion method reached the highest AP of 90.7% and the fastest detection speed of 0.134 s/image. The results indicated that the proposed kiwifruit detection approach shows a potential for better fruit detection.
- Conference Article
168
- 10.1109/ccst.2019.8888430
- Oct 1, 2019
Android OS-based mobile devices have attracted numerous end-users since they are convenient to work with and offer a variety of features. As a result, Android has become one of the most important targets for attackers to launch their malicious intentions. Every year, researchers propose a novel Android malware analyzer framework to defend against real-world Android malware Apps. The researchers require an inclusive Android dataset to assess their Android analyzers. However, generating a comprehensive Android malware dataset is a challenging concept in malware scrutiny fields. In 2018, we made the first part of our Android malware dataset, CICAndMal2017 [16], publicly available while performing dynamic analyses on real smartphones. In this paper, we provide the second part of the CICAndMal2017 dataset [16] publicly available which includes permissions and intents as static features, and API calls as dynamic features. Besides, we examine these features with our two-layer Android malware analyzer. According to our analyses, we succeeded in achieving 95.3% precision in Static-Based Malware Binary Classification at the first layer, 83.3% precision in Dynamic-Based Malware Category Classification and 59.7% precision in Dynamic-Based Malware Family Classification at the second layer.
- Conference Article
45
- 10.1109/tase.2019.00-20
- Jul 1, 2019
Android malware has become a serious threat to our daily life, and thus there is a pressing need to effectively mitigate or defend against it. Recently, many approaches and tools to analyze Android malware have been proposed to protect legitimate users from the threat. However, most approaches focus on malware detection, while only a few of them consider malware classification or malware characterization. In this paper, we propose an extension of CDGDroid to classify and characterize Android malware families automatically. We first perform the static analysis used in CDGDroid to extract control-flow graphs and data-flow graphs on the instruction level. Then we encode the graphs into matrices, and use them to build the family classification models via deep learning. For family characterization, we extract the n-gram sequences from the graphs, which are filtered according to the weights of the classification model built for the target family. We then construct a vector space model and select the top-k sequences as a characterization of the target family. We have conducted experiments to evaluate our approach and found that the family classification model taking the horizontal combination of CFG and DFG as features offers the best accuracy among all the models. Compared with CDGDroid, Drebin and many antivirus tools gathered in VirusTotal, our family classification model performs better. Finally, we have also conducted experiments on family characterization, and the experimental results show that our characterization can capture the malicious behaviors of the testing families.
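The n-gram and top-k steps mentioned in this abstract can be sketched in a few lines. This is a simplified stand-in, not the authors' method: it ranks n-grams by raw frequency across token sequences, whereas the abstract describes filtering by classification-model weights and ranking through a vector space model. The function names and the opcode-like tokens in the usage note are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token windows of a sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_k_ngrams(sequences, n, k):
    """Count n-grams over all sequences and return the k most frequent.

    Assumption: plain frequency is used as the ranking score.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ngrams(seq, n))
    return counts.most_common(k)
```

For example, calling `top_k_ngrams` on the two sequences `["invoke", "move", "return"]` and `["invoke", "move", "goto"]` with `n=2, k=1` selects the bigram `("invoke", "move")`, which occurs in both.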
- Research Article
120
- 10.1109/tdsc.2017.2739145
- Oct 23, 2019
- IEEE Transactions on Dependable and Secure Computing
As the most widely used mobile platform, Android is also the biggest target for mobile malware. Given the increasing number of Android malware variants, detecting malware families is crucial so that security analysts can identify situations where signatures of a known malware family can be adapted as opposed to manually inspecting behavior of all samples. We present EC2 (Ensemble Clustering and Classification), a novel algorithm for discovering Android malware families of varying sizes-ranging from very large to very small families (even if previously unseen). We present a performance comparison of several traditional classification and clustering algorithms for Android malware family identification on DREBIN, the largest public Android malware dataset with labeled families. We use the output of both supervised classifiers and unsupervised clustering to design EC2. Experimental results on both the DREBIN and the more recent Koodous malware datasets show that EC2 accurately detects both small and large families, outperforming several comparative baselines. Furthermore, we show how to automatically characterize and explain unique behaviors of specific malware families, such as FakeInstaller, MobileTx, Geinimi. In short, EC2 presents an early warning system for emerging new malware families, as well as a robust predictor of the family (when it is not new) to which a new malware sample belongs, and the design of novel strategies for data-driven understanding of malware behaviors.
- Research Article
44
- 10.17485/ijst/2016/v9i21/90273
- Jun 20, 2016
- Indian Journal of Science and Technology
Background/Objectives: Nowadays, Android malware is coded so cleverly that it has become very difficult to detect. Static analysis of malicious code is not enough for detection, because such malware hides its method calls in encrypted form or installs methods at runtime. System call tracing is an effective dynamic analysis technique for detecting malware, as it can analyze the malware at run time. Moreover, this technique does not require the application code for malware detection. Thus, it can also detect Android malware that is difficult to detect with static analysis of code. Because Android was launched only in 2008, few studies are available regarding the behavior of Android Malware Families and their characteristics. The aim of this work is to explore the behavior of 10 popular Android Malware Families, focusing on the System Call Pattern of these families. Methods/Statistical Analysis: For this purpose, the authors have extracted the system call trace of 345 malicious applications from 10 Android Malware Families named FakeInstaller, Opfake, Plankton, DroidKungFu, BaseBridge, Iconosys, Kmin, Adrd and Gappusin using the strace Android tool and compared it with the system call pattern of 300 Benign Applications to characterize the behavior of malicious applications. Findings: During the experiment, it was observed that the malicious applications invoke some system calls more frequently than benign applications. Different Android malware invokes different sets of system calls with different frequencies. Applications/Improvements: This analysis can prove helpful in designing more accurate intrusion-detection systems for Android mobile devices. Keywords: Android Kernel, Android Malware Installation Methods, Malware Families, System Call Analysis
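The comparison of system call frequencies between malicious and benign traces could look roughly like the sketch below. It assumes each trace is already a list of system call names (for instance parsed from strace output); the function names and the `ratio` threshold are illustrative assumptions, not taken from the study.

```python
from collections import Counter


def call_frequencies(traces):
    """Relative frequency of each system call over a collection of traces."""
    counts = Counter(call for trace in traces for call in trace)
    total = sum(counts.values())
    return {call: n / total for call, n in counts.items()}


def overrepresented_calls(malicious, benign, ratio=2.0):
    """Calls whose relative frequency in malicious traces is at least
    `ratio` times their frequency in benign traces.

    Calls that never appear in the benign traces are always included.
    """
    mal = call_frequencies(malicious)
    ben = call_frequencies(benign)
    return sorted(c for c, f in mal.items() if f >= ratio * ben.get(c, 0.0))
```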
- Research Article
1
- 10.1007/s10207-025-01073-5
- Jun 1, 2025
- International Journal of Information Security
The increasing complexity of Android malware has increased the need for efficient detection methods. Researchers have introduced new frameworks for analyzing Android malware in response to the growing threat of malicious applications. Traditional static analysis methods, which are widely used, are susceptible to obfuscation and can be bypassed easily. However, although dynamic analysis is more resilient, it is computationally intensive and costly to implement. In this paper, we introduce MalWave, a novel approach that uses audio signal processing to detect Android malware by converting Dalvik Executable (DEX) file sequences into audio signals. The extracted audio fingerprints are used as features for classification, addressing (i) malware detection, (ii) family classification, and (iii) packed malware detection. Evaluated on the AMD and AndroZoo datasets, MalWave achieves an F1+ score of 82.6% for malware detection and 68.7% for family classification, particularly in mostly represented categories. Despite challenges in detecting packed malware, MalWave demonstrates high computational efficiency, with feature extraction taking just 0.3 seconds on average per sample, making it a suitable tool for real-time detection in resource-constrained environments.
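One simple way to picture the byte-to-audio conversion is to treat each byte as an 8-bit unsigned PCM sample and wrap the result in a WAV container using Python's standard `wave` module. This is only an assumed encoding for illustration; the abstract does not state how MalWave maps DEX sequences to audio or which fingerprints it extracts, and the function name and sample rate here are hypothetical.

```python
import io
import wave


def bytes_to_wav(data: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap raw bytes as mono 8-bit PCM samples in an in-memory WAV file.

    Assumption: one input byte becomes one audio sample.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(1)            # 1 byte per sample (8-bit unsigned PCM)
        wav.setframerate(sample_rate)
        wav.writeframes(data)
    return buf.getvalue()
```

The resulting bytes can be passed to any audio feature extractor that accepts WAV input.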
- Research Article
20
- 10.1109/access.2019.2914311
- Jan 1, 2019
- IEEE Access
Recently, Android malicious samples threaten billions of mobile end users’ security or privacy. The community researchers have designed many methods to automatically and accurately identify Android malware samples. However, the rapid increase of Android malicious samples outpowers the capabilities of traditional Android malware detectors and classifiers with respect to the cyber security risk management needs. It is important to identify the small proportion of Android malicious samples that may produce high cyber-security or privacy impact. In this paper, we propose a light-weight solution to automatically identify the Android malicious samples with high security and privacy impact. We manually check a number of Android malware families and corresponding security incidents and define two impact metrics for Android malicious samples. Our investigation results in a new Android malware dataset with impact ground truth (low impact or high impact). This new dataset is employed to empirically investigate the intrinsic characteristics of low-impact as well as high-impact malicious samples. To characterize and capture Android malicious samples’ pattern, reverse engineering is performed to extract semantic features to represent malicious samples. The leveraged features are parsed from both the AndroidManifest.xml files as well as the disassembled binary classes.dex codes. Then, the extracted features are embedded into numerical vectors. Furthermore, we train highly accurate support vector machine and deep neural network classifiers to categorize the candidate Android malicious samples into low impact or high impact. The empirical results validate the effectiveness of our designed light-weight solution. 
This method can be further utilized for identifying those high-impact Android malicious samples in the wild.
- Research Article
- 10.1049/ise2/8843518
- Jan 1, 2025
- IET Information Security
The rapid growth and diversification of malware variants, driven by advanced code obfuscation, evasion, and antianalysis techniques, present a significant threat to cybersecurity. The inadequacy of traditional methods in accurately classifying these evolving threats highlights the need for effective and robust malware classification techniques. This article presents WinDroid, a novel visualization‐based framework for Windows and Android malware family (AMF) classification using hybrid features and hierarchical ensemble learning. The WinDroid system employs a multistage approach to malware classification, transforming binaries into Markov grayscale images, enhanced via contrast‐limited‐adaptive‐histogram‐equalization and gamma correction. Deep learning and handcrafted features are extracted and fused using graph attention networks (GATs), feeding into hierarchical support vector machines (SVMs) for accurate family classification. This framework effectively reduces information loss, enhances computational efficiency, and demonstrates outstanding performance. WinDroid delivers excellent results, achieving 99.53% accuracy on Windows and 99.65% on AMF classification, along with Cohen’s kappa coefficients of 99.01% and 99.28%, respectively, and outperforming state‐of‐the‐art baseline methods.
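A Markov grayscale image is commonly built from the byte-to-byte transition probabilities of a binary: entry (a, b) holds how often byte value a is followed by byte value b, normalized per row and scaled to 0..255. The sketch below follows that common construction; whether WinDroid uses exactly this normalization is not stated in the abstract, so treat the function name and scaling as assumptions.

```python
def markov_image(data: bytes):
    """Build a 256x256 grayscale image from byte transition probabilities.

    Pixel (a, b) is 255 * P(next byte == b | current byte == a), rounded.
    Rows for byte values that never occur stay all zero.
    """
    counts = [[0] * 256 for _ in range(256)]
    # Count each adjacent byte pair (a followed by b).
    for a, b in zip(data, data[1:]):
        counts[a][b] += 1
    image = []
    for row in counts:
        total = sum(row)
        image.append([round(255 * c / total) if total else 0 for c in row])
    return image
```

The image size is fixed at 256x256 regardless of file length, which is one reason this representation is convenient as classifier input.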
- Research Article
5
- 10.3390/rs12071142
- Apr 3, 2020
- Remote Sensing
To provide a realistic environment for remote sensing applications, point clouds are used to realize a three-dimensional (3D) digital world for the user. Motion recognition of objects, e.g., humans, is required to provide realistic experiences in the 3D digital world. To recognize a user’s motions, 3D landmarks are provided by analyzing a 3D point cloud collected through a light detection and ranging (LiDAR) system or a red green blue (RGB) image collected visually. However, manual supervision is required to extract 3D landmarks as to whether they originate from the RGB image or the 3D point cloud. Thus, there is a need for a method for extracting 3D landmarks without manual supervision. Herein, an RGB image and a 3D point cloud are used to extract 3D landmarks. The 3D point cloud is utilized as the relative distance between a LiDAR and a user. Because it cannot contain all information the user’s entire body due to disparities, it cannot generate a dense depth image that provides the boundary of user’s body. Therefore, up-sampling is performed to increase the density of the depth image generated based on the 3D point cloud; the density depends on the 3D point cloud. This paper proposes a system for extracting 3D landmarks using 3D point clouds and RGB images without manual supervision. A depth image provides the boundary of a user’s motion and is generated by using 3D point cloud and RGB image collected by a LiDAR and an RGB camera, respectively. To extract 3D landmarks automatically, an encoder–decoder model is trained with the generated depth images, and the RGB images and 3D landmarks are extracted from these images with the trained encoder model. The method of extracting 3D landmarks using RGB depth (RGBD) images was verified experimentally, and 3D landmarks were extracted to evaluate the user’s motions with RGBD images. In this manner, landmarks could be extracted according to the user’s motions, rather than by extracting them using the RGB images. 
The depth images generated by the proposed method were 1.832 times denser than the up-sampling-based depth images generated with bilateral filtering.
- Research Article
21
- 10.1109/tc.2022.3143439
- Nov 1, 2022
- IEEE Transactions on Computers
Android malware is an ongoing threat to billions of smart devices’ security, ranging from mobile phones to car infotainment systems. Despite numerous approaches and previous studies to develop solutions for detecting and preventing Android malware, the rapid continuous development of new malware variants requires a careful reconsideration and the development of effective methods to identify malware families given a meager number of malware instances. In this paper, we present DroidMalVet, a novel Android malware family classification and detection approach that does not require to perform complex program analyses or utilize large feature sets. DroidMalVet is the first to use a promising, diverse, and small set of software metrics as features in a supervised learning platform to classify and detect various Android malware families. Our extensive empirical evaluations on two large public malware datasets show that DroidMalVet accurately detects both small and large malware families with F-Score accuracy of 94.4% and 96%, and AUC equal to 99.5% and 99.7% on the malware families in Drebin and AMD datasets, respectively. Moreover, our results demonstrate the superior performance of DroidMalVet in detecting small families (i.e., families with few samples). DroidMalVet complements existing approaches and presents an early warning tool for detecting known and emerging malware families.
- Book Chapter
417
- 10.1007/978-3-319-60876-1_12
- Jan 1, 2017
To build effective malware analysis techniques and to evaluate new detection tools, up-to-date datasets reflecting the current Android malware landscape are essential. For such datasets to be maximally useful, they need to contain reliable and complete information on malware’s behaviors and techniques used in the malicious activities. Such a dataset shall also provide a comprehensive coverage of a large number of types of malware. The Android Malware Genome created circa 2011 has been the only well-labeled and widely studied dataset the research community had easy access to (As of 12/21/2015 the Genome authors have stopped supporting the dataset sharing due to resource limitation). But not only is it outdated and no longer represents the current Android malware landscape, it also does not provide as detailed information on malware’s behaviors as needed for research. Thus it is urgent to create a high-quality dataset for Android malware. While existing information sources such as VirusTotal are useful, to obtain the accurate and detailed information for malware behaviors, deep manual analysis is indispensable. In this work we present our approach to preparing a large Android malware dataset for the research community. We leverage existing anti-virus scan results and automation techniques in categorizing our large dataset (containing 24,650 malware app samples) into 135 varieties (based on malware behavioral semantics) which belong to 71 malware families. For each variety, we select three samples as representatives, for a total of 405 malware samples, to conduct in-depth manual analysis. Based on the manual analysis result we generate detailed descriptions of each malware variety’s behaviors and include them in our dataset. We also report our observations on the current landscape of Android malware as depicted in the dataset. Furthermore, we present detailed documentation of the process used in creating the dataset, including the guidelines for the manual analysis. 
We make our Android malware dataset available to the research community.
- Book Chapter
2
- 10.1007/978-3-030-80216-5_12
- Jan 1, 2021
With the openness and growing popularity of the Android operating system all over the world, it has become a target of attack for malware authors who are determined to take advantage of over 2.5 billion monthly active users of Android devices. Despite Google’s various protection measures, Android malware continues to grow in complexity and scope. In recent times, many research efforts have focused on detecting malware on the Android operating system using both static and dynamic approaches. Most of the existing techniques are still not perfect because of the problems of false positives, false negatives and high detection time. In this work, a Priority Execution-based Approach for Detecting Android Malware (PEDAM) is proposed to solve some of these problems. In PEDAM, a two-phase dynamic analysis scheme is used for malware analysis. The first phase involves the use of a time-based filter for prioritizing the Android application that will execute based on permissions and intents. Any suspected samples not captured in the first phase are further analysed in the second phase, which does behavioural analysis using a Support Vector Machine classifier to analyse permissions, intent filters and Activity feature sets for effective detection. The evaluation of the proposed model on different Android malware families shows that PEDAM outperformed another Android-based malware detection system known as Iterative Classifier Fusion System (ICFS) with improved accuracy of 1.04%. These results indicated that the approach could be deployed for detection of Android malware. Keywords: Malware detection, Android, Dynamic analysis, Permission, Intent filter
- Conference Article
1
- 10.1109/geoinformatics.2015.7378596
- Jun 1, 2015
Mapping vegetation fraction in crop fields is an important step in remote sensing applications for precision agriculture. Two critical limitations for using current satellite sensors are the lack of imagery with optimum spatial and spectral resolution and an unfavorable revisit time. Remote sensing sensors placed on low-altitude aerial platforms could fill this gap. This paper validated the availability of red green blue (RGB) and near infrared (NIR) imaging acquired from a delta-wing airplane platform with a dual camera for monitoring vegetation fraction, and explored the technological processes and methods for fast processing of remote sensing images. RGB imaging was used to calculate VI RGB and interpreted by different classification algorithms. We examined the classification accuracy of RGB images in cotton yield estimation and rapid crop classification, and further studied the influence of flight altitude on classification accuracy. Additionally, we assessed the applicability of NIR imaging in dynamic monitoring of vegetation growth status. We found that COM and ML achieved the best accuracy in cotton yield estimation, with overall accuracy of 95.42% and 96.25% at a 200m flight altitude. Besides, the analysis of the influence of flight altitude (500m and 1000m) on rapid crop classification indicated that the accuracy of the VEG, COM and ML methods varied with flight altitude, and that classification accuracy at 1000m was higher than at 500m, which appears better for mapping vegetation over a large area. In addition, we found that NIR imaging has great potential for dynamic monitoring of vegetation growth status in the future. This paper provides evidence that RGB and NIR imaging acquired using a low-cost dual camera onboard a delta-wing airplane at low altitudes is a suitable tool for discriminating vegetation. 
This opened the doors for the utilization of this platform and technology in precision agriculture applications and dynamic monitoring grassland biological disasters.
- Research Article
45
- 10.9781/ijimai.2020.09.001
- Jun 1, 2021
- International Journal of Interactive Multimedia and Artificial Intelligence
With the increase in the popularity of mobile devices, malicious applications targeting the Android platform have greatly increased. Malware is coded so carefully that it has become very complicated to identify. The large amount of new malware appearing every day has made manual approaches inadequate for detecting it. Nowadays, new malware is characterized by sophisticated and complex obfuscation techniques. Thus, static malware analysis alone is not enough for detecting it. Dynamic malware analysis, however, is appropriate for tackling evasion techniques but cannot investigate all the execution paths and is also very time consuming. So, for better detection and classification of Android malware, we propose a hybrid approach which integrates the features obtained after performing static and dynamic malware analysis. This approach tackles the problem of analyzing, detecting and classifying Android malware in a more efficient manner. In this paper, we have used a robust set of features from static and dynamic malware analysis to create two datasets, i.e. binary and multiclass (family) classification datasets. These are made publicly available on GitHub and Kaggle with the aim of helping researchers and anti-malware tool creators enhance or develop new techniques and tools for detecting and classifying Android malware. Various machine learning algorithms are employed to detect and classify malware using the features extracted after performing static and dynamic malware analysis. The experimental outcomes indicate that the hybrid approach enhances the accuracy of detection and classification of Android malware compared to the case when static and dynamic features are considered alone.