Abstract
In previous work, imbalanced datasets that contain more benign samples (the majority class) than malicious ones (the minority class) have been widely adopted in Android malware detection. These imbalanced datasets bias learning toward the majority class, so minority class examples are more likely to be misclassified. To solve this problem, we propose a new oversampling method called fuzzy synthetic minority oversampling technique (fuzzySMOTE), which is based on fuzzy set theory and the synthetic minority oversampling technique (SMOTE). As the sample size of the majority class increases relative to that of the minority class, fuzzySMOTE generates more synthetic examples for each minority class example in the fuzzy region, where the minority examples have a low degree of membership in the minority class and are more likely to be misclassified. Using the new synthetic examples, the classifiers build larger decision regions that contain more minority examples, and they are no longer biased toward the majority class. Compared with SMOTE and Borderline-SMOTE, fuzzySMOTE achieves higher accuracy on both the minority class and the entire dataset.
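The fuzzy oversampling idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the membership function or the allocation rule, so everything specific here is an assumption made for illustration: membership is taken as the fraction of a minority example's k nearest neighbours that are also minority, the fuzzy region is defined by a hypothetical `threshold`, and the class-size gap is split evenly across the fuzzy examples. The function names `minority_membership` and `fuzzy_smote_sketch` are hypothetical and are not the authors' implementation.

```python
import numpy as np


def minority_membership(X_min, X_maj, k=5):
    """Assumed membership: share of the k nearest neighbours that are minority."""
    X_all = np.vstack([X_min, X_maj])
    is_min = np.arange(len(X_all)) < len(X_min)
    # Euclidean distances from every minority example to every example.
    d = np.linalg.norm(X_min[:, None, :] - X_all[None, :, :], axis=2)
    idx = np.arange(len(X_min))
    d[idx, idx] = np.inf  # an example is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    return is_min[nn].mean(axis=1)


def fuzzy_smote_sketch(X_min, X_maj, k=5, threshold=0.5, seed=0):
    """Generate synthetic minority examples only around low-membership ones."""
    rng = np.random.default_rng(seed)
    mu = minority_membership(X_min, X_maj, k)
    fuzzy_idx = np.flatnonzero(mu < threshold)  # assumed fuzzy region
    if len(fuzzy_idx) == 0:
        return np.empty((0, X_min.shape[1]))
    # Assumed allocation: the larger the class-size gap, the more synthetic
    # examples each fuzzy-region example receives.
    gap = max(len(X_maj) - len(X_min), 0)
    per_example = int(np.ceil(gap / len(fuzzy_idx)))
    # Nearest minority neighbours, used for SMOTE-style interpolation.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    k_min = min(k, len(X_min) - 1)
    nn_min = np.argsort(d_min, axis=1)[:, :k_min]
    synthetic = []
    for i in fuzzy_idx:
        for _ in range(per_example):
            j = rng.choice(nn_min[i])
            step = rng.random()  # random position on the segment i -> j
            synthetic.append(X_min[i] + step * (X_min[j] - X_min[i]))
    return np.array(synthetic).reshape(-1, X_min.shape[1])


# Toy data (hypothetical): a 5x5 grid of majority points, two minority points
# embedded in the grid, and six minority points in a separate cluster.
g = np.linspace(0.0, 1.0, 5)
X_maj = np.array([[a, b] for a in g for b in g])
X_min = np.array([[0.5, 0.55], [0.45, 0.5],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
                  [5.1, 5.1], [5.05, 5.05], [4.95, 5.0]])
mu = minority_membership(X_min, X_maj, k=5)
synthetic = fuzzy_smote_sketch(X_min, X_maj, k=5, threshold=0.5)
print(mu)
print(synthetic.shape)
```

In this toy setup only the two embedded minority points fall below the threshold, so all synthetic examples are generated around them. Interpolating only between minority examples is the standard SMOTE step; the membership definition and the even split of the gap are the parts a reader would replace with the paper's own definitions.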
Highlights
Malware continues to increase with the rapid development of mobile networks, especially on the Android platform and services
Fuzzy set theory provides a methodology for data analysis; here, we extend fuzzy set theory to the task of Android malware detection in imbalanced datasets
In the experiments described in the previous subsection, we showed that fuzzySMOTE combined with support vector machines (SVMs) outperforms SVMs combined with other oversampling methods
Summary
Malware (malicious software) continues to increase with the rapid development of mobile networks, especially on the Android platform and services. Android malware can infect and harm Android platforms and services through various channels such as malicious websites, spam, malicious SMS messages, and malware-bearing advertisements. Two types of detection approaches are effective: static analysis, which decompiles the application code, and dynamic analysis, which monitors application execution at runtime.[3] The datasets for most Android malware detection experiments are composed of both benign and malicious applications. Because it is difficult to create a comprehensive malware collection, the datasets used in the experiments are typically imbalanced: the number of benign applications is larger than the number of malware applications.[4,5,6] This imbalance problem is usually not taken seriously.
More From: International Journal of Distributed Sensor Networks