Abstract

In previous work on Android malware detection, imbalanced datasets containing more benign samples (the majority class) than malicious samples (the minority class) have been widely adopted. This imbalance biases learning toward the majority class, so minority-class examples are more likely to be misclassified. To address this problem, we propose a new oversampling method, the fuzzy–synthetic minority oversampling technique (fuzzy–SMOTE), which combines fuzzy set theory with the synthetic minority oversampling technique (SMOTE). As the size of the majority class grows relative to that of the minority class, fuzzy–SMOTE generates more synthetic examples for each minority example in the fuzzy region. In this region, minority examples have a low degree of membership in the minority class and are therefore more likely to be misclassified. Trained with these synthetic examples, classifiers build larger decision regions that contain more minority examples and are no longer biased toward the majority class. Compared with the SMOTE and Borderline–SMOTE methods, fuzzy–SMOTE achieves higher accuracy on both the minority class and the entire datasets.
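The abstract describes the idea but does not give the exact membership function or allocation rule, so the Python sketch below fills both in with assumptions. It assumes three things. First, a minority example's degree of membership is the fraction of minority examples among its k nearest neighbors. Second, enough synthetic examples are generated to balance the classes, so more are created as the majority class grows. Third, each synthetic example is seeded from a minority example chosen with probability proportional to 1 − membership, and is then produced by the usual SMOTE interpolation toward a random minority neighbor. The names `k_nearest`, `minority_membership`, and `fuzzy_smote` are hypothetical and do not come from the authors' implementation.

```python
import math
import random


def k_nearest(i, points, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def minority_membership(minority, majority, k=5):
    """ASSUMED membership degree of each minority example in the minority class:
    the fraction of its k nearest neighbors (in the combined data) that are minority.
    1.0 marks a 'safe' example; values near 0.0 mark the fuzzy region."""
    combined = list(minority) + list(majority)  # minority occupies indices 0..n_min-1
    n_min = len(minority)
    degrees = []
    for i in range(n_min):
        neighbors = k_nearest(i, combined, k)
        degrees.append(sum(j < n_min for j in neighbors) / len(neighbors))
    return degrees


def fuzzy_smote(minority, majority, k=5, seed=None):
    """Hypothetical fuzzy-weighted SMOTE sketch: returns synthetic minority examples."""
    if len(minority) < 2 or k < 1:
        raise ValueError("need at least two minority examples and k >= 1")
    rng = random.Random(seed)
    degrees = minority_membership(minority, majority, k)

    # ASSUMPTION: generate enough examples to balance the classes, so a larger
    # majority class (relative to the minority class) yields more synthetic examples.
    n_new = max(len(majority) - len(minority), 0)

    # ASSUMPTION: seeds are drawn with weight 1 - membership, so the fuzzy region
    # gets most of the new examples; fall back to uniform weights if all are safe.
    weights = [1.0 - d for d in degrees]
    if sum(weights) == 0:
        weights = [1.0] * len(minority)
    seeds = rng.choices(range(len(minority)), weights=weights, k=n_new)

    k_min = min(k, len(minority) - 1)
    synthetic = []
    for i in seeds:
        # Standard SMOTE step: interpolate toward a random minority neighbor.
        j = rng.choice(k_nearest(i, minority, k_min))
        gap = rng.random()
        x, nb = minority[i], minority[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

With this weighting, minority examples whose neighborhoods are dominated by benign samples receive most of the new examples. Fully "safe" examples receive none unless every example is safe. A real implementation could use a different membership function or guarantee every minority example some oversampling.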

Highlights

  • Malware continues to increase with the rapid development of mobile networks, especially on the Android platform and services

  • Fuzzy set theory provides a methodology for data analysis; here, we extend fuzzy set theory to the task of Android malware detection in imbalanced datasets

  • Experiments show that combining fuzzySMOTE with support vector machines (SVMs) outperforms SVMs combined with other oversampling methods



Introduction

Malware (malicious software) continues to increase with the rapid development of mobile networks, especially on the Android platform and services. Android malware can infect and harm Android platforms and services through various channels, such as malicious websites, spam, malicious SMS messages, and malware-bearing advertisements. Two types of detection approaches are effective: static analysis, which decompiles an application to inspect its code, and dynamic analysis, which monitors application execution at runtime.[3] The datasets for most Android malware detection experiments are composed of both benign and malicious applications. Because it is difficult to build a comprehensive malware collection, these datasets are typically imbalanced: the number of benign applications is larger than the number of malicious applications.[4,5,6] However, this imbalance problem is usually not given serious consideration.
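To illustrate why the imbalance matters, the short Python sketch below uses a hypothetical dataset of 900 benign and 100 malicious applications. These counts are illustrative and are not taken from the cited studies. A degenerate classifier that always predicts "benign" still reaches 90% overall accuracy while detecting no malware. This is why accuracy on the minority class is worth reporting alongside accuracy on the entire dataset. The helper name `accuracy_and_minority_recall` is hypothetical.

```python
def accuracy_and_minority_recall(y_true, y_pred, minority_label=1):
    """Overall accuracy and per-class accuracy (recall) on the minority class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    minority_total = sum(t == minority_label for t in y_true)
    minority_hit = sum(t == p == minority_label for t, p in zip(y_true, y_pred))
    return correct / len(y_true), minority_hit / minority_total


# Hypothetical dataset: 900 benign apps (label 0) and 100 malicious apps (label 1).
y_true = [0] * 900 + [1] * 100
# A degenerate classifier that always predicts the majority (benign) class.
y_pred = [0] * len(y_true)

acc, recall = accuracy_and_minority_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, malware recall={recall:.2f}")  # accuracy=0.90, malware recall=0.00
```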
