Android malware attacks grow in both sophistication and volume day by day, leaving Android users vulnerable to cyber-attacks. Researchers have developed many machine learning techniques to detect, block, or mitigate these attacks. However, technological advances and the growing number of Android mobile devices and the applications that run on them also increase the risks that malware poses to user privacy. This study presents a comprehensive investigation of the detection and classification of malicious applications using an up-to-date dataset containing 241 attributes. First, incorrect and missing data are detected and the affected rows are removed; normalization-based scaling is then applied. After this preprocessing step, the dataset is randomly divided into 70% training and 30% testing subsets using hold-out validation. Finally, classification is carried out using six different machine learning methods, including Multilayer Perceptron (MLP), Logistic Regression (LOGR), K-Nearest Neighbor (KNN), Decision Tree Classifier (DTC), and Random Forest (RF). A comparison of the modeling results shows that RF achieves the best performance for Android malware detection in real-world Android application sets, reaching 97% accuracy and leading across the other metrics considered.
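The preprocessing and classification steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn and pandas, uses synthetic data in place of the 241-attribute dataset, and the column names, the `label` column, the choice of min-max scaling, the random seed, and the forest size are all hypothetical. One deliberate deviation: the scaler is fitted on the training split only, a common precaution against information leaking from the test set, whereas the text describes scaling before the split.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def make_demo_frame(n_samples=500, n_features=241, seed=0):
    """Build a synthetic stand-in dataset (assumption: not the real data)."""
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=20,
        random_state=seed,
    )
    df = pd.DataFrame(X, columns=[f"f{i}" for i in range(n_features)])
    df["label"] = y
    # Inject missing values into the first 10 rows to exercise the cleaning step.
    df.iloc[:10, 0] = np.nan
    return df


def run_pipeline(df, label_col="label", seed=0):
    """Clean, split 70/30, scale, and classify with a random forest."""
    # Coerce features to numeric so invalid entries become NaN, then drop those rows.
    features = df.drop(columns=[label_col]).apply(pd.to_numeric, errors="coerce")
    clean = features.assign(**{label_col: df[label_col]}).dropna()

    X = clean.drop(columns=[label_col]).to_numpy()
    y = clean[label_col].to_numpy()

    # Hold-out split: 70% training, 30% testing, stratified by class.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=seed, stratify=y
    )

    # Min-max normalization fitted on the training split only.
    scaler = MinMaxScaler().fit(X_train)

    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(scaler.transform(X_train), y_train)

    accuracy = accuracy_score(y_test, model.predict(scaler.transform(X_test)))
    return accuracy, len(clean)


if __name__ == "__main__":
    acc, n_rows = run_pipeline(make_demo_frame())
    print(f"rows kept: {n_rows}, hold-out accuracy: {acc:.3f}")
```

Swapping `RandomForestClassifier` for `MLPClassifier`, `LogisticRegression`, `KNeighborsClassifier`, or `DecisionTreeClassifier` from scikit-learn would allow the same kind of side-by-side comparison; the accuracy obtained on synthetic data says nothing about the 97% figure reported for the real dataset.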