A Novel Neural Network Architecture Using Automated Correlated Feature Layer to Detect Android Malware Applications

Amerah Alabrah

doi:10.3390/math11204242

Abstract

Android OS devices are the most widely used mobile devices globally. The open-source nature and less restricted nature of the Android application store welcome malicious apps, which present risks for such devices. It is found in the security department report that static features such as Android permissions, manifest files, and API calls could significantly reduce malware app attacks on Android devices. Therefore, an automated method for malware detection should be installed on Android devices to detect malicious apps. These automated malware detection methods are developed using machine learning methods. Previously, many studies on Android OS malware detection using different feature selection approaches have been proposed, indicating that feature selection is a widely used concept in Android malware detection. The feature dependency and the correlation of the features enable the malicious behavior of an app to be detected. However, more robust feature selection using automated methods is still needed to improve Android malware detection methods. Therefore, this study proposed an automated ANN-method-based Android malware detection method. To validate the proposed method, two public datasets were used in this study, namely the CICInvestAndMal2019 and Drebin/AMD datasets. Both datasets were preprocessed via their static features to normalize the features as binary values. Binary values indicate that certain permissions in any app are enabled (1) or disabled (0). The transformed feature sets were given to the ANN classifier, and two main experiments were conducted. In Experiment 1, the ANN classifier used a simple input layer, whereas a five-fold cross-validation method was applied for validation. In Experiment 2, the proposed ANN classifier used a proposed feature selection layer. It includes selected features only based on correlation or dependency with respect to benign or malware apps. The proposed ANN-method-based results are significant, improved, and robust and were better than those presented in previous studies. The overall results of using the five-fold method on the CICInvestAndMal2019 dataset were a 95.30% accuracy, 96% precision, 98% precision, and 92% F1-score. Likewise, on the AMD/Drebin dataset, the overall scores were a 99.60% accuracy, 100% precision and recall, and 99% F1-score. Furthermore, the computational cost of both experiments was calculated to prove the performance improvement brought about by the proposed ANN classifier compared to the simple ANN method with the same time of training and prediction.

Full Text