LongCGDroid: Android malware detection through longitudinal study for machine learning and deep learning

Abdelhak Mesbah, Ibtihel Baddari, Mohamed Raihla

doi:10.5455/jjcit.71-1693392249

Abstract

This study compares the longitudinal performance of machine learning and deep learning classifiers for Android malware detection at different levels of feature abstraction. Using a dataset of 200k Android apps labeled by date over a 10-year range (2013-2022), we propose LongCGDroid, an effective image-based approach for Android malware detection. We use a semantic API Call Graph representation, derived from the Control Flow Graph and the Data Flow Graph, to extract abstracted API calls, and we evaluate the longitudinal performance of LongCGDroid against API changes. We test machine learning models (LR, RF, KNN, SVM) and deep learning models (CNN, RNN). Empirical experiments show a progressive decline in performance for all classifiers when they are evaluated on samples from later periods, although the deep learning CNN model under the class abstraction retains some stability over time. In comparison with eight state-of-the-art approaches, LongCGDroid achieves higher accuracy.
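As a rough illustration of what a longitudinal evaluation can look like, the sketch below trains on apps dated up to a cutoff year and reports accuracy separately for each later year. The function names `temporal_split` and `accuracy_by_year`, the `(year, features, label)` tuple layout, and the toy data are assumptions made for this example; the abstract does not specify the exact split protocol used by LongCGDroid.

```python
from collections import defaultdict


def temporal_split(samples, train_until):
    """Split (year, features, label) tuples by date.

    Samples dated up to and including `train_until` form the training set;
    later samples are grouped by year so each period is scored separately.
    (Hypothetical helper, not taken from the paper.)
    """
    train = [s for s in samples if s[0] <= train_until]
    test_by_year = defaultdict(list)
    for sample in samples:
        if sample[0] > train_until:
            test_by_year[sample[0]].append(sample)
    # Sort by year so results read chronologically.
    return train, dict(sorted(test_by_year.items()))


def accuracy_by_year(predict, test_by_year):
    """Return {year: accuracy} for a `predict(features) -> label` callable."""
    return {
        year: sum(predict(x) == y for _, x, y in group) / len(group)
        for year, group in test_by_year.items()
    }


# Toy data (invented): features are sets of abstracted API names, label 1 = malware.
toy = [
    (2013, {"a"}, 1),
    (2014, {"b"}, 0),
    (2015, {"a"}, 1),
    (2016, {"c"}, 1),
]

train, tests = temporal_split(toy, train_until=2014)


def rule(features):
    """Stand-in 'classifier' that flags any app using API "a"."""
    return 1 if "a" in features else 0


per_year = accuracy_by_year(rule, tests)
```

Reporting one score per test year, rather than a single pooled score, is what makes a decline over time visible; any real classifier with a `predict`-style callable could take the place of the stand-in rule.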

Full Text