A Comprehensive Survey on Machine Learning Techniques for Android Malware Detection

Vasileios Kouliaridis, Georgios Kambourakis

doi:10.3390/info12050185

Open Access

https://doi.org/10.3390/info12050185

Journal: Information
Publication Date: Apr 25, 2021
Citations: 54
License type: CC BY 4.0

Affiliation: University of the Aegean, Joint Research Centre

Abstract

Year after year, mobile malware attacks grow in both sophistication and diffusion. As the open source Android platform continues to dominate the market, malware writers consider it their preferred target. State-of-the-art mobile malware detection solutions in the literature rely almost exclusively on machine learning (ML) to detect malware. Nevertheless, our findings clearly indicate that the majority of existing works utilize different metrics and models and employ diverse datasets and classification features stemming from disparate analysis techniques, i.e., static, dynamic, or hybrid. This complicates the cross-comparison of the various proposed detection schemes and may also raise doubts about the derived results. To address this problem, this work covers the last seven years and attempts to schematize the ML-powered malware detection approaches and techniques proposed so far by organizing them under four axes, namely, the age of the selected dataset, the analysis type used, the employed ML techniques, and the chosen performance metrics. Moreover, based on these axes, we introduce a converging scheme which can guide future Android malware detection techniques and provide a solid baseline for machine learning practices in this field.

Highlights

We categorize and succinctly analyze state-of-the-art works in the literature during the last seven years, i.e., from 2014 to 2021, based on the analysis type, feature extraction method, dataset, machine learning (ML) classification techniques, and metrics used in their performance evaluation
We elaborate on our findings and research trends, as well as possible issues and future directions
It becomes obvious that the majority of the approaches embrace a different set of basic parameters, including the dataset, the analysis, and the detection evaluation metrics

Summary

Introduction

Anomaly-based detection can be further categorized by the type of analysis, i.e., static, dynamic, or hybrid, and the app features that can be extracted depend on the analysis type. While there are many contributions in the literature leveraging ML for mobile malware detection on the Android platform, most of them rely on diverse metrics, classification models, and performance improvement techniques, which complicates the comparison of their results. In an effort to mitigate these issues, the work at hand, among other goals, provides a detailed mapping of the contemporary ML techniques for Android malware detection proposed in the literature during the last 7 years, namely from 2014 to 2021.
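
To illustrate why diverse performance metrics hinder cross-comparison, the following minimal Python sketch computes four commonly reported metrics from a single confusion matrix. The class and function names and the counts are hypothetical and chosen only for illustration; they are not taken from any surveyed work. The example assumes an imbalanced test set in which malware is the minority class.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical outcome counts for a binary malware detector."""
    tp: int  # malware apps correctly flagged as malware
    fp: int  # benign apps wrongly flagged as malware
    tn: int  # benign apps correctly passed as benign
    fn: int  # malware apps missed by the detector


def accuracy(c: ConfusionCounts) -> float:
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    actual_malware = c.tp + c.fn
    return c.tp / actual_malware if actual_malware else 0.0


def f1(c: ConfusionCounts) -> float:
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Illustrative counts only: 1000 apps, of which 50 are malware.
    counts = ConfusionCounts(tp=25, fp=5, tn=945, fn=25)
    print(f"accuracy={accuracy(counts):.3f}")
    print(f"precision={precision(counts):.3f}")
    print(f"recall={recall(counts):.3f}")
    print(f"f1={f1(counts):.3f}")
```

With these illustrative counts the detector misses half of the malware samples, yet its accuracy is high because benign apps dominate the test set. A work reporting only accuracy and another reporting only the F1 score would therefore describe the same detector very differently.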

Relevant Surveys

Literature Survey

Findings

Discussion

Conclusions