Two Anatomists Are Better than One—Dual-Level Android Malware Detection

Vasileios Kouliaridis, Georgios Kambourakis, Dimitris Geneiatakis, Nektaria Potha

doi:10.3390/sym12071128

Open Access

https://doi.org/10.3390/sym12071128

Journal: Symmetry
Publication Date: Jul 7, 2020
Citations: 31
License type: CC BY 4.0

Affiliation: Joint Research Centre, University of the Aegean

Abstract

The openness of the Android operating system and its immense market penetration make it a hot target for malware writers. This work introduces Androtomist, a novel tool capable of symmetrically applying static and dynamic analysis of applications on the Android platform. Unlike similar hybrid solutions, Androtomist capitalizes on a wealth of features stemming from static analysis along with rigorous dynamic instrumentation to dissect applications and decide whether they are benign. The focus is on anomaly detection using machine learning, but the system is able to autonomously conduct signature-based detection as well. Furthermore, Androtomist is publicly available as open source software and can be straightforwardly installed as a web application. The application itself is dual mode, that is, fully automated for the novice user and configurable for the expert one. As a proof of concept, we meticulously assess the detection accuracy of Androtomist against three different popular malware datasets and a handful of machine learning classifiers. We particularly concentrate on the classification performance achieved when the results of static analysis are combined with dynamic instrumentation vis-à-vis static analysis only. Our study also introduces an ensemble approach that averages the output of all base classification models for each malware instance separately, and provides deeper insight into the features that most influence the classification process. Depending on the employed dataset, for hybrid analysis, we report promising to excellent results in terms of the accuracy, F1, and AUC metrics.
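
The ensemble idea mentioned in the abstract, averaging the output of all base models for each instance, can be illustrated with a minimal sketch. This is not Androtomist's code: the function names, the use of per-model malware probabilities as the "output", and the 0.5 decision threshold are all assumptions made for illustration.

```python
from statistics import mean


def ensemble_average(probabilities_per_model):
    """Average the malware probability of each app across all base models.

    probabilities_per_model: one list per base classifier (hypothetical
    input format), each holding that model's malware probability for
    every app, in the same app order.
    """
    if not probabilities_per_model:
        raise ValueError("at least one base model is required")
    n_apps = len(probabilities_per_model[0])
    if any(len(p) != n_apps for p in probabilities_per_model):
        raise ValueError("all models must score the same number of apps")
    # zip(*...) regroups the scores so each tuple holds one app's
    # scores from every model; the mean is the ensemble score.
    return [mean(scores) for scores in zip(*probabilities_per_model)]


def label_apps(averaged, threshold=0.5):
    """Turn averaged scores into labels (threshold is an assumed value)."""
    return ["malware" if p >= threshold else "benign" for p in averaged]


# Three hypothetical base models scoring the same two apps.
model_outputs = [
    [0.9, 0.2],
    [0.7, 0.4],
    [0.8, 0.0],
]
averaged = ensemble_average(model_outputs)
labels = label_apps(averaged)
```

Averaging is done per app rather than per model, so a single overconfident base classifier is tempered by the others before a label is assigned.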

Highlights

Every year the number of mobile malicious applications in the wild increases significantly. For instance, according to a 2020 report by McAfee [1], hidden apps, known for their ability to conceal their presence after installation while annoying victims with invasive ads, have become the most active mobile threat.
We concentrate on the classification performance achieved when the results of static analysis are combined with dynamic instrumentation vis-à-vis static analysis only.
We first present the signature-based results in Section 4.1, and then detail the classification results obtained after training a Machine Learning (ML) model with the identical set of features.

Summary

Introduction

Every year the number of mobile malicious applications (apps) in the wild increases significantly. For instance, according to a 2020 report by McAfee [1], hidden apps, known for their ability to conceal their presence after installation while annoying victims with invasive ads, have become the most active mobile threat. Malware writers encrypt their code and spread it throughout the program, making legacy signature-based detection increasingly difficult. The galloping rise of mobile malware of any kind calls for more robust detection solutions that leverage Machine Learning (ML). Mobile malware detection schemes can be categorized into two broad classes, namely signature-based and anomaly-based. The former collects patterns and signatures stemming from known malware, and compares them against unknown pieces of code to determine their status. The latter class employs a more lax approach; by observing the normal behavior of a piece of code for a certain

Methods

Results

Discussion

Conclusion