Machine learning-based dynamic analysis of Android apps with improved code coverage

Suleiman Y. Yerima, Sakir Sezer, Mohammed K. Alzaylaee

doi:10.1186/s13635-019-0087-1


Open Access

https://doi.org/10.1186/s13635-019-0087-1


Abstract

This paper investigates the impact of code coverage on machine learning-based dynamic analysis of Android malware. To maximize code coverage, dynamic analysis on Android typically requires the generation of events that trigger the user interface and maximize the discovery of run-time behavioral features. The most commonly used event generation approach in existing Android dynamic analysis systems is the random-based approach implemented with the Monkey tool that comes with the Android SDK. Monkey is utilized in popular dynamic analysis platforms like AASandbox, vetDroid, MobileSandbox, TraceDroid, Andrubis, ANANAS, DynaLog, and HADM. In this paper, we propose and investigate approaches based on stateful event generation and compare their code coverage capabilities with the state-of-the-practice random-based Monkey approach. The two proposed approaches are the state-based method (implemented with DroidBot) and a hybrid approach that combines the state-based and random-based methods. We compare the three input generation methods on real devices in terms of their ability to log dynamic behavior features and their impact on various machine learning algorithms that use those behavioral features for malware detection. Experiments performed using 17,444 applications show that, overall, the proposed methods provide much better code coverage, which in turn leads to more accurate machine learning-based malware detection compared to the state-of-the-art approach.
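As a rough illustration of the three input generation methods compared in the abstract, the sketch below builds the command lines that could drive each one. It is not the authors' tooling. The helper names, the event budget, and the paths are hypothetical. The source does not say how the hybrid approach combines the two methods, so running the state-based tool first and splitting the budget evenly is purely an assumption. The flags used are limited to Monkey's `-p`, `-s`, and `-v` options and DroidBot's `-a`, `-o`, and `-count` options.

```python
def monkey_command(package, events, seed=None):
    """Random-based input generation with the Android SDK Monkey tool."""
    cmd = ["adb", "shell", "monkey", "-p", package]
    if seed is not None:
        # A fixed seed makes the pseudo-random event sequence repeatable.
        cmd += ["-s", str(seed)]
    # Monkey takes the number of events as its final positional argument.
    cmd += ["-v", str(events)]
    return cmd


def droidbot_command(apk_path, output_dir, events):
    """State-based input generation with DroidBot."""
    return ["droidbot", "-a", apk_path, "-o", output_dir, "-count", str(events)]


def hybrid_commands(apk_path, package, output_dir, events):
    """Hypothetical hybrid: state-based run first, then a random run.

    The even split of the event budget is an illustrative assumption,
    not something taken from the paper.
    """
    state_events = events // 2
    random_events = events - state_events
    return [
        droidbot_command(apk_path, output_dir, state_events),
        monkey_command(package, random_events),
    ]


if __name__ == "__main__":
    # Hypothetical app and paths, printed rather than executed.
    for cmd in hybrid_commands("app.apk", "com.example.app", "out", 1000):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run` on a machine with `adb` and DroidBot installed and a device attached.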

Highlights

With nearly 80% market share, Google Android leads other mobile operating systems
This paper investigates the impact of input generation methods on the performance of machine learning-based Android malware detection, which previous work has not addressed
Even when considering only the top 20 or top 40 ranked features, the combined information gain scores maintained the same ranking. These results show that the different code coverage capacities of the three methods had an impact on the most important/significant behavioral features, which affected the performance of the machine learning classifiers
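The information gain ranking mentioned in the highlights can be illustrated with a minimal sketch, assuming binary behavioral features (1 if the behavior was logged for an app, 0 otherwise) and binary malware/benign labels. The feature names and the toy data are hypothetical and are not taken from the paper's dataset. This is the textbook entropy-based definition, not necessarily the exact procedure the authors used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on one feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def rank_features(features, labels):
    """Return (name, score) pairs sorted by descending information gain."""
    scores = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: 1 = malware, 0 = benign. Feature names are hypothetical.
    labels = [1, 1, 0, 0]
    features = {
        "sends_sms": [1, 1, 0, 0],      # perfectly separates the classes
        "reads_contacts": [1, 0, 1, 0],  # carries no information here
    }
    for name, score in rank_features(features, labels):
        print(f"{name}: {score:.3f}")
```

A feature that is logged only when an input generation method reaches the relevant code path can score very differently across methods, which is how coverage differences can change which features rank highest.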

Summary

Introduction

Over 65 billion downloads have been made from the official Google Play store, and there are currently more than 1 billion Android devices worldwide [1]. According to Statista [2], around 1.5 billion Android devices will be shipped worldwide by 2021. Due to the increasing popularity of Android, malware targeting the platform has increased significantly over the last few years. According to a recent report from McAfee, around 2.5 million new Android malware samples are exposed every year, adding to the total number of known samples. To mitigate the spread of malware, Google introduced Bouncer to its store in February 2012. Bouncer is the system used to monitor submitted applications for potentially harmful behaviors by testing them in a sandbox.

Methods

Results

Conclusion