CANDYMAN: Classifying Android malware families by modelling dynamic traces with Markov chains

Alejandro Martín,Víctor Rodríguez-Fernández,David Camacho

doi:10.1016/j.engappai.2018.06.006

Alejandro Martín, Víctor Rodríguez-Fernández + Show 1 more

https://doi.org/10.1016/j.engappai.2018.06.006

Copy DOI

Abstract

Malware writers are usually focused on those platforms which are most used among common users, with the aim of attacking as many devices as possible. Due to this reason, Android has been heavily attacked for years. Efforts dedicated to combat Android malware are mainly concentrated on detection, in order to prevent malicious software to be installed in a target device. However, it is equally important to put effort into an automatic classification of the type, or family, of a malware sample, in order to establish which actions are necessary to mitigate the damage caused. In this paper, we present CANDYMAN, a tool that classifies Android malware families by combining dynamic analysis and Markov chains. A dynamic analysis process allows to extract representative information of a malware sample, in form of a sequence of states, while a Markov chain allows to model the transition probabilities between the states of the sequence, which will be used as features in the classification process. The space of features built is used to train classical Machine Learning, including methods for imbalanced learning, and Deep Learning algorithms, over a dataset of malware samples from different families, in order to evaluate the proposed method. Using a collection of 5,560 malware samples grouped into 179 different families (extracted from the Drebin dataset), and once made a selection based on a minimum number of relevant and valid samples, a final set of 4,442 samples grouped into 24 different malware families was used. The experimental results indicate a precision performance of 81.8% over this dataset.

Full Text