GSDroid: Graph Signal Based Compact Feature Representation for Android Malware Detection

Roopak Surendran, Tony Thomas, Sabu Emmanuel

doi:10.1016/j.eswa.2020.113581

Abstract

Android malware has evolved in sophistication and intelligence, becoming increasingly evasive to existing detection systems, especially signature-based ones. Machine learning techniques have become a better choice for detecting emerging, sophisticated Android malware because of their high prediction accuracy on unseen data. The performance of a machine learning classifier depends heavily on the feature representation of the data points. The main behavior of an Android application is usually reflected in the system call sequence it generates. Therefore, system call analysis mechanisms are considered to be more effective for malware detection. However, the existing feature representations for system call based malware detection have high dimensionality and do not capture the dependency relations between system calls. The curse of dimensionality (high dimensionality of feature vectors) is a problem in which a large number of components in a feature vector results in highly sparse data. This sparsity can unnecessarily increase the storage space and processing time of the classifier. To overcome these limitations, we propose a novel graph signal based low dimensional feature representation and extraction mechanism for detecting Android malware applications. These low dimensional features are then incorporated into a machine learning based intelligent expert system that can automate the malware detection task. In our experimental evaluations, we found that a feature vector of dimension 16 was sufficient for malware classification when used with random forest and decision tree classifiers. We obtained a maximum accuracy of 0.99 with the random forest classifier, using a feature vector of dimension 16 whose components are the signal values of 16 prominent system calls.

Full Text