Abstract

Dynamic features are frequently used in machine learning-based approaches for detecting malicious applications on Android devices. These features are constructed by collecting the system calls observed over a certain period of time. Despite the popularity of this approach, very little attention has been paid to the length of the collection time-frame and its impact on the detection performance of the induced learning models, which constitutes the scope of this research. Such analysis helps clarify the accuracy and performance trade-offs in the data collection efforts that take place at various stages of the machine learning workflow. Our time-frame analysis also covers different data collection environments, an emulator and a real device, as well as the variation in detection capability when detecting recent versus older malware. System calls of 330 benign and malicious applications, collected in different time periods, are monitored and logged in minute-long intervals for a total of fifteen minutes. First, the distribution of the system calls is analysed. Then, the discriminatory power of each system call is evaluated cumulatively for each minute-long interval, using Fisher's score to assess the discriminatory power of each feature. The analysis reveals that the system calls observed during the first minute possess the highest discriminatory power, whereas the discriminatory power of system calls observed over longer time-frames is lower. Finally, this finding is tested by training and evaluating traditional machine learning classifiers.
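As a rough illustration of the feature-ranking step described above, the sketch below computes one common form of Fisher's score for binary (benign vs. malicious) labels over a matrix of cumulative system-call counts. The function name, the toy data, and the exact score variant are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def fisher_score(X, y):
    """Fisher's score for each feature (column of X) under binary labels y.

    X: (n_samples, n_features) matrix of cumulative system-call counts.
    y: (n_samples,) array with 0 = benign, 1 = malicious.
    Returns a (n_features,) array; larger values indicate stronger
    separation between the two classes.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    benign, malicious = X[y == 0], X[y == 1]
    mean_diff_sq = (malicious.mean(axis=0) - benign.mean(axis=0)) ** 2
    var_sum = malicious.var(axis=0) + benign.var(axis=0)
    # Features that are constant in both classes get a score of zero.
    return np.where(var_sum > 0, mean_diff_sq / np.maximum(var_sum, 1e-12), 0.0)

# Hypothetical counts of three system calls for four applications.
X = np.array([[120,  3, 40],   # benign
              [110,  5, 38],   # benign
              [300, 60, 41],   # malicious
              [280, 55, 43]])  # malicious
y = np.array([0, 0, 1, 1])
print(fisher_score(X, y))
```

In a per-minute analysis such as the one described, this score would be recomputed on the counts accumulated up to each minute, allowing the discriminatory power of the features to be compared across time-frames.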
