Catch them alive: A malware detection approach through memory forensics, manifold learning and computer vision

Ahmet Selman Bozkir,Ersan Tahillioglu,Murat Aydos,Ilker Kara

doi:10.1016/j.cose.2020.102166

Abstract

The everlasting increase in usage of information systems and online services have triggered the birth of the new type of malware which are more dangerous and hard to detect. In particular, according to the recent reports, the new type of fileless malware infect the victims’ devices without a persistent trace (i.e. file) on hard drives. Moreover, existing static malware detection methods in literature often fail to detect sophisticated malware utilizing various obfuscation and encryption techniques. Our contribution in this study is two-folded. First, we present a novel approach to recognize malware by capturing the memory dump of suspicious processes which can be represented as a RGB image. In contrast to the conventional approaches followed by static and dynamic methods existing in the literature, we aimed to obtain and use memory data to reveal visual patterns that can be classified by employing computer vision and machine learning methods in a multi-class open-set recognition regime. And second, we have applied a state of art manifold learning scheme named UMAP to improve the detection of unknown malware files through binary classification. Throughout the study, we have employed our novel dataset covering 4294 samples in total, including 10 malware families along with the benign executables. Lastly, we obtained their memory dumps and converted them to RGB images by applying 3 different rendering schemes. In order to generate their signatures (i.e. feature vectors), we utilized GIST and HOG (Histogram of Gradients) descriptors as well as their combination. Moreover, the obtained signatures were classified via machine learning algorithms of j48, RBF kernel-based SMO, Random Forest, XGBoost and linear SVM. According to the results of the first phase, we have achieved prediction accuracy up to 96.39% by employing SMO algorithm on the feature vectors combined with GIST+HOG. Besides, the UMAP based manifold learning strategy has improved accuracy of the unknown malware recognition models up to 12.93%, 21.83%, 20.78% on average for Random Forest, linear SVM and XGBoost algorithms respectively. Moreover, on a commercially available standard desktop computer, the suggested approach takes only 3.56 s for analysis on average. The results show that our vision based scheme provides an effective protection mechanism against malicious applications.

Full Text