MalClassifier: Malware family classification using network flow sequence behaviour

Bushra A Alahmadi,Ivan Martinovic

doi:10.1109/ecrime.2018.8376209

Abstract

Anti-malware vendors receive daily thousands of potentially malicious binaries to analyse and categorise before deploying the appropriate defence measure. Considering the limitations of existing malware analysis and classification methods, we present MalClassifier, a novel privacy-preserving system for the automatic analysis and classification of malware using network flow sequence mining. MalClassifier allows identifying the malware family behind detected malicious network activity without requiring access to the infected host or malicious executable reducing overall response time. MalClassifier abstracts the malware families' network flow sequence order and semantics behaviour as an n-flow. By mining and extracting the distinctive n-flows for each malware family, it automatically generates network flow sequence behaviour profiles. These profiles are used as features to build supervised machine learning classifiers (K-Nearest Neighbour and Random Forest) for malware family classification. We compute the degree of similarity between a flow sequence and the extracted profiles using a novel fuzzy similarity measure that computes the similarity between flows attributes and the similarity between the order of the flow sequences. For classifier performance evaluation, we use network traffic datasets of ransomware and botnets obtaining 96% F-measure for family classification. MalClassifier is resilient to malware evasion through flow sequence manipulation, maintaining the classifier's high accuracy. Our results demonstrate that this type of network flow-level sequence analysis is highly effective in malware family classification, providing insights on reoccurring malware network flow patterns.

Full Text