THREATRACE: Detecting and Tracing Host-Based Threats in Node Level Through Provenance Graph Learning

Su Wang, Xia Yin, Tao Zhou, Xingang Shi, Jiahai Yang, Zhiliang Wang, Hongbin Sun, Dongqi Han, Han Zhang

doi:10.1109/tifs.2022.3208815

Abstract

Host-based threats such as Program Attack, Malware Implantation, and Advanced Persistent Threats (APT) are commonly adopted by modern attackers. Recent studies propose leveraging the rich contextual information in data provenance to detect threats in a host. Data provenance is a directed acyclic graph constructed from system audit data. Nodes in a provenance graph represent system entities (e.g., processes and files), and edges represent system calls in the direction of information flow. However, previous studies, which extract features of the provenance graph, are not sensitive to the small number of threat-related entities and thus perform poorly when hunting stealthy threats. We present THREATRACE, an anomaly-based detector that detects host-based threats at the system entity level without prior knowledge of attack patterns. We tailor GraphSAGE, an inductive graph neural network, to learn every benign entity’s role in a provenance graph. THREATRACE is a real-time system that scales to monitoring a long-running host and can detect host-based intrusions in their early phase. We evaluate THREATRACE on five public datasets. The results show that THREATRACE outperforms seven state-of-the-art host intrusion detection systems.

Full Text