PROV-GEM: Automated Provenance Analysis Framework using Graph Embeddings

Maya Kapoor, Siddharth Krishnan, Thomas Moyer, Joshua Melton, Michael Ridenhour

doi:10.1109/icmla52953.2021.00273

Abstract

Data provenance graphs, detailed traces of system behavior, are a popular construct for analyzing and forecasting malicious cyber activity such as advanced persistent threats (APTs). A critical limitation of existing analysis techniques is the lack of an automated analytic framework to predict APTs. In this work, we address that limitation by augmenting efficient capture and storage mechanisms with automated analysis. Specifically, we propose PROV-GEM, a deep graph learning framework to identify anomalous, malicious behavior from provenance data. Since data provenance graphs are complex datasets often expressed as heterogeneous attributed multiplex networks, we use a unified relation-aware embedding framework to capture the necessary contexts and associated interactions between the various entities manifest in the data. Furthermore, provenance graphs are by nature rich, detailed structures that are heavily attributed compared to other complex systems traditionally used in graph machine learning applications. To that end, our framework uniquely captures “multi-embeddings” that can represent the varied contexts of nodes and their multi-faceted nature. We demonstrate the efficacy of our embeddings by applying PROV-GEM to two publicly available APT provenance graph datasets from StreamSpot and Unicorn. PROV-GEM achieves strong performance on both datasets, with 99% accuracy and a 97% F1-score on the StreamSpot dataset and 97% accuracy and an 89% F1-score on the Unicorn dataset, equaling or outperforming comparable state-of-the-art APT detection models. Unlike other frameworks, PROV-GEM couples an efficient graph convolutional approach with relational self-attention to generate rich graph embeddings that capture the complex topology of data provenance graphs, providing an effective automated analytic framework for APT detection.

Full Text