Network infiltrations due to advanced persistent threats (APTs) have grown significantly in recent years. The primary objective of an APT is to gain unauthorized access to network assets and to compromise systems and data. APTs are stealthy and can remain dormant for extended periods of time, which makes their detection challenging. In this article, we leverage machine learning (ML) to detect hosts in a network that are the target of an APT attack. We evaluate a number of ML classifiers for detecting susceptible hosts in the Los Alamos National Lab dataset. We (i) scrutinize graph-based features extracted from host authentication logs, (ii) use feature engineering to reduce dimensionality, (iii) explore balancing the training dataset using over- and under-sampling techniques, (iv) evaluate numerous supervised ML techniques and ensembles of them, (v) compare our classification model to state-of-the-art approaches that use the same dataset, and show that our model outperforms them with respect to prediction performance and overhead, and (vi) perturb the attack patterns to study how changes in attack frequency and scale influence classification performance, and propose a solution for such adversarial behavior.
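To make steps (i) and (iii) concrete, the following is a minimal sketch, not the pipeline evaluated in the article. It assumes authentication events have already been reduced to hypothetical (source host, destination host) pairs, uses distinct-peer in- and out-degree as a stand-in for the graph-based features, and uses plain random over-sampling as one possible balancing technique. The host names, the label assignment, and the function names `degree_features` and `random_oversample` are illustrative assumptions.

```python
import random
from collections import defaultdict


def degree_features(events):
    """Map each host to (out_degree, in_degree), counted over distinct peers."""
    out_peers = defaultdict(set)
    in_peers = defaultdict(set)
    for src, dst in events:
        out_peers[src].add(dst)
        in_peers[dst].add(src)
    hosts = set(out_peers) | set(in_peers)
    return {h: (len(out_peers[h]), len(in_peers[h])) for h in sorted(hosts)}


def random_oversample(rows, labels, seed=0):
    """Duplicate rows of smaller classes at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical authentication events: (source host, destination host).
events = [("C1", "C2"), ("C1", "C3"), ("C2", "C3"), ("C4", "C3"), ("C1", "C2")]
features = degree_features(events)

# Hypothetical ground truth: only C3 is labeled as a targeted host (class 1).
targeted = {"C3"}
hosts = list(features)
X = [features[h] for h in hosts]
y = [1 if h in targeted else 0 for h in hosts]

# Balance the training set before fitting any classifier.
X_bal, y_bal = random_oversample(X, y)
```

Over-sampling duplicates minority-class rows, so it should be applied to the training split only; balancing before the train/test split would leak copies of the same host into the evaluation data.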