Advanced persistent threats (APTs) are organized, prolonged cyberattacks by sophisticated attackers with the intent of stealing critical information. Although APT activities are stealthy and evade traditional detection tools, they must interact with system components to make progress in the attack. These interactions lead to information flows that are recorded in the system log. Dynamic Information Flow Tracking (DIFT) has been shown to be an effective way to detect APTs using information flows. A DIFT-based detection mechanism dynamically performs security analysis on the information flows to detect possible attacks. However, wide-ranging security analysis using DIFT results in a significant increase in performance overhead and high rates of false positives and false negatives. In this paper, we model the strategic interaction between the APT and DIFT as a non-cooperative stochastic game. The game unfolds on a state space constructed from an information flow graph (IFG) that is extracted from the system log. The objective of the APT in the game is to choose transitions in the IFG that form an optimal path from an entry point of the attack to an attack target. The objective of DIFT, on the other hand, is to dynamically select nodes in the IFG at which to perform security analysis to detect the APT. Our game model has imperfect information, as the players are unaware of the actions of the opponent. We consider two scenarios of the game: (i) the false-positive and false-negative rates of DIFT (i.e., the transition probabilities of the game) are known, and (ii) the false-positive and false-negative rates are unknown. For case (i), we propose a value iteration-based algorithm and prove that it converges to the optimal solution (a Nash equilibrium). Case (ii) translates to an incomplete information game with unknown transition probabilities.
For case (ii), we propose a supervised learning-based algorithm, referred to as the Hierarchical Supervised Learning (HSL) algorithm. HSL integrates a neural network, which predicts the value vector of the game, with a policy iteration algorithm to compute an approximate equilibrium. We implemented our algorithms for cases (i) and (ii) on real attack datasets for nation-state and ransomware attacks and validated the performance of our approach. We compared the performance of the HSL algorithm, run with unknown transition probabilities, against instances with known transition probabilities, and demonstrated that the HSL algorithm converges to a solution close to the optimal value vector. In contrast, the value vector obtained using a greedy approach does not converge to the optimal for 44.4% of the states, and its mean absolute error is almost 200 times that of HSL.
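To make the role of the IFG concrete, the following is a minimal sketch, not the paper's method. It assumes the system log has already been reduced to (source, destination) flow records, and it uses a fewest-hop breadth-first search as a simple stand-in for the APT's path choice; the actual game selects paths strategically against DIFT. The names `build_ifg`, `shortest_flow_path`, and the sample node labels are hypothetical.

```python
from collections import defaultdict, deque


def build_ifg(log_records):
    """Build a directed graph (adjacency sets) from (source, destination) flow records."""
    graph = defaultdict(set)
    for src, dst in log_records:
        graph[src].add(dst)
    return graph


def shortest_flow_path(graph, entry, target):
    """Return a fewest-hop path from entry to target, or None if unreachable.

    Assumption: hop count stands in for path cost; the paper's game-theoretic
    notion of an optimal path is not reproduced here.
    """
    parents = {entry: None}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        # Sort neighbors so the search order is deterministic.
        for nxt in sorted(graph.get(node, ())):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical flow records: an entry socket eventually reaches a sensitive store.
sample_log = [
    ("socket", "browser"),
    ("browser", "cache"),
    ("browser", "tmp_file"),
    ("tmp_file", "shell"),
    ("shell", "secrets_db"),
]
ifg = build_ifg(sample_log)
attack_path = shortest_flow_path(ifg, "socket", "secrets_db")
```

In this toy graph, the nodes on `attack_path` are the candidate locations where a defender could choose to perform security analysis, which is the decision DIFT makes in the game.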