Rigorously characterizing the statistical properties of cyber attacks is an important problem. In this paper, we propose the {\em first} statistical framework for rigorously analyzing honeypot-captured cyber attack data. The framework is built on the novel concept of {\em stochastic cyber attack process}, a new kind of mathematical objects for describing cyber attacks. To demonstrate use of the framework, we apply it to analyze a low-interaction honeypot dataset, while noting that the framework can be equally applied to analyze high-interaction honeypot data that contains richer information about the attacks. The case study finds, for the first time, that Long-Range Dependence (LRD) is exhibited by honeypot-captured cyber attacks. The case study confirms that by exploiting the statistical properties (LRD in this case), it is feasible to predict cyber attacks (at least in terms of attack rate) with good accuracy. This kind of prediction capability would provide sufficient early-warning time for defenders to adjust their defense configurations or resource allocations. The idea of "gray-box" (rather than "black-box") prediction is central to the utility of the statistical framework, and represents a significant step towards ultimately understanding (the degree of) the {\em predictability} of cyber attacks.
Read full abstract