Inferring Protocol State Machine from Network Traces: A Probabilistic Approach

Yipeng Wang,Buyun Qu,Danfeng Yao,Zhibin Zhang,Li Guo

doi:10.1007/978-3-642-21554-4_1

Abstract

Application-level protocol specifications (i.e., how a protocol should behave) are helpful for network security management, including intrusion detection and intrusion prevention. The knowledge of protocol specifications is also an effective way of detecting malicious code. However, current methods for obtaining unknown protocol specifications highly rely on manual operations, such as reverse engineering which is a major instrument for extracting application-level specifications but is time-consuming and laborious. Several works have focus their attentions on extracting protocol messages from real-world trace automatically, and leave protocol state machine unsolved. In this paper, we propose Veritas, a system that can automatically infer protocol state machine from real-world network traces. The main feature of Veritas is that it has no prior knowledge of protocol specifications, and our technique is based on the statistical analysis on the protocol formats. We also formally define a new model – probabilistic protocol state machine (P-PSM), which is a probabilistic generalization of protocol state machine. In our experiments, we evaluate a text-based protocol and two binary-based protocols to test the performance of Veritas. Our results show that the protocol state machines that Veritas infers can accurately represent 92% of the protocol flows on average. Our system is general and suitable for both text-based and binary-based protocols. Veritas can also be employed as an auxiliary tool for analyzing unknown behaviors in real-world applications.

Full Text