An advanced trie-based HTTP parsing algorithm

Anqi Li, Huan Wang, Dazhong He

doi:10.1109/icist.2016.7483389

Abstract

In the age of network globalization, an ever-increasing volume of traffic data is produced every day, and capturing HTTP traffic quickly and accurately has become an important problem. In this paper, we analyzed the Hypertext Transfer Protocol (HTTP) and identified patterns in its structure. We then designed a data acquisition system built on an advanced trie-based protocol parsing algorithm. Using a trie, the system extracts user-defined fields without mismatches or backtracking. Based on the characteristics of the HTTP message structure, we divided protocol fields into two classes and processed each class separately. When storing protocol fields, we recorded the starting address and the offset of each required field instead of copying the entire string into memory, which both improves efficiency and saves memory. HTTP traffic was captured from a commercial Internet service provider using our own data acquisition system. Experiments comparing the proposed algorithm with a traditional HTTP parsing method show that it is correct, efficient, and extensible.
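The two storage and matching ideas in the abstract can be illustrated with a minimal Python sketch, written under our own assumptions rather than taken from the paper. It builds a byte-level trie of the wanted header names (the functions `build_trie`, `parse_headers`, and `value` and the ASCII lower-casing rule are hypothetical), walks each header line through the trie once and abandons the line on the first byte with no matching child, and records each value as a (start, length) pair into the captured buffer, which is our reading of "starting address and offset". It handles only `Name: value` header lines, skips the request line, and does not reproduce the paper's two-class division of fields.

```python
from typing import Dict, List, Optional, Tuple

COLON = 0x3A  # ord(":")
CRLF = b"\r\n"


class TrieNode:
    """One node of a byte-level trie over lower-cased header names."""

    __slots__ = ("children", "field")

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.field: Optional[str] = None  # set on the node where a wanted name ends


def build_trie(fields: List[str]) -> TrieNode:
    """Insert each wanted header name, lower-cased, into a fresh trie."""
    root = TrieNode()
    for name in fields:
        node = root
        for byte in name.lower().encode("ascii"):
            node = node.children.setdefault(byte, TrieNode())
        node.field = name
    return root


def to_lower(byte: int) -> int:
    """ASCII-only lower-casing; HTTP header names are case-insensitive."""
    return byte + 0x20 if 0x41 <= byte <= 0x5A else byte


def parse_headers(data: bytes, root: TrieNode) -> Dict[str, Tuple[int, int]]:
    """Map each wanted field to (start, length) of its value inside data.

    The trie walk never backs up: on the first byte with no matching child,
    the rest of that header line is ignored.
    """
    found: Dict[str, Tuple[int, int]] = {}
    head_end = data.find(CRLF + CRLF)  # the blank line ends the header block
    if head_end < 0:
        head_end = len(data)
    first_eol = data.find(CRLF)
    if first_eol < 0:
        return found
    pos = first_eol + 2  # skip the request line; it is not indexed in this sketch
    while pos < head_end:
        eol = data.find(CRLF, pos)
        if eol < 0 or eol > head_end:
            eol = head_end
        node, i, matched = root, pos, True
        while i < eol and data[i] != COLON:
            nxt = node.children.get(to_lower(data[i]))
            if nxt is None:  # not a wanted name: drop the line, no backtracking
                matched = False
                break
            node, i = nxt, i + 1
        if matched and node.field is not None and i < eol:  # stopped at ':'
            start = i + 1
            while start < eol and data[start] in (0x20, 0x09):  # skip spaces/tabs
                start += 1
            found[node.field] = (start, eol - start)
        pos = eol + 2
    return found


def value(data: bytes, span: Tuple[int, int]) -> memoryview:
    """Zero-copy view of a stored field: the value bytes are not duplicated."""
    start, length = span
    return memoryview(data)[start:start + length]
```

For example, `parse_headers(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n", build_trie(["Host"]))` returns `{"Host": (22, 11)}`. Only the two integers are stored per field. A copy is made only if a caller explicitly converts the view, e.g. with `bytes(value(...))`.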
