ProsegDL: Binary Protocol Format Extraction by Deep Learning-based Field Boundary Identification

Sen Zhao, Yicheng Zeng, Zhihui Zhao, Hongsong Zhu, Limin Sun, Shouguo Yang, Jinfa Wang

doi:10.1109/icnp55882.2022.9940264

Abstract

Protocol reverse engineering supports a range of security applications, including fuzzing, malware analysis, and intrusion detection. It aims to recover an unknown protocol's format, semantics, and behavior specifications, with format extraction as the primary task. One line of mainstream research performs this reverse analysis on network traffic. These approaches use algorithms such as multiple sequence alignment, frequent itemset mining, and information entropy to extract format information from messages. However, they are primarily designed to locate keyword fields, and they are limited in extracting contextual features and in handling large data sets. This paper presents ProsegDL, a deep learning-based format extraction tool for binary protocols, together with a specially designed method for generating training data sets. ProsegDL leverages image semantic segmentation and Siamese network techniques, focusing on extracting field features and identifying field boundaries for fixed-format protocols. The tool is evaluated on six popular protocols. The results show that, compared with the baseline methods, it achieves up to 13% higher precision and up to 23% higher recall when inferring from a small data set, and up to 18% higher precision and up to 28% higher recall when inferring from a large number of messages.
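
To make the precision and recall figures concrete, the following is a minimal sketch of how inferred field boundaries could be scored against a known format. It is an illustration only, not the paper's evaluation code: the exact metric definition used by the authors is not given in the abstract, and the function names and the 12-byte example header are hypothetical.

```python
def boundaries_from_lengths(field_lengths):
    """Convert field lengths (in bytes) to the set of internal boundary offsets."""
    offsets, pos = set(), 0
    # The end of the last field is the end of the message, not a boundary.
    for length in field_lengths[:-1]:
        pos += length
        offsets.add(pos)
    return offsets


def precision_recall(inferred, truth):
    """Score inferred boundary offsets against ground-truth offsets."""
    true_pos = len(inferred & truth)
    precision = true_pos / len(inferred) if inferred else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical 12-byte header (assumption): true fields of 2, 2, 4, and 4 bytes.
truth = boundaries_from_lengths([2, 2, 4, 4])
# Hypothetical tool output that misplaces one boundary.
inferred = boundaries_from_lengths([2, 2, 2, 6])
precision, recall = precision_recall(inferred, truth)
```

Under this assumed definition, a boundary counts as correct only when its byte offset matches the ground truth exactly, so a tool that splits a field in the wrong place loses both precision and recall.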