Abstract
Protocol reverse engineering supports a range of security applications, including fuzzing, malware analysis, and intrusion detection. Its goal is to recover the format, semantic, and behavior specifications of an unknown protocol, with format extraction as the primary task. One line of mainstream research performs this analysis on captured network traffic, using algorithms such as multiple sequence alignment, frequent itemset mining, and information entropy to extract format information from messages. However, these approaches are designed mainly to locate keyword fields, and they are limited in extracting contextual features and in handling large data sets. This paper presents ProsegDL, a deep learning-based format extraction tool for binary protocols, together with a specially designed method for generating training data sets. ProsegDL applies image semantic segmentation and siamese network techniques to extract field features and identify field boundaries in fixed-format protocols. The tool is evaluated on six popular protocols. The results show that it achieves up to 13% higher precision and 23% higher recall than the comparison methods when inferring from a small data set, and up to 18% higher precision and 28% higher recall when inferring from a large number of messages.
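To make the traffic-based baselines concrete, the following is a minimal sketch of an information-entropy heuristic of the kind mentioned above. It is not ProsegDL's method and not the algorithm of any specific tool. The function names (`offset_entropy`, `candidate_boundaries`), the `jump` threshold, and the toy four-byte protocol are all assumptions made for illustration.

```python
import math
from collections import Counter


def offset_entropy(messages):
    """Shannon entropy (in bits) of the byte value at each offset.

    Only offsets present in every message are considered, which suits
    the fixed-format case assumed here.
    """
    if not messages:
        return []
    length = min(len(m) for m in messages)
    total = len(messages)
    entropies = []
    for i in range(length):
        counts = Counter(m[i] for m in messages)
        # H = sum over observed byte values of p * log2(1 / p)
        h = sum((c / total) * math.log2(total / c) for c in counts.values())
        entropies.append(h)
    return entropies


def candidate_boundaries(entropies, jump=1.0):
    """Offsets whose entropy differs from the previous offset by >= jump bits.

    The threshold is a hypothetical tuning parameter, not a published value.
    """
    return [
        i
        for i in range(1, len(entropies))
        if abs(entropies[i] - entropies[i - 1]) >= jump
    ]


# Hypothetical toy protocol: 2-byte constant magic, 1-byte type, 1-byte counter.
messages = [
    bytes([0xAB, 0xCD, msg_type, seq])
    for msg_type, seq in [(1, 0), (2, 1), (1, 2), (2, 3)]
]

entropies = offset_entropy(messages)
boundaries = candidate_boundaries(entropies)
print(entropies)
print(boundaries)
```

In this toy case the constant magic bytes have zero entropy, the type byte has one bit, and the counter has two bits, so the heuristic proposes boundaries at offsets 2 and 3. A per-offset statistic like this treats each byte in isolation, which illustrates why such methods struggle to capture contextual features across neighboring bytes.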