Abstract
Automatically discovering the message formats of unknown service or system protocols from network traces has become important for a variety of applications, such as emulating the behavior of an unknown protocol in service virtualization or enabling deep packet inspection in network security. Among existing schemes, keyword-extraction-based approaches have been shown to be effective. Inspired by the template structure of protocol messages, recent work leverages the positions of keywords to extract message keywords more accurately. However, these methods perform poorly on messages whose lengths vary widely. To address this problem, we propose R-gram, which exploits the relative positions of keywords in messages, allowing keywords to be detected robustly in variable-length messages. R-gram first extracts the common template of the messages in a given message trace with a fast sampling technique, and segments each message into blocks according to the relative positions of the common keywords in the template. It then identifies message keywords in each block by using a new concept and technique, the relative positional n-gram (r-gram for short). Finally, the message keywords are used to separate all the messages into type-specific clusters, from which the message format of each cluster is derived. We have implemented and evaluated R-gram on real-world service traces containing either textual or binary protocol messages. Our experimental results show that R-gram is more accurate and robust than existing state-of-the-art tools in protocol message format extraction. Furthermore, R-gram is efficient for processing large-scale message traces.
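To make the idea of a relative position concrete, the following is a minimal illustrative sketch, not the R-gram implementation itself. It assumes that the common template is already known and is given as an ordered list of keywords, that a block is the byte run between two consecutive template keywords, and that an r-gram can be approximated as an n-gram tagged with its block index and its offset inside that block. The function names `split_into_blocks`, `relative_ngrams`, and `count_rgrams` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Assumed representation of an r-gram: (block index, offset in block, n-gram bytes).
RGram = Tuple[int, int, bytes]


def split_into_blocks(message: bytes, template: List[bytes]) -> List[bytes]:
    """Cut a message at each template keyword, in order.

    Returns the byte runs between keywords. A keyword that is absent
    yields an empty block so that block indices stay aligned across messages.
    """
    blocks, pos = [], 0
    for keyword in template:
        idx = message.find(keyword, pos)
        if idx < 0:
            blocks.append(b"")
            continue
        blocks.append(message[pos:idx])
        pos = idx + len(keyword)
    blocks.append(message[pos:])
    return blocks


def relative_ngrams(block: bytes, block_id: int, n: int = 3) -> Iterator[RGram]:
    """Yield every n-gram of a block, tagged with its block-relative offset."""
    for offset in range(len(block) - n + 1):
        yield (block_id, offset, block[offset:offset + n])


def count_rgrams(messages: Iterable[bytes], template: List[bytes], n: int = 3) -> Counter:
    """Count, for each tagged n-gram, the number of messages that contain it."""
    counts: Counter = Counter()
    for message in messages:
        seen = set()
        for block_id, block in enumerate(split_into_blocks(message, template)):
            seen.update(relative_ngrams(block, block_id, n))
        counts.update(seen)
    return counts


# Two messages of different length that share one template keyword.
messages = [
    b"GET /a HTTP/1.1\r\nHost: x\r\n",
    b"GET /longer/path HTTP/1.1\r\nHost: y\r\n",
]
template = [b" HTTP/1.1\r\n"]
counts = count_rgrams(messages, template)
```

In this toy trace, `Host` starts at a different absolute byte offset in each message, yet the tagged n-gram `(1, 0, b"Hos")` is counted in both, because its offset is measured from the start of its block rather than from the start of the message. Under the stated assumptions, this is the property that lets keywords survive large variations in message length.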