Capturing Uncertainty Information and Categorical Characteristics for Network Payload Grouping in Protocol Reverse Engineering

Jian-Zhen Luo, Shun-Zheng Yu, Jun Cai

doi:10.1155/2015/962974

Abstract

As a promising tool for recovering the specifications of unknown protocols, protocol reverse engineering has drawn increasing research attention over the last decade. Extracting protocol keywords from network traces is a critical task in protocol reverse engineering. Because messages of different types have different sets of protocol keywords, an effective way to improve the accuracy of keyword extraction is to cluster the network payloads of unknown traffic and analyze each cluster separately. Although classic algorithms such as K-means and EM can be used for network payload clustering, the quality of the resulting traffic clusters is far from satisfactory when these algorithms are applied to application layer traffic with categorical attributes. In this paper, we propose a novel method to improve the accuracy of protocol reverse engineering by applying a rough set-based technique to cluster application layer traffic. This technique analyzes multidimensional uncertainty information in multiple categorical attributes based on rough sets theory to cluster network payloads, and applies the Minimum Description Length criterion to determine the optimal number of clusters. Experiments show that our method outperforms existing algorithms and improves the results of protocol keyword extraction.

Highlights

Network protocol reverse engineering [1,2,3,4] is a promising approach to address the problem of recovering detailed specifications of unpublished or undocumented network protocols from the network trace
To support categorical data clustering, Mahmood et al. [19] develop a framework that handles mixed-type attributes (numerical, categorical, and hierarchical) in a one-pass hierarchical clustering algorithm. They focus on analyzing network flow features such as protocols (UDP, TCP, and ICMP) to identify interesting traffic patterns in network traffic data, whereas we aim to analyze categorical features at the application layer to cluster network traffic by protocol and to group protocol messages by message type
We propose to apply a rough sets theory- (RST-)based approach to cluster application layer network traffic and group protocol messages according to message types

Summary

Introduction

Network protocol reverse engineering [1,2,3,4] is a promising approach to recovering detailed specifications of unpublished or undocumented network protocols from network traces. Protocol specifications play an important role in network security and management tasks such as intrusion detection, fuzz testing [5], recovering and understanding command-and-control (C&C) protocols [6], and building intelligent honeypots [7]. Researchers regard protocol reverse engineering as the only available option for understanding a proprietary protocol from network traces. Extracting protocol keywords from network traces is a critical task in protocol reverse engineering, and it is critical that the messages fed into a protocol reverse engineering system belong to a single type. A suitable solution to these issues is to apply unsupervised clustering methods to group the messages of the unknown traffic.
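To make the rough-set vocabulary concrete, the following is a minimal Python sketch of the standard rough set definitions (indiscernibility classes, lower and upper approximations, and roughness) applied to categorical attributes. It is not the authors' clustering algorithm. Treating the leading payload bytes as categorical attributes, the toy payloads, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict


def indiscernibility_classes(rows, attrs):
    """Partition row indices by their values on the chosen attribute positions."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def approximations(rows, attrs, target):
    """Return the (lower, upper) approximations of `target` with respect to `attrs`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(rows, attrs):
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    return lower, upper


def roughness(rows, attrs, target):
    """Roughness = 1 - |lower| / |upper|; 0 means `attrs` describe `target` exactly."""
    lower, upper = approximations(rows, attrs, target)
    return 1 - len(lower) / len(upper) if upper else 0.0


# Hypothetical toy data: the first two payload bytes serve as two categorical attributes.
payloads = [b"GET /a", b"GET /b", b"POST /c", b"PUT /d", b"HEAD /e"]
rows = [tuple(p[:2]) for p in payloads]

# Hypothetical group of messages (row indices) whose description we want to test.
target = {0, 1, 2}

print(roughness(rows, [0], target))     # first byte only
print(roughness(rows, [0, 1], target))  # first and second byte together
```

In this toy data the first byte alone cannot separate the POST message from the PUT message, so the target group is only roughly described by that attribute. Adding the second byte removes the ambiguity. An uncertainty measure of this kind is what a rough set-based method can draw on when deciding which categorical attributes to use for grouping.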

Objectives

Results

Conclusion