Unsupervised Binary Protocol Clustering Based on Maximum Sequential Patterns

Jiaxin Shi, Dongyang Zhan, Zhongwei Li, Lin Ye

doi:10.32604/cmes.2022.017467

Abstract

With the rapid development of the Internet, a large number of private protocols have emerged on the network. Some of them are constructed by attackers to evade analysis, which poses a threat to computer network security. Blockchains use P2P protocols to implement various functions across the network, and the P2P protocol format of a blockchain may differ from the standard format specification, so sniffing tools such as Wireshark and Fiddler cannot recognize it. The ability to distinguish different types of unknown network protocols is therefore vital for network security. In this paper, we propose an unsupervised clustering algorithm for binary protocols based on maximum frequent sequences, which can distinguish various unknown protocols and thereby support the analysis of unknown protocol formats. We first mine the maximum frequent sequences of a set of protocol messages at byte granularity. Using fuzzy set theory, we then calculate the fuzzy membership of each protocol message to each maximum frequent sequence and construct a fuzzy membership vector for each message. Finally, we adopt K-means++ to split the different types of protocol messages into clusters and evaluate the performance by calculating homogeneity, completeness, and the Fowlkes and Mallows Index (FMI). We also compare the proposed algorithm with clustering algorithms based on Needleman–Wunsch and on fixed-length prefixes. Compared with these traditional clustering methods, our method shows an improvement in clustering performance.
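The mining and vector-construction steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that sequences are contiguous byte n-grams, that support is the fraction of messages containing a sequence, and that membership is the longest common substring length divided by the pattern length. The abstract does not give the paper's actual membership function, so that formula is a stand-in, and the names `frequent_sequences`, `maximal_sequences`, `membership_vector`, and the `max_len` cap are hypothetical.

```python
from collections import Counter


def frequent_sequences(messages, min_support, max_len=8):
    """Return contiguous byte sequences whose support ratio >= min_support."""
    frequent = set()
    for n in range(1, max_len + 1):
        counts = Counter()
        for msg in messages:
            # Count each n-gram at most once per message (support, not frequency).
            counts.update({msg[i:i + n] for i in range(len(msg) - n + 1)})
        level = {s for s, c in counts.items() if c / len(messages) >= min_support}
        if not level:
            break  # if no n-gram is frequent, no longer sequence can be
        frequent |= level
    return frequent


def maximal_sequences(frequent):
    """Keep only sequences that are not contained in a longer frequent one."""
    return sorted(
        s for s in frequent
        if not any(s != t and s in t for t in frequent)
    )


def longest_common_substring(a, b):
    """Length of the longest contiguous run of bytes shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def membership_vector(msg, patterns):
    """ASSUMED membership: matched fraction of each pattern, in [0, 1]."""
    return [longest_common_substring(msg, p) / len(p) for p in patterns]


# Toy messages of different lengths from two made-up "protocols".
messages = [
    b"\x01\x02\x03\xaa",
    b"\x01\x02\x03\xbb\xcc",
    b"\x09\x08\x07",
    b"\x09\x08\x55\x66",
]
patterns = maximal_sequences(frequent_sequences(messages, min_support=0.5))
vectors = [membership_vector(m, patterns) for m in messages]
```

Every message maps to a vector with one entry per pattern, so messages of different lengths become directly comparable, which is the property the clustering step relies on.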

Highlights

A network protocol stipulates the format and sequence of the messages exchanged between the entities at the two ends of a communication in a computer network; the entities receive the protocol messages and take appropriate actions.
We propose an unsupervised protocol message clustering method based on the fuzzy membership of maximum frequent sequences, which effectively solves the clustering difficulties caused by differing protocol message lengths.
Since a cluster represents one type of unknown protocol, each cluster is supposed to contain all of the corresponding protocol messages and only those messages. To evaluate these factors quantitatively, we introduce three commonly used indexes: homogeneity measures how closely each cluster contains only one type of unknown protocol message, completeness indicates how many of the protocol messages of the same class are assigned to the same cluster, and the Fowlkes and Mallows Index (FMI) is an overall evaluation of clustering performance.
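The three indexes named above have standard definitions, sketched here in plain Python. Homogeneity is 1 - H(class | cluster) / H(class), completeness is the same quantity with the two roles swapped, and FMI is TP / sqrt((TP + FP)(TP + FN)) over pairs of messages. The function names are illustrative, and the paper may well have used a library implementation instead.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy of a label list (natural log)."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two parallel label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(c / n * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity(truth, pred):
    """1.0 when every cluster holds messages of a single true class."""
    h = entropy(truth)
    return 1.0 if h == 0 else 1.0 - conditional_entropy(truth, pred) / h


def completeness(truth, pred):
    """1.0 when every true class lands in a single cluster."""
    return homogeneity(pred, truth)


def fowlkes_mallows(truth, pred):
    """Geometric mean of pairwise precision and recall."""
    def pairs(counter):
        return sum(c * (c - 1) // 2 for c in counter.values())

    tp = pairs(Counter(zip(truth, pred)))   # pairs together in both labelings
    same_truth = pairs(Counter(truth))      # TP + FN
    same_pred = pairs(Counter(pred))        # TP + FP
    if same_truth == 0 or same_pred == 0:
        return 0.0
    return tp / sqrt(same_truth * same_pred)
```

Merging two protocol types into one cluster keeps completeness at 1.0 but drives homogeneity toward 0, which is why the indexes are reported together rather than alone.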

Summary

Introduction

A network protocol stipulates the format and sequence of the messages exchanged between the entities at the two ends of a communication in a computer network; the entities receive the protocol messages and take appropriate actions. Manually analyzing the format specification of an unknown protocol is labor-intensive, time-consuming, and error-prone. It took the SAMBA project 12 years to extract the basic specifications of the SMB protocol [10,11]. We propose an unsupervised protocol message clustering method based on the fuzzy membership of maximum frequent sequences, which effectively solves the clustering difficulties caused by differing protocol message lengths. We also introduce the number of protocol types as prior knowledge so that we can cluster the protocol messages, vary the minimum support threshold, and calculate homogeneity, completeness, and FMI to evaluate the method's performance.
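With the number of protocol types k supplied as prior knowledge, the clustering step reduces to K-means with K-means++ seeding over the membership vectors. The following is a minimal standard-library sketch of that generic algorithm, not the authors' code. The function names, the `iterations` cap, and the `seed` parameter are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """Pick k initial centers; points far from chosen centers are favored."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            centers.append(rng.choice(points))  # every point sits on a center
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def kmeans(points, k, iterations=50, seed=0):
    """Return one cluster label per point; k is the known number of types."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda i: sq_dist(p, centers[i])) for p in points]
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers
    return labels


# Hypothetical membership vectors for four messages of two protocol types.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = kmeans(vectors, k=2)
```

The labels returned here can be compared against known protocol types with homogeneity, completeness, and FMI, repeating the run for each minimum support threshold under study.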

Related Work

Problem Statement

Unsupervised Clustering Based on Maximum Frequent Sequences

Maximum Frequent Sequence Mining

Fuzzy Membership Vector Construction and Protocol Message Clustering

Experiment and Analysis

Conclusion and Future Work


Journal: Computer Modeling in Engineering & Sciences
Publication Date: Jan 1, 2022
Citations: 5
License type: cc-by

Similar Papers

Unknown Binary Protocol Recognition Algorithm Based on One Class of Classification and One-Dimensional CNN
Rajesh Kaluri ... Quan Shi
Mathematical Problems in Engineering | VOL. 2023
26 Apr 2023

A Format Reverse Method for Binary Protocol From Communication Data
Chunrui Zhang ... Dong Liu
01 Jan 2015

Dynamic Combined with Static Analysis for Mining Network Protocol's Hidden Behavior
Yanjing Hu ... Qingqi Pei
International Journal of Business Data Communications and Networking | VOL. 13
01 Jul 2017

Analyze Network Protocol's Hidden Behavior
Yanjing Hu ... Liaojun Pang
01 Nov 2015
