Abstract

With the rapid development of the Internet in recent years, traffic analysis encounters many different types of files, some of which are in non-public formats. To identify the type of a transferred file, its features must first be analyzed and extracted; the file can then be matched in the traffic using those features. In the past, extracting the features of a given file type for later matching relied mostly on manual analysis, which requires considerable manpower and material resources and is inefficient. In this paper, a system for feature analysis, extraction, and matching based on magic numbers and the Aho–Corasick algorithm is designed and implemented. In the feature extraction module, this paper designs an algorithm to extract magic numbers from files and, based on the Aho–Corasick algorithm, an algorithm to extract string features. The feature matching module uses the extracted features to identify the file type. Experiments on four common file types show that the designed method can effectively identify file types from the extracted features, with recognition accuracy generally higher than 90%. The file feature analysis proposed in this paper reduces a great deal of repetitive work and improves the efficiency of traffic analysis.

Keywords: Feature extraction; Feature matching; N-gram; Aho–Corasick algorithm
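To make the two feature types named in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that a magic number can be approximated by the longest byte prefix shared by sample files of one type, and that string features are fixed byte patterns searched for with a textbook Aho–Corasick automaton. The names `common_magic`, `AhoCorasick`, and `identify`, and the `max_len` parameter, are hypothetical and do not come from the paper.

```python
from collections import deque


def common_magic(samples, max_len=16):
    """Longest byte prefix shared by all samples (assumed stand-in for a magic number)."""
    if not samples:
        return b""
    prefix = samples[0][:max_len]
    for s in samples[1:]:
        n = 0
        while n < len(prefix) and n < len(s) and prefix[n] == s[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


class AhoCorasick:
    """Textbook Aho-Corasick automaton over bytes; patterns are (label, bytes) pairs."""

    def __init__(self, patterns):
        self.goto = [{}]   # per-node transitions: byte value -> node index
        self.fail = [0]    # failure link of each node
        self.out = [[]]    # labels of patterns ending at each node
        for label, pat in patterns:
            node = 0
            for b in pat:
                nxt = self.goto[node].get(b)
                if nxt is None:
                    nxt = len(self.goto)
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[node][b] = nxt
                node = nxt
            self.out[node].append(label)
        # Breadth-first pass: compute failure links and merge outputs.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for b, child in self.goto[node].items():
                f = self.fail[node]
                while f and b not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(b, 0)
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def search(self, data):
        """Return (end_index, label) for every pattern occurrence in data."""
        node, hits = 0, []
        for i, b in enumerate(data):
            while node and b not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(b, 0)
            hits.extend((i, label) for label in self.out[node])
        return hits


def identify(data, magics, matcher):
    """Try the magic-number prefix first, then fall back to the most frequent string feature."""
    for label, magic in magics.items():
        if magic and data.startswith(magic):
            return label
    counts = {}
    for _, label in matcher.search(data):
        counts[label] = counts.get(label, 0) + 1
    return max(counts, key=counts.get) if counts else None
```

In this sketch, `identify` checks the cheap prefix test before scanning the payload, so the automaton only runs when no magic number matches. How the paper actually selects or weights features, and how it uses N-grams, is not stated in the abstract and is not modeled here.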

