Abstract

Accurately identifying Internet traffic at an early stage is very important for traffic identification applications. In recent years, a growing number of studies have tried to build effective machine learning models that identify an Internet flow from the few packets at its early stage. However, a basic and important problem still needs in-depth study: how many packets are most effective for early stage Internet traffic identification? In this paper, we try to resolve this problem. Three Internet traffic data sets are used, and the sizes of the first 10 packets of each flow are extracted for study. We first apply mutual information to analyze how much information the first n packets provide about the flow type. Then a correlation analysis of each pair of adjacent packets is carried out to find feature redundancies. Next, we run a number of crossover identification experiments with different numbers of packets, using 11 well-known supervised learning algorithms. Finally, statistical tests are applied to the experimental results to determine which number of packets performs best. Our experimental results show that 5–7 packets are the best choice for early stage traffic identification.
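The mutual information and adjacent-packet correlation steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. The abstract does not say how packet sizes are discretized or which estimator is used, so this sketch groups sizes into fixed-width bins using a hypothetical `bin_width` parameter. The first n binned sizes are treated as one joint discrete feature, and its plug-in (count-based) mutual information with the flow label is computed. Redundancy between adjacent packets is measured with the Pearson coefficient, which is also an assumption. The helper names and the toy flows are invented for illustration.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def first_n_sizes(sizes: Sequence[int], n: int, bin_width: int = 100) -> tuple:
    """Hypothetical feature: binned sizes of the first n packets of a flow."""
    return tuple(size // bin_width for size in sizes[:n])


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples (assumes nonzero variance)."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    # Toy flows, made up for illustration (not data from the paper):
    # each entry is (sizes of the first 5 packets, flow type label).
    flows = [
        ([74, 1448, 1448, 66, 1448], "http"),
        ([74, 1448, 1448, 66, 1300], "http"),
        ([90, 120, 90, 120, 90], "dns"),
        ([90, 110, 95, 120, 85], "dns"),
        ([74, 500, 66, 700, 66], "p2p"),
        ([74, 520, 66, 650, 70], "p2p"),
    ]
    labels = [label for _, label in flows]

    # Information the first n packet sizes carry about the flow type.
    for n in range(1, 6):
        feats = [first_n_sizes(sizes, n) for sizes, _ in flows]
        print(f"n={n}: I(first {n} sizes; type) = {mutual_information(feats, labels):.3f} bits")

    # Correlation between each pair of adjacent packet positions.
    for k in range(4):
        a = [sizes[k] for sizes, _ in flows]
        b = [sizes[k + 1] for sizes, _ in flows]
        print(f"corr(packet {k + 1}, packet {k + 2}) = {pearson(a, b):.3f}")
```

The plug-in estimator on a joint tuple is biased upward as n grows, because longer tuples are more often unique in a small sample. A realistic analysis would therefore need far more flows than this toy set, or a bias-corrected estimator.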
