Abstract

As a basic method for monitoring the activities of Internet applications, traffic identification is very important for Internet management and security. Internet traffic data naturally exhibit imbalanced class distributions, but this problem is rarely considered by the research community. Data gravitation-based classification (DGC) is a new classification model for handling imbalanced data sets, and we proposed an imbalanced DGC (IDGC) model in our previous study. In the present study, we developed an IDGC-based model to resolve imbalanced Internet traffic identification problems. First, we constructed six imbalanced traffic data sets from three original traffic data sets, and we then extracted their early stage features according to the packet sizes. In identification experiments, we compared IDGC with six standard algorithms, including DGC, and four imbalanced-learning algorithms. The experimental results demonstrated that the standard classification models could achieve high accuracy on imbalanced traffic data, but they scored lower on imbalance-aware performance measures, and their generalizability was also a problem. By contrast, IDGC performed well and stably in terms of imbalance-aware performance measures, thereby demonstrating its effectiveness for imbalanced traffic identification.
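
The gap between accuracy and imbalance-aware measures described above can be illustrated with a small sketch. It is not taken from the study: the class labels, the 95/5 split, and the always-majority classifier are hypothetical, and the geometric mean of per-class recall (G-mean) is assumed here as one common imbalance-aware measure, since the abstract does not name the measures it used.

```python
from math import prod


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    recalls = {}
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls[label] = hits / total
    return recalls


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed measure, see lead-in)."""
    recalls = per_class_recall(y_true, y_pred)
    return prod(recalls.values()) ** (1 / len(recalls))


# Hypothetical toy data: 95 flows of a majority class, 5 of a minority class.
y_true = ["web"] * 95 + ["p2p"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["web"] * 100

acc = accuracy(y_true, y_pred)
gm = g_mean(y_true, y_pred)
print(f"accuracy = {acc:.2f}, G-mean = {gm:.2f}")
```

In this toy case the classifier never identifies the minority class, so its recall on that class is zero and the G-mean collapses to zero even though accuracy stays high.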
