Abstract

Unlabeled training examples are readily available in many applications, but labeled examples are fairly expensive to obtain. For instance, in our previous works on classification of peer-to-peer (P2P) Internet traffics, we observed that only about 25% of examples can be labeled as “P2P”or “NonP2P” using a port-based heuristic rule. We also expect that even fewer examples can be labeled in the future as more and more P2P applications use dynamic ports. This fact motivates us to investigate the techniques which enhance the accuracy of P2P traffic classification by exploiting the unlabeled examples. In addition, the Internet data flows dynamically in large volumes (streaming data). In P2P applications, new communities of peers often join and old communities of peers often leave, requiring the classifiers to be capable of updating the model incrementally, and dealing with concept drift. Based on these requirements, this paper proposes an incremental Tri-Training (iTT) algorithm. We tested our approach on a real data stream with 7.2 Mega labeled examples and 20.4 Mega unlabeled examples. The results show that iTT algorithm can enhance accuracy of P2P traffic classification by exploiting unlabeled examples. In addition, it can effectively deal with dynamic nature of streaming data to detect the changes in communities of peers. We extracted attributes only from the IP layer, eliminating the privacy concern associated with the techniques that use deep packet inspection.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.