Abstract

In the field of network security, the process of labeling a network traffic dataset is specially expensive since expert knowledge is required to perform the annotations. With the aid of visual analytic applications such as RiskID, the effort of labeling network traffic is considerable reduced. However, since the label assignment still requires an expert pondering several factors, the annotation process remains a difficult task. The present article introduces a novel active learning strategy for building a random forest model based on user previously-labeled connections. The resulting model provides to the user an estimation of the probability of the remaining unlabeled connections helping him in the traffic annotation task. The article describes the active learning strategy, the interfaces with the RiskID system, the algorithms used to predict botnet behavior, and a proposed evaluation framework. The evaluation framework includes studies to assess not only the prediction performance of the active learning strategy but also the learning rate and resilience against noise as well as the improvements on other well known labeling strategies. The framework represents a complete methodology for evaluating the performance of any active learning solution. The evaluation results showed proposed approach is a significant improvement over previous labeling strategies.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.