Abstract

When all 248 statistical features are used to characterize network traffic flows, the computational cost of the classifier becomes excessive. The feature selection methods considered here improve the accuracy of majority classes at the cost of lower accuracy on minority classes, which gives rise to the multi-class imbalance problem. This paper makes two main contributions: 1) an evaluation criterion based on information theory that assesses how strongly a feature is biased towards a particular class; and 2) a new feature selection method, named BFS, that reduces the number of features while alleviating multi-class imbalance. BFS was compared with the fast correlation-based filter (FCBF) and with the full feature set, using Naive Bayes on ten skewed datasets. The results show that 1) BFS maintains the balance of multi-class classification results better than FCBF; for example, the reduction in g-mean when using BFS is only about 8%; and 2) the classification accuracy of Naive Bayes with BFS reaches 90%.
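Overall accuracy can stay high while a minority class is mostly misclassified, which is why the abstract reports g-mean alongside accuracy. The sketch below assumes the common multi-class definition of g-mean (the geometric mean of per-class recalls); the abstract does not state the exact formula used, and the class labels "www" and "p2p" are hypothetical.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return the recall of every class that occurs in y_true."""
    totals = Counter(y_true)
    # Count correctly predicted samples per true class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in totals.items()}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition).

    Any class with zero recall drives the g-mean to 0, so a classifier
    that ignores a minority class is penalized even when accuracy is high.
    """
    recalls = list(per_class_recall(y_true, y_pred).values())
    if any(r == 0 for r in recalls):
        return 0.0
    return math.exp(sum(math.log(r) for r in recalls) / len(recalls))


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical skewed example: 8 majority-class flows, 2 minority-class flows,
# one of which is misclassified as the majority class.
y_true = ["www"] * 8 + ["p2p"] * 2
y_pred = ["www"] * 8 + ["www", "p2p"]
print(accuracy(y_true, y_pred))  # 0.9
print(g_mean(y_true, y_pred))    # about 0.707 (sqrt(1.0 * 0.5))
```

Here accuracy is 90% while the minority class has only 50% recall, and the g-mean (about 0.71) exposes that imbalance.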
