A comparison of improving multi-class imbalance for internet traffic classification

Qiong Liu, Zhen Liu

doi:10.1007/s10796-012-9368-7

Abstract

To date, most research on class imbalance has focused on the two-class problem. Multi-class imbalance is considerably more complicated, and there is little knowledge or experience of it in Internet traffic classification. In this paper we study the challenges that multi-class imbalanced data pose for machine-learning-based Internet traffic classification, and the ability of several adjusting methods to address them, including resampling (random under-sampling and random over-sampling) and cost-sensitive learning. We then empirically compare the effectiveness of these methods for Internet traffic classification and determine which produces the better overall classifier and under what circumstances. The main contributions are as follows. (1) Cost-sensitive learning is derived using MetaCost, which incorporates misclassification costs into the learning algorithm, to improve multi-class imbalance based on flow ratio. (2) A new resampling model, comprising under-sampling and over-sampling, is presented to make the multi-class training data more balanced. (3) A solution is presented for comparing the three methods with one another and with the original case. Experimental results on sixteen datasets show that flow g-mean and byte g-mean increase statistically by 8.6% and 3.7%; 4.4% and 2.8%; and 11.1% and 8.2% when the three methods are compared with the original case. Cost-sensitive learning is the first choice when the sample size is sufficient, but resampling is more practical otherwise.
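The abstract does not specify the resampling model or the exact g-mean definition, so the following is only a minimal sketch under stated assumptions: random under-sampling and over-sampling bring every class to a common, caller-chosen size, and g-mean is taken to be the geometric mean of per-class recall, counted per flow. The function names, the `target` parameter, and the class labels in the usage note are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter, defaultdict


def resample_to_target(samples, labels, target, seed=0):
    """Randomly under- or over-sample every class to `target` items.

    Assumption: a single common target size per class; the paper's
    actual resampling model may choose sizes differently.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_x, out_y = [], []
    for y, items in by_class.items():
        if len(items) >= target:
            # Random under-sampling: draw without replacement.
            chosen = rng.sample(items, target)
        else:
            # Random over-sampling: keep every item, then add
            # duplicates drawn with replacement.
            chosen = items + rng.choices(items, k=target - len(items))
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed flow-level definition)."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [correct[c] / total[c] for c in total]
    return math.prod(recalls) ** (1.0 / len(recalls))
```

For instance, with 8 flows labelled "web", 2 labelled "p2p", and 1 labelled "mail" (made-up labels), `resample_to_target(flows, labels, target=4)` returns 4 flows per class. Because g-mean multiplies per-class recalls, a classifier that misses every flow of one minority class scores 0 regardless of its overall accuracy, which is why it suits imbalanced evaluation.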

Journal: Information Systems Frontiers
Publication Date: Jul 18, 2012
Citations: 24
