Hybrid Approach with Membership-Density Based Oversampling for handling multi-class imbalance in Internet Traffic Identification with overlapping and noise

Hartono Hartono, Rahmad B.Y Syah

doi:10.1016/j.icte.2024.04.007

Abstract

Internet traffic identification is a crucial method for monitoring Internet application activity and is essential for Internet management and security. Internet traffic data typically exhibit imbalanced distributions: the uneven distribution of instances across classes constitutes the class imbalance problem. This problem can degrade classification performance because the classifier assumes that the dataset has a balanced class distribution. Internet traffic identification datasets are also often affected by class overlap and noise. A hybrid approach that combines data-level and ensemble-based techniques is usually chosen to handle class imbalance under these conditions. At the data level, oversampling with SMOTE is a common choice because it can synthesize new samples for minority classes. However, SMOTE-generated samples tend to be noisy and to overlap with majority-class samples. This research proposes a Hybrid Approach with Membership-Density-Based Oversampling to tackle this challenge. It emphasizes the use of membership degrees to group samples into safe, overlapping, and noisy areas. Top samples are then selected from the safe and overlapping areas based on density ratio, stability, and score. The findings show that the proposed method effectively addresses multi-class imbalance in six Internet traffic identification datasets, yielding slightly better average accuracy, F-measure, and class balance accuracy than the other tested methods, although the difference is not statistically significant. In the experiments with noise and overlapping scenarios, the proposed method obtains superior average accuracy, with a considerable difference compared to all tested methods.
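To make the idea of grouping samples by membership degree concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a simple k-nearest-neighbour membership (the fraction of a sample's neighbours that share its class label) and a hypothetical threshold `safe_threshold`; the abstract does not specify the membership function or thresholds, and the function names `knn_membership` and `categorize` are introduced here only for illustration.

```python
import numpy as np

def knn_membership(X, y, k=5):
    """Assumed membership degree: fraction of the k nearest
    neighbours of each sample that share the sample's class label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # A sample must not count as its own neighbour.
    np.fill_diagonal(dist, np.inf)
    nn = np.argsort(dist, axis=1)[:, :k]
    return (y[nn] == y[:, None]).mean(axis=1)

def categorize(membership, safe_threshold=0.7):
    """Group samples into 'safe', 'overlapping', or 'noisy' areas.
    The threshold is a hypothetical choice, not taken from the paper."""
    membership = np.asarray(membership, dtype=float)
    areas = np.full(membership.shape, "overlapping", dtype=object)
    areas[membership >= safe_threshold] = "safe"
    areas[membership == 0.0] = "noisy"  # no same-class neighbours at all
    return areas

# Toy data: two well-separated clusters plus one class-1 point
# placed in the middle of the class-0 cluster.
X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [10, 10], [10, 11], [11, 10], [11, 11],
     [0.5, 0.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
areas = categorize(knn_membership(X, y, k=3))
```

Under these assumptions, an oversampler would discard the noisy samples and draw its seed samples only from the safe and overlapping groups, ranking them by criteria such as the density ratio, stability, and score mentioned above.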

Full Text