Abstract

To classify sparsely labeled, encrypted Discrete Sequence Protocol Data (DSPD), a two-stage semi-supervised classification method based on Generative Adversarial Networks (GAN) is proposed. Because annotating DSPD is difficult, a semi-supervised GAN performs the classification by mining the data distribution hidden in unlabeled data. In addition, a data filtering method based on a traditional GAN is proposed to remove irrelevant data, which would otherwise degrade classification. To process the raw data directly, the discriminator of the semi-supervised GAN is built on a Long Short-Term Memory (LSTM) network. Experimental results show that when the labeling rate in class-balanced mixed data is as low as 0.48%, the average accuracy and F1 score reach 96%. Compared with supervised classification based on LSTM, the proposed method improves accuracy and F1 score by more than 10 percentage points on average. For class-imbalanced data, when the smallest class makes up at least 10% of the data, accuracy and F1 score remain above 90%.
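The following sketch shows how a semi-supervised GAN discriminator can use both labeled and unlabeled data. It uses the common K+1-class formulation (Salimans et al., 2016), which is assumed here and may differ from the authors' exact loss. The discriminator (an LSTM in the proposed method, not shown here) is assumed to output K logits per sequence, one per real protocol class. The "fake" class logit is fixed at 0, so D(x) = Z(x) / (Z(x) + 1) with Z(x) = sum_k exp(l_k(x)). The function name `ssgan_discriminator_loss` and the random inputs are hypothetical placeholders.

```python
import numpy as np

def logsumexp(a, axis=-1):
    # Numerically stable log(sum(exp(a))) along the given axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def softplus(x):
    # log(1 + exp(x)), computed stably.
    return np.logaddexp(0.0, x)

def ssgan_discriminator_loss(logits_lab, labels, logits_unl, logits_fake):
    """Hypothetical K+1-class semi-supervised GAN discriminator loss (assumed formulation).

    logits_*: arrays of shape (batch, K) from the discriminator (e.g. an LSTM head).
    labels:   integer class indices in [0, K) for the labeled batch.
    """
    # Supervised term: cross-entropy over the K real classes on the few labeled samples.
    picked = logits_lab[np.arange(len(labels)), labels]
    loss_sup = np.mean(logsumexp(logits_lab) - picked)

    # Unsupervised terms use D(x) = Z / (Z + 1), i.e. log Z = logsumexp(logits).
    lse_unl = logsumexp(logits_unl)
    lse_fake = logsumexp(logits_fake)
    # -log D(x) = log(1 + Z) - log Z for real unlabeled sequences.
    loss_real = np.mean(softplus(lse_unl) - lse_unl)
    # -log(1 - D(G(z))) = log(1 + Z) for generator samples.
    loss_fake = np.mean(softplus(lse_fake))
    return loss_sup + loss_real + loss_fake

# Usage with random stand-in logits: 8 labeled, 32 unlabeled, 32 generated, K = 4 classes.
rng = np.random.default_rng(0)
K = 4
loss = ssgan_discriminator_loss(
    rng.normal(size=(8, K)), rng.integers(0, K, size=8),
    rng.normal(size=(32, K)), rng.normal(size=(32, K)),
)
print(loss)
```

At inference time, the predicted class is the argmax over the K real-class logits. The unlabeled term lets the discriminator learn the data distribution without labels, which matches the motivation stated above for low labeling rates.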
