Abstract

MCoM (Mixup Contrastive Mixup) is a new semi-supervised learning methodology that introduces a triplet mixup data augmentation approach to address the imbalanced data problem in tabular security data sets. Tabular data sets in cybersecurity domains are widely known to challenge machine learning because they are heavily imbalanced (e.g., a small number of labeled attack samples buried in a sea of mostly benign, unlabeled data). Semi-supervised learning leverages a small subset of labeled data and a large subset of unlabeled data to train a model. While semi-supervised methods have been well studied in image and language domains, they remain underutilized in security domains, especially on tabular security data sets, which pose particularly difficult challenges of contextual information loss and data imbalance. Experiments applying MCoM to collected security data sets show promise for addressing these challenges, achieving state-of-the-art performance compared with other methods.

Keywords: Semi-Supervised Learning, Contrastive Learning, Tabular Data Sets, Security Data Sets

