MCoM: A Semi-Supervised Method for Imbalanced Tabular Security Data

Xiaodi Li, Bhavani Thuraisingham, Latifur Khan, Kevin W. Hamlen, Mahmoud Zamani, Shamila Wickramasuriya

doi:10.1007/978-3-031-10684-2_4

Publication Date: Jan 1, 2022

Citations: 5

Affiliation: The University of Texas at Dallas

Abstract

MCoM (Mixup Contrastive Mixup) is a new semi-supervised learning methodology that innovates a triplet mixup data augmentation approach to address the imbalanced data problem in tabular security data sets. Tabular data sets in cybersecurity domains are widely known to pose challenges for machine learning because of their heavily imbalanced data (e.g., a small number of labeled attack samples buried in a sea of mostly benign, unlabeled data). Semi-supervised learning leverages a small subset of labeled data and a large subset of unlabeled data to train a learning model. While semi-supervised methods have been well studied in image and language domains, in security domains they remain underutilized, especially on tabular security data sets which pose especially difficult contextual information loss and balance challenges for machine learning. Experiments applying MCoM to collected security data sets show promise for addressing these challenges, achieving state-of-the-art performance compared with other methods.

Keywords: Semi-Supervised Learning, Contrastive Learning, Tabular Data Sets, Security Data Sets
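
The abstract names mixup-style augmentation but does not spell out the triplet variant, so the following is only a minimal sketch of plain two-sample mixup on numeric tabular rows, not MCoM's actual procedure. The function name `mixup_rows`, the `alpha` value, and the example feature vectors are all assumptions made for illustration.

```python
import random

def mixup_rows(x_a, x_b, alpha=0.2, rng=None):
    """Plain two-sample mixup (hypothetical helper, not the paper's triplet mixup).

    Returns a convex combination of two equal-length numeric feature rows
    together with the mixing weight that was drawn.
    """
    if len(x_a) != len(x_b):
        raise ValueError("rows must have the same number of features")
    rng = rng or random.Random()
    # Mixing weight drawn from Beta(alpha, alpha); it always lies in [0, 1].
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    return mixed, lam

# Illustrative, made-up feature rows (already scaled to [0, 1]).
rng = random.Random(0)
benign = [0.10, 0.00, 0.35]
attack = [0.90, 1.00, 0.05]
mixed, lam = mixup_rows(benign, attack, alpha=0.2, rng=rng)
```

Each feature of `mixed` lies between the corresponding features of the two input rows, which is what lets mixup synthesize additional samples near a scarce class. How MCoM combines three samples and pairs the augmentation with a contrastive objective is described in the full text rather than in this sketch.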

Full Text