Abstract

Reproducible network intrusion detection research requires widely available datasets that represent real-world scenarios. A key shortcoming of the existing datasets used in empirical evaluations of network intrusion detection is the lack of human-generated traffic with accurate labels that distinguish benign from malicious behavior. Using an emulated network environment with a vulnerable web application, we collected baseline traffic, human-generated normal user traffic, automated attacks, and the attacks of ten human penetration testers of varying abilities. We preprocessed the collected data to produce a new dataset named the Colorado University Pentesting Intrusion Dataset (CUPID). The attacks range from reconnaissance activities to delivery of an exploit payload. To our knowledge, this is the first publicly available collection that provides labeled, Institutional Review Board-approved benign and attacker data. CUPID can be used to train and test the limits of classification-based machine learning algorithms used in network intrusion detection systems.
