Abstract

This paper presents a new dataset for Named Entity Recognition (NER) tasks in cyber threat intelligence (CTI) research. To the best of our knowledge, the proposed dataset is the largest and most challenging one in the field that complies with the STIX 2.1 specification. We collected APT (Advanced Persistent Threat) reports from different network security companies and annotated them manually. From these we constructed a dataset named APTNER, which can be used for joint and multi-task NER learning in CTI. APTNER contains 21 categories, going beyond common labels such as IP, URL, malware, and location, which makes it more challenging than other NER datasets in the CTI field; we also verify the rationality of the dataset. To ease comparative studies, we implement several state-of-the-art baselines and report an analysis of their results. To facilitate future work on fine-grained NER for CTI, we make APTNER public at https://github.com/wangxuren/APTNER.
