Abstract
Detecting cybersecurity events is necessary to keep us informed about the fast-growing number of such events reported in text. In this work, we focus on the task of event detection (ED) to identify event trigger words for the cybersecurity domain. In particular, to facilitate future research, we introduce a new dataset for this problem, featuring manual annotation for 30 important cybersecurity event types and a size large enough to develop deep learning models. Compared to prior datasets for this task, our dataset involves more event types and supports the modeling of document-level information to improve performance. We perform an extensive evaluation of current state-of-the-art ED methods on the proposed dataset. Our experiments reveal the challenges of cybersecurity ED and present many research opportunities in this area for future work.
Highlights
With the proliferation of cyber technologies in our daily lives, the frequency of cyberattacks and cybercrimes is rapidly increasing, potentially posing serious threats to our cyber activities.
We examine Information Extraction (IE) technologies in Natural Language Processing.
CASIE only contains a small number of event types that fail to cover the wide range of important cyber attack/vulnerability types in reality (Simmons et al., 2014).
Summary
With the proliferation of cyber technologies (e.g., social networks, Internet of Things) in our daily lives, the frequency of cyberattacks and cybercrimes is rapidly increasing, potentially posing serious threats to our cyber activities. Events in the general domain might differ substantially from those in the cybersecurity domain (i.e., divergences in lexical forms, sentence structures, and domain expertise), necessitating the development of cybersecurity-focused datasets to aid research on ED and reveal the nature of events in this domain. To this end, Satyapanich et al. (2020) recently presented the first dataset for cybersecurity ED (called CASIE), which annotates event instances with rich annotation. However, CASIE only contains a small number of event types (i.e., five types) that fail to cover the wide range of important cyber attack/vulnerability types in reality (Simmons et al., 2014). This limits the application of systems developed from the dataset and restricts the comprehensiveness of their analysis of cybersecurity events. We will publicly release our dataset, CySecED, to promote future research on ED and NLP for the cybersecurity domain.