Abstract

Cybersecurity threats continue to increase and are impacting almost all aspects of modern life. Being aware of how vulnerabilities and their exploits are changing gives helpful insights into combating new threats. Applying dynamic topic modeling to a time-stamped cybersecurity document collection shows how the significance and details of concepts found in them are evolving. We correlate two different temporal corpora, one with reports about specific exploits and the other with research-oriented papers on cybersecurity vulnerabilities and threats. We represent the documents, concepts, and dynamic topic modeling data in a semantic knowledge graph to support integration, inference, and discovery. A critical insight into discovering knowledge through topic modeling is seeding the knowledge graph with domain concepts to guide the modeling process. We use Wikipedia concepts to provide a basis for performing concept phrase extraction and show how using those phrases improves the quality of the topic models. Researchers can query the resulting knowledge graph to reveal important relations and trends. This work is novel because it uses topics as a bridge to relate documents across corpora over time.

Highlights

  • Cybersecurity is a crucial computing area vital to our society due to the rise in cyberattacks and the damage they can do (Symantec, 2019)

  • Dynamic topic modeling (DTM) (Blei and Lafferty, 2006) provides a means for performing topic modeling over time

  • Dynamic topic modeling (DTM) has been used in many applications, including science research (Blei and Lafferty, 2006), software (Hu et al, 2015), finance (Morimoto and Kawasaki, 2017), music (Shalit et al, 2013), and climate change (Sleeman et al, 2016; Sleeman et al, 2017) to understand how particular domains have changed over time

Summary

INTRODUCTION

Cybersecurity is a crucial computing area vital to our society due to the rise in cyberattacks and the damage they can do (Symantec, 2019). For many years, researchers have used artificial intelligence techniques to extract information from documents such as security bulletins, after-action reports, and descriptions of new software vulnerabilities. Most work in this area has used language understanding technology that extracts references to entities, such as malware instances, software products, IP addresses, or process names, and the relations between them. Although these data have been helpful for many purposes, they have not addressed temporal aspects of how the cybersecurity landscape has changed over the years. The heavy use of acronyms and multi-word phrases with non-compositional semantics exacerbates the problem. To address these issues, we extract common cybersecurity concepts from Wikipedia data and identify phrases and acronyms that refer to them. We put forth this work to show how temporal analysis by means of cross-domain understanding can be applied to cybersecurity and could be used to foster a document-based search tool.
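To make the concept phrase idea concrete, the following is a minimal sketch, not the authors' implementation, of how phrases and acronyms might be collapsed into single concept tokens before topic modeling. The lexicon entries and the names `CONCEPT_SURFACE_FORMS`, `build_pattern`, and `tag_concepts` are hypothetical; in the paper the concepts are drawn from Wikipedia rather than hand-written.

```python
import re

# Hypothetical lexicon: surface forms (phrases and acronyms) mapped to one
# canonical concept token. These entries are illustrative assumptions only.
CONCEPT_SURFACE_FORMS = {
    "denial of service": "denial_of_service",
    "dos": "denial_of_service",
    "sql injection": "sql_injection",
    "sqli": "sql_injection",
    "cross-site scripting": "cross_site_scripting",
    "xss": "cross_site_scripting",
}


def build_pattern(surface_forms):
    """Compile one case-insensitive pattern matching any surface form."""
    # Try longer forms first so multi-word phrases win over shorter overlaps.
    ordered = sorted(surface_forms, key=len, reverse=True)
    alternation = "|".join(re.escape(form) for form in ordered)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def tag_concepts(text, surface_forms=CONCEPT_SURFACE_FORMS):
    """Replace each phrase or acronym with its canonical concept token."""
    pattern = build_pattern(surface_forms)
    return pattern.sub(lambda m: surface_forms[m.group(1).lower()], text)


if __name__ == "__main__":
    print(tag_concepts("An XSS flaw and a SQL injection bug led to a DoS."))
```

Treating a multi-word phrase as one token keeps a bag-of-words topic model from splitting it into unrelated words such as "denial" and "service". Plain string matching of this kind cannot disambiguate acronyms with several meanings, so a real pipeline would likely need context to resolve them.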

BACKGROUND
Dynamic Topic Models
RELATED WORK
APPROACH
Extracting Knowledge From Unstructured Text
Topic Models Over Time
Automatic Knowledge Graph Generation and Use
CYBERSECURITY DATA SETS
EXPERIMENTS AND ANALYSIS
Concept Context Experiment
Contextual Classification Experiments
Dynamic Model Experiment
USING THE KNOWLEDGE GRAPH FOR SEARCH
CONCLUSION AND FUTURE WORK
Findings
DATA AVAILABILITY STATEMENT