Abstract

Topic detection technology can automatically discover new topics on the Internet. This paper investigates domain-oriented feature extraction methods and proposes a keyword feature extraction method (ITFIDF-LP), a subject-word feature extraction method (LDA-SLP), and a topic clustering model based on vector product similarity. Building on these, it proposes a novel Domain-oriented Topic Discovery based on Features Extraction and Topic Clustering (DTD-FETC) model that analyzes open-source web content of a domain and identifies emerging topics in that domain in real time. The paper also describes a DTD-FETC system built for the cyber security domain. The system filters and aggregates web content on specific security threat topics, such as vulnerabilities and malware, and helps security staff respond quickly and defend against emerging cyber threats as early as possible. Applied to the cyber security dataset, the DTD-FETC method achieves recall, accuracy, and F1 values all above 0.99.

Highlights

  • With the development of the Internet, people have more and more ways to obtain information online, such as web pages, microblogs, and Twitter

  • We propose a novel Domain-oriented Topic Discovery based on Features Extraction and Topic Clustering (DTD-FETC) method

  • Building on general topic detection technology, this paper studies in depth the feature extraction methods applicable to the security field, improves upon existing topic clustering models, and proposes a threat topic discovery method tailored to the security field

Summary

Introduction

With the development of the Internet, people have more and more ways to obtain information online, such as web pages, microblogs, and Twitter. Information related to a single topic is often scattered across different parts of the Internet, making it increasingly difficult to find the multifaceted information about a topic or event. Faced with this volume of data and lacking efficient tools, decision makers struggle to obtain information about the latest events or topics, and therefore to make sound decisions. Topic Detection and Tracking (TDT) technologies emerged to address this problem. TDT can discover and correlate information about a topic that is scattered across different places [1]. It can be applied in many fields, such as financial analysis, governance, and network security [2].
