Abstract

In data-driven big data security analysis, knowledge graph-based organization, association mining, and inference analysis of multisource heterogeneous threat data are attracting increasing interest in the field of cybersecurity. Although knowledge graph construction based on deep learning has achieved great success, building a large-scale, high-quality, domain-specific knowledge graph requires manual annotation of large corpora, which makes it very difficult. To tackle this problem, we present a straightforward active learning strategy for cybersecurity entity recognition using deep learning. The pre-trained BERT model and residual dilation convolutional neural networks (RDCNN) are introduced to learn entity context features, and a conditional random field (CRF) layer is employed as the tag decoder. Then, taking advantage of the output results and the distribution of cybersecurity entities, we propose an active learning strategy named TPCL that considers uncertainty, confidence, and diversity. We evaluated TPCL on both general-domain datasets and cybersecurity datasets. The experimental results show that TPCL performs better than traditional strategies in terms of accuracy and F1. Moreover, it performs better in the cybersecurity field than in the general field and is more suitable for the Chinese entity recognition task in this field.

Highlights

  • As the cyberspace security situation grows increasingly complex, threat intelligence-driven cybersecurity defense has become the focus of the industry [1]

  • In our previous work [5], we proposed a method (BERT-RDCNN-CRF, where CRF denotes a conditional random field) that is based on a residual dilation convolutional neural network and the BERT model [6]

  • In contrast to the emphasis on uncertainty alone, or the excessive weight placed on it, in the work of Claveau and Kijak [8], we improve the three informativeness criteria by integrating Shen et al.'s method. We propose an active learning strategy called TPCL that considers uncertainty, confidence, and diversity, drawing on the output results and the distribution of cybersecurity entities based on a lexicon
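
To make the idea of selecting samples by uncertainty and diversity concrete, the following is a minimal sketch of a generic active learning selection step. It is not the TPCL strategy itself, whose scoring is not specified in this summary, and it leaves out the confidence criterion. The function names, the `(tokens, tag_probs)` pool format, and the `max_sim` threshold are all assumptions made for illustration: uncertainty is approximated by mean token entropy over the tagger's per-token tag probabilities, and diversity by a Jaccard-similarity filter on token sets.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one token's tag distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sentence_uncertainty(tag_probs):
    """Mean token entropy over a sentence; higher means less certain."""
    return sum(token_entropy(p) for p in tag_probs) / len(tag_probs)


def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_batch(pool, k, max_sim=0.5):
    """Greedily pick up to k uncertain, mutually dissimilar sentences.

    pool: list of (tokens, tag_probs) pairs (hypothetical format), where
          tag_probs holds one probability distribution per token.
    Returns the indices of the selected sentences, most uncertain first.
    """
    ranked = sorted(
        range(len(pool)),
        key=lambda i: sentence_uncertainty(pool[i][1]),
        reverse=True,
    )
    chosen = []
    for i in ranked:
        if len(chosen) == k:
            break
        # Diversity filter: skip sentences too similar to any already chosen.
        if all(jaccard(pool[i][0], pool[j][0]) <= max_sim for j in chosen):
            chosen.append(i)
    return chosen
```

In an active learning loop, the selected sentences would be sent for manual annotation, added to the training set, and the tagger retrained before the next selection round.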


Summary

Introduction

As the cyberspace security situation grows increasingly complex, threat intelligence-driven cybersecurity defense has become the focus of the industry [1]. Li et al. presented a cybersecurity named entity recognition neural network model based on self-attention [15]. On the view that word features alone are not enough to identify entities, this model uses a CNN to extract character features, concatenates them with the word features, and adds a self-attention mechanism on top of the BiLSTM-CRF model. Because Chinese word segmentation is prone to mistakes, these mistakes are carried forward into model training, a problem known as error propagation.

