Abstract

The amount of cybersecurity data on the Internet is growing quickly, but most of it is unstructured text, which is hard for security analysts to digest in time and unsuitable for automated security systems to use directly. Automatically converting cybersecurity information from unstructured text into structured representations in real time helps cyber threat intelligence analysts better understand the cyber situation. Named Entity Recognition (NER) is able to convert such unstructured data into structured data. Recently, a language representation model named Bidirectional Encoder Representations from Transformers (BERT) has achieved substantial improvements across a range of NLP tasks. In this paper, we apply BERT and its improved version, BERT with whole word masking (BERT-wwm), to the NER task for cybersecurity. We combine the BERT model with the BiLSTM-CRF architecture, and our experiments show that the method outperforms the state-of-the-art model in precision, recall, and F1 score, both on overall entities and on individual entity types.
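To illustrate how NER output turns free text into structured records, the sketch below decodes a BIO tag sequence (a common NER labeling scheme; the paper does not specify its exact tag set) into typed entity spans. The entity types, tokens, and function name here are hypothetical, chosen only to show the unstructured-to-structured conversion step.

```python
def bio_to_entities(tokens, tags):
    """Convert parallel token/BIO-tag lists into (entity_text, entity_type) tuples."""
    entities, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):  # a new entity begins
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == etype:
            current.append(tok)  # continue the current entity span
        else:  # "O" or an inconsistent tag closes any open span
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:  # flush a span that runs to the end of the sentence
        entities.append((" ".join(current), etype))
    return entities

# Hypothetical cybersecurity example with model-predicted tags
tokens = ["APT28", "used", "X-Agent", "malware", "."]
tags = ["B-ACTOR", "O", "B-MALWARE", "I-MALWARE", "O"]
print(bio_to_entities(tokens, tags))
# → [('APT28', 'ACTOR'), ('X-Agent malware', 'MALWARE')]
```

In the full pipeline described by the paper, the tag sequence would come from the BERT-BiLSTM-CRF model rather than being written by hand; this post-processing step is what yields structured entities for downstream threat-intelligence systems.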
