Abstract

In recent years, individuals, organizations, and countries have all been exposed to cyber threats to some degree. Threat intelligence sharing schemes have greatly helped to protect cyber security. Traditional threat intelligence sharing schemes mainly collect and analyze information manually, including but not limited to Indicators of Compromise (IOCs), and produce a machine-readable report on which a Security Operations Center (SOC) can act. Sharing and exchanging cyber threat intelligence (CTI) easily and automatically is therefore both challenging and significant. To extract CTI information efficiently, we construct an automatic information extraction pipeline consisting of entity recognition and relation extraction, which extracts the relevant entities and relations from threat intelligence reports and improves the efficiency of threat intelligence sharing. The work and its results cover two aspects. (1) Threat intelligence entity recognition. Building on the classic BiLSTM-CRF neural network, we use BERT as the pre-trained language model and propose DT-BERT-BiLSTM-CRF, a model based on a dictionary template. The pre-trained BERT model makes full use of the contextual semantic information in the corpus and alleviates ambiguity in threat intelligence entity recognition. Constructing a dictionary template of threat intelligence entities further improves the accuracy of entity recognition in the threat intelligence domain. (2) CTI relation extraction. We construct the relation extraction dataset with distant supervision. To alleviate the noise in the resulting annotations, we introduce an attention mechanism and reinforcement learning into a traditional neural network and propose the model NR-RL-PCNN-ATT. Through a new reward mechanism, our model improves the quality of sentence selection and the efficiency of relation extraction.
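
To illustrate why a dataset built with distant supervision contains noisy annotations, the following minimal sketch labels every sentence that mentions both entities of a known pair with that pair's relation. It is not the labeling procedure used in this work; the knowledge base `KB`, the function `distant_label`, and the example sentences are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base (illustrative only): (head, tail) -> relation.
KB: Dict[Tuple[str, str], str] = {
    ("APT28", "X-Agent"): "uses",
    ("WannaCry", "CVE-2017-0144"): "exploits",
}


def distant_label(sentences: List[str]) -> List[Tuple[str, str, str, str]]:
    """Assign a KB relation to every sentence that mentions both entities.

    This is the core distant supervision assumption: co-occurrence of the
    two entities is treated as evidence for the relation, whether or not
    the sentence actually expresses it.
    """
    labeled = []
    for sentence in sentences:
        lowered = sentence.lower()
        for (head, tail), relation in KB.items():
            if head.lower() in lowered and tail.lower() in lowered:
                labeled.append((sentence, head, tail, relation))
    return labeled


# Made-up example sentences standing in for threat intelligence reports.
reports = [
    "APT28 deployed X-Agent against government targets.",
    "Researchers compared APT28 tooling with samples unrelated to X-Agent.",
    "WannaCry spread by exploiting CVE-2017-0144.",
    "The patch was released in March.",
]

labeled = distant_label(reports)
for sentence, head, tail, relation in labeled:
    print(f"{relation}({head}, {tail}) <- {sentence}")
```

The second sentence mentions both entities without stating that one uses the other, yet it still receives the "uses" label. Mislabeled sentences of this kind are the noise that sentence-level attention and a reinforcement-learning sentence selector are intended to down-weight or filter out.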

