Abstract
Document-level relation extraction plays an important role in threat intelligence text analysis, because it helps researchers quickly understand how new threat events connect to previous ones. Because no public document-level threat intelligence dataset exists, we create APTERC-DOC, a dataset of APT intelligence entities, relations, and coreferences. We treat relation extraction as a multi-class classification task. By treating coreference as one of the predefined relation types, we develop TIRECO, a joint learning framework that performs threat intelligence relation extraction and coreference resolution simultaneously. Document-level text is too long for effective feature extraction, so we propose the concept of a sentence set, which transforms document-level relation extraction into inter-sentence relation extraction. To retain relevant information while removing as much irrelevant content as possible from the sentence set, we apply a novel pruning strategy (SDP-VP-SET) to the input trees, motivated by the observation that verbs are crucial in determining the relation between entities in a sentence set. The strategy retains the shortest path and the nodes within K hops of it, and weights the edges connected to verb nodes by a factor of w. Experimental results show that our model performs well not only on inter-sentence relations but also on intra-sentence relations, and that the F1 value increases by 15.694%.
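The pruning step described above can be illustrated with a minimal sketch. It is not the authors' implementation of SDP-VP-SET. It assumes that the dependency trees of a sentence set have already been merged into one undirected graph, that verb nodes carry the tag "VERB", and that every non-verb edge has weight 1 while verb edges have weight w. The function names `shortest_path` and `prune_with_verb_weights`, the toy sentence, and the default values of `k` and `w` are all hypothetical.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """Breadth-first shortest path on an undirected graph; returns [] if none."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return []


def prune_with_verb_weights(edges, pos_tags, head, tail, k=1, w=2.0):
    """Keep the head-tail shortest path plus nodes within k hops of it,
    then weight every kept edge that touches a verb node by w (assumed scheme)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    path = shortest_path(adj, head, tail)

    # Multi-source BFS from all path nodes, stopping at depth k.
    dist = {node: 0 for node in path}
    queue = deque(path)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    kept = set(dist)

    # Edges with both endpoints kept survive; verb edges get weight w.
    weights = {}
    for a, b in edges:
        if a in kept and b in kept:
            touches_verb = pos_tags.get(a) == "VERB" or pos_tags.get(b) == "VERB"
            weights[(a, b)] = w if touches_verb else 1.0
    return path, kept, weights


# Toy dependency graph for "APT28 used new X-Agent against targets" (hypothetical).
edges = [
    ("used", "APT28"),
    ("used", "X-Agent"),
    ("X-Agent", "new"),
    ("used", "targets"),
    ("targets", "against"),
]
pos_tags = {"used": "VERB"}
path, kept, weights = prune_with_verb_weights(
    edges, pos_tags, head="APT28", tail="X-Agent", k=1, w=2.0
)
```

In this toy graph, "against" lies two hops from the shortest path and is pruned when K = 1, while the three edges attached to the verb "used" receive the larger weight.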