Software Knowledge Entity Relation Extraction with Entity-Aware and Syntactic Dependency Structure Information

Mingjing Tang, Yahui Tang, Zifei Ma, Wei Wang, Tong Li, Rui Zhu

doi:10.1155/2021/7466114


Open Access

https://doi.org/10.1155/2021/7466114


Abstract

Software knowledge communities contain a large number of software knowledge entities with complex structure and rich semantic relations. Semantic relation extraction for software knowledge entities is a critical task in software knowledge graph construction, and it has an important impact on knowledge-graph-based tasks such as software document generation and software expert recommendation. Because user-generated content in software knowledge communities suffers from entity sparsity, relation ambiguity, and a lack of annotated datasets, existing relation extraction methods are difficult to apply in the software knowledge domain. To address these issues, we propose a novel software knowledge entity relation extraction model that incorporates entity-aware information with syntactic dependency information. A Bidirectional Gated Recurrent Unit (Bi-GRU) and Graph Convolutional Networks (GCN) are used to learn contextual semantic representations and syntactic dependency representations, respectively. To obtain more syntactic dependency information, a weighted graph convolutional network based on Newton's cooling law is constructed by calculating a weighted adjacency matrix. Furthermore, an entity-aware attention mechanism is proposed to integrate the entity information and syntactic dependency information and so improve the prediction performance of the model. Experiments conducted on a dataset constructed from StackOverflow texts show that the proposed model performs better than the benchmark models.
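The abstract does not state the weighting formula, so the following is only a minimal sketch of how a distance-decayed adjacency matrix could be built from a dependency tree. It assumes the weight between two tokens decays exponentially with their hop distance in the tree, in the spirit of Newton's cooling law; the function names, the `cooling_rate` parameter, and the toy dependency edges are all hypothetical and not taken from the paper.

```python
import math
from collections import deque


def tree_distances(num_tokens, edges):
    """Return all-pairs hop distances in an undirected dependency tree (BFS)."""
    neighbors = [[] for _ in range(num_tokens)]
    for head, dependent in edges:
        neighbors[head].append(dependent)
        neighbors[dependent].append(head)

    dist = [[math.inf] * num_tokens for _ in range(num_tokens)]
    for start in range(num_tokens):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if dist[start][nxt] == math.inf:
                    dist[start][nxt] = dist[start][node] + 1
                    queue.append(nxt)
    return dist


def cooling_weight_matrix(num_tokens, edges, cooling_rate=0.5):
    """Weight each token pair by exp(-cooling_rate * hop_distance).

    Assumption: this exponential decay stands in for the paper's
    Newton's-cooling-law weighting, whose exact form is not given here.
    Unreachable pairs (infinite distance) receive weight 0.0.
    """
    dist = tree_distances(num_tokens, edges)
    return [[math.exp(-cooling_rate * d) for d in row] for row in dist]


# Hypothetical parse of "jQuery is a JavaScript library":
# token indices 0..4, with "library" (4) as the head of every other token.
toy_edges = [(4, 0), (4, 1), (4, 2), (4, 3)]
W = cooling_weight_matrix(5, toy_edges)
for row in W:
    print([round(w, 3) for w in row])
```

Compared with a binary adjacency matrix, which only records direct dependency arcs, a matrix like this also gives a smaller, nonzero weight to tokens that are two or more arcs apart.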

Highlights

As a successful software knowledge community, StackOverflow provides a platform for software developers to exchange and share knowledge about software programming, configuration management, and project organization, and it has gradually developed into an important knowledge base in the software field [1]. The social text of StackOverflow contains a large number of specific software knowledge entities with complex structure and rich semantic relations.
Compared with Q&A text, tagWiki text is well standardized and complete in its domain knowledge, and it is used to describe the definitions of the various tags and related resources in StackOverflow. Therefore, we construct the annotated dataset for software knowledge entity relation extraction from the Q&A text and tagWiki text of StackOverflow. The detailed construction process is as follows.
Based on the analysis of the syntactic dependency structure, we introduce a Graph Convolutional Network (GCN) to model the syntactic dependency structure information of the sentence sequence and assign different weights to the adjacency matrix according to the distance between nodes, so as to enhance the representation of syntactic dependencies between nodes. Therefore, based on the Bidirectional Gated Recurrent Unit (Bi-GRU) model, we compare the performance of software knowledge entity relation extraction with the GCN model and the weighted GCN model. The experimental results are shown in Table 4 and Figure 4.
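To illustrate the difference between a plain GCN and a weighted GCN mentioned in the last highlight, here is a minimal sketch of a single graph-convolution step computed as ReLU(D⁻¹AHW) with row normalization. The normalization choice, the `gcn_layer` function, and the toy matrices are assumptions made for illustration only; the paper's actual layer definition and weights are not given in this excerpt.

```python
def gcn_layer(adjacency, features, weight):
    """One propagation step: ReLU(D^-1 A H W), where D holds the row sums of A."""
    num_nodes = len(adjacency)
    in_dim = len(features[0])
    out_dim = len(weight[0])
    output = []
    for i in range(num_nodes):
        degree = sum(adjacency[i]) or 1.0  # guard against an all-zero row
        # Average the features of all nodes, weighted by adjacency[i][j].
        aggregated = [
            sum(adjacency[i][j] * features[j][k] for j in range(num_nodes)) / degree
            for k in range(in_dim)
        ]
        # Linear projection followed by ReLU.
        projected = [
            sum(aggregated[k] * weight[k][o] for k in range(in_dim))
            for o in range(out_dim)
        ]
        output.append([max(0.0, value) for value in projected])
    return output


# Toy chain of three tokens: 0 - 1 - 2 (hypothetical dependency path).
binary_adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]          # arcs plus self-loops
weighted_adj = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]  # assumed distance decay

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# With one-hot features and an identity projection, each output row shows
# how much every token contributes to the updated representation of token i.
binary_out = gcn_layer(binary_adj, identity, identity)
weighted_out = gcn_layer(weighted_adj, identity, identity)
print(binary_out[0])
print(weighted_out[0])
```

In this toy setup the binary matrix lets token 0 see only itself and its direct neighbor after one layer, while the distance-weighted matrix also passes a smaller share of token 2's features to token 0.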

Summary

Introduction

As a successful software knowledge community, StackOverflow provides a platform for software developers to exchange and share knowledge about software programming, configuration management, and project organization, and it has gradually developed into an important knowledge base in the software field [1]. The social text of StackOverflow contains a large number of specific software knowledge entities with complex structure and rich semantic relations. The machine learning-based relation extraction method uses feature engineering and annotated data to achieve better performance, which effectively reduces the dependence on linguistics and domain knowledge and gives it strong domain migration ability. Zhao et al. [18] proposed a relation triple extraction framework for the software engineering field that combines a dependency parser with rule-based methods. In this framework, a Support Vector Machine (SVM) is used as a classifier to evaluate the domain relevance of candidate relation triples, and a software knowledge graph covering 35,279 relation triples, 44,800 concepts, and 9,660 verb phrases is constructed by combining text features, corpus features, concept features, and source features. In contrast to fields such as financial investment, science education, and biomedicine, no publicly available annotated dataset or suitable model exists for the software engineering field.

The Proposed Method

Experiment Results and Analysis

Conclusion


Journal: Scientific Programming	Publication Date: Dec 22, 2021
Citations: 2	License type: CC BY 4.0


