Abstract

The global open source software ecosystem contains rich information about software engineering. Existing methods for analyzing the text content of knowledge communities in this field mainly focus on structural relationships and on rule-based association and mining. This paper proposes a software entity recognition method based on BERT word embeddings. First, a BiLSTM-CRF entity recognition model is constructed in combination with word vector embeddings for the software engineering field. Then, the word vectors in the input layer of the model are improved by introducing the BERT pre-trained language model. For BERT pre-training, the pre-training data are constructed from the discussion content of the Stack Overflow software Q&A community. We then use these data to pre-train the BERT model and obtain word vector representations suited to the software engineering field. This improves entity recognition in the field and addresses the problem that traditional word vector embeddings are mostly trained on general-domain data, which is not fully suitable for software engineering and cannot represent contextual semantic information well. In addition, because annotated data in the software field are scarce, this paper attempts to extend the data appropriately through model prediction and dictionary matching, and tests this experimentally. Finally, this paper uses deep learning to realize entity recognition in the software engineering field, providing support for the extraction of software entities, the construction of software knowledge bases, and intelligent applications of software engineering.

Keywords: Entity recognition, BERT model, Stack Overflow
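The abstract does not spell out how dictionary matching is used to extend the annotated data. One common reading is weak labeling: known entity names from a lexicon are matched against tokenized sentences and emitted as BIO tags. The sketch below is a minimal illustration under that assumption and is not the authors' implementation. The function name `dictionary_bio_tags`, the greedy longest-match strategy, and the entity labels (`TOOL`, `LANG`, `FORMAT`) are all hypothetical.

```python
from typing import Dict, List


def dictionary_bio_tags(tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup in a phrase lexicon.

    `lexicon` maps an entity phrase to a (hypothetical) entity label.
    Matching is case-insensitive and works on whole tokens only.
    """
    # Split each lexicon phrase into a tuple of lowercase tokens.
    entries = {tuple(phrase.lower().split()): label
               for phrase, label in lexicon.items()}
    max_len = max((len(key) for key in entries), default=0)
    lowered = [token.lower() for token in tokens]

    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so that
        # "Visual Studio Code" wins over "Visual Studio".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = entries.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Illustrative lexicon; the labels are assumptions, not the paper's tag set.
example_lexicon = {
    "Visual Studio Code": "TOOL",
    "Visual Studio": "TOOL",
    "Python": "LANG",
    "JSON": "FORMAT",
}
example_tokens = "How do I parse JSON in Visual Studio Code with Python ?".split()
example_tags = dictionary_bio_tags(example_tokens, example_lexicon)
```

Sentences labeled this way could be added to the training set alongside model-predicted labels. Dictionary matches are noisy (for example, ambiguous names such as "Go" or "Swift"), so some filtering step would likely be needed in practice.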
