Abstract

There is a large volume of bug data in the bug repository, which contains rich bug information. Existing studies on bug data mining mainly rely on using information retrieval (IR) technology to search relevant historical bug reports. These studies basically treat a bug report as a closed unit, ignoring the semantic and structural information within it. Named-entity recognition (NER) is an important task of information extraction (IE) technology. Based on NER, fine-grained factual information could be comprehensively extracted to further form structured data, which provides a new way to improve the accessibility of bug information. However, bug NER is different from general NER tasks. Bug reports are free-form text, which include a mixed language environment studded with code, abbreviations and software-specific vocabularies. In this paper, we propose a deep neural network approach for bug-specific entity recognition called DBNER using bidirectional long short-term memory (LSTM) with Conditional Random Fields decoding model (CRF). DBNER extracts multiple features from the massive bug data and uses attention mechanism to improve the consistency of entity tags in the bug reports. Experiment results show that the F1-score reaches an average of 91.19%. In addition, in cross-project experiments, the DBNER’s F1-score reaches an average of 84%.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call