Abstract

The links between issues in an issue-tracking system and the commits that resolve them in a version control system are important for a variety of software engineering tasks (e.g., bug prediction, bug localization, and feature location). However, only a small portion of such links are established by manually including issue identifiers in commit logs, leaving a large portion of them lost in the evolution history. To recover issue-commit links, heuristic-based and learning-based techniques leverage the metadata and text/code similarity in issues and commits; however, they fail to capture the embedded semantics in issues and commits and the hidden semantic correlations between issues and commits. As a result, this semantic gap inhibits the accuracy of link recovery. To bridge this gap, we propose a semantically-enhanced link recovery approach, named DeepLink, which is built on top of deep learning techniques. Specifically, we develop a neural network architecture, using word embeddings and a recurrent neural network, to learn the semantic representation of natural language descriptions and code in issues and commits as well as the semantic correlation between issues and commits. In experiments, to quantify the prevalence of missing issue-commit links, we analyzed 1,078 highly-starred GitHub Java projects (comprising 583,795 closed issues) and found that only 42.2% of issues were linked to corresponding commits. To evaluate the effectiveness of DeepLink, we compared it with FRLink, a state-of-the-art link recovery approach, on ten GitHub Java projects and demonstrated that DeepLink can outperform FRLink in terms of F-measure.
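To make the notion of an explicitly established link concrete, the following minimal sketch shows how issue identifiers written into commit logs could be detected, and how the share of linked issues could be computed. It is an illustration only, not the procedure used in the study: the `#<number>` reference convention and the function names `linked_issue_ids` and `linked_fraction` are assumptions introduced here.

```python
import re

# Assumption: issues are referenced in commit messages as "#<number>",
# the common GitHub convention. Other trackers use other formats.
ISSUE_REF = re.compile(r"#(\d+)")


def linked_issue_ids(commit_messages):
    """Return the set of issue numbers mentioned in any commit message."""
    ids = set()
    for message in commit_messages:
        ids.update(int(number) for number in ISSUE_REF.findall(message))
    return ids


def linked_fraction(closed_issue_ids, commit_messages):
    """Fraction of closed issues referenced by at least one commit message."""
    closed = set(closed_issue_ids)
    if not closed:
        return 0.0
    return len(closed & linked_issue_ids(commit_messages)) / len(closed)
```

Issues that such a pattern does not find are the missing links that a recovery approach would need to infer from the content of the issue and the commit.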
