Abstract

Information retrieval-based bug localization (IRBL) aims to find buggy files using a bug report as a query. IRBL performance depends heavily on query quality. To improve query quality for IRBL, automatic query expansion (AQE) methods have been proposed that identify query-related terms from the initially retrieved source files. This approach inevitably depends on two determinants of the post-retrieval results: the retrieval model and the initial query quality. We propose a novel word embedding-based AQE technique, WEQE, that avoids this heavy dependency of current AQE approaches. A word embedding model represents words in a vector space, which makes it possible to retrieve terms semantically related to a query. Our method embeds words from both a global corpus and a project-specific corpus. The initial query is expanded by adding words that are semantically similar to it according to the vector representations from our embedding model. We validated the effectiveness of WEQE using 4,583 bug reports from seven projects, four IRBL models, and two embedding models. Our large-scale experimental results show that WEQE can improve the average precision of bug localization for at least 42% of all queries. With the best IRBL model, our expanded queries achieve a 6% higher mean average precision for bug localization than the initial queries.
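
The general idea of embedding-based query expansion can be illustrated with a minimal sketch. It is not the WEQE implementation: the three-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration, the function names `cosine` and `expand_query` are hypothetical, and ranking candidates by cosine similarity to the centroid of the query terms is an assumption rather than something the abstract specifies. How the global and project-specific corpora are combined is likewise not modeled here.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
# A real system would load vectors trained on a large corpus.
TOY_EMBEDDINGS = {
    "crash": [0.9, 0.1, 0.0],
    "exception": [0.8, 0.2, 0.1],
    "failure": [0.7, 0.3, 0.0],
    "button": [0.0, 0.9, 0.2],
    "widget": [0.1, 0.8, 0.3],
    "socket": [0.1, 0.0, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(query_terms, embeddings, top_k=2):
    """Append the top_k vocabulary words closest to the query centroid.

    Assumption: the query is represented by the mean of its known
    term vectors. Terms without an embedding are kept but ignored
    when computing the centroid.
    """
    known = [t for t in query_terms if t in embeddings]
    if not known:
        # Nothing to compare against, so return the query unchanged.
        return list(query_terms)

    dim = len(embeddings[known[0]])
    centroid = [
        sum(embeddings[t][i] for t in known) / len(known)
        for i in range(dim)
    ]

    # Score every vocabulary word that is not already in the query.
    candidates = [
        (cosine(centroid, vec), term)
        for term, vec in embeddings.items()
        if term not in query_terms
    ]
    # Highest similarity first; ties broken alphabetically.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))

    return list(query_terms) + [term for _, term in candidates[:top_k]]


expanded = expand_query(["crash"], TOY_EMBEDDINGS, top_k=2)
print(expanded)
```

In this toy vocabulary, the query term `crash` is expanded with the two words whose vectors point in the most similar direction. Because the expansion terms come from the embedding space rather than from initially retrieved files, the sketch does not depend on a first retrieval pass, which is the dependency the abstract says WEQE is designed to avoid.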
