Abstract

Automatic identification of duplicate bug reports is a critical research problem in mining software repositories. This paper proposes and compares amalgamated models for detecting duplicate bug reports using both textual and non-textual information from the reports. The models, viz. LDA, TF-IDF, GloVe, Word2Vec, and their amalgamations, are used to rank bug reports by their similarity to each other. The amalgamated score is generated by aggregating the ranks produced by the individual models. The empirical evaluation was performed on open datasets from large open-source software projects. The evaluation metrics are mean average precision (MAP), mean reciprocal rank (MRR), and recall rate. The experimental results show that the amalgamated model (TF-IDF + Word2Vec + LDA) outperforms the other amalgamated models for duplicate bug recommendation. It is also concluded that the amalgamation of Word2Vec with TF-IDF works better than that of TF-IDF with GloVe. Future work is to develop a Python package that lets the user select individual models and amalgamate them with other models on a given dataset.
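
The rank aggregation step described above can be illustrated with a minimal sketch. The abstract does not state which aggregation function is used, so averaging the per-model ranks is an assumption here, as are the function names, the bug identifiers, and the similarity scores, all of which are hypothetical. The sketch also shows the reciprocal rank for a single query, which is averaged over queries to obtain MRR.

```python
from statistics import mean


def scores_to_ranks(scores):
    """Map each candidate id to its 1-based rank (1 = most similar)."""
    # Ties are broken by candidate id so the ordering is deterministic.
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return {cid: pos for pos, cid in enumerate(ordered, start=1)}


def amalgamate(per_model_scores):
    """Order candidates by their mean rank across models (assumed aggregation)."""
    per_model_ranks = [scores_to_ranks(s) for s in per_model_scores.values()]
    candidates = per_model_ranks[0].keys()
    mean_rank = {cid: mean(r[cid] for r in per_model_ranks) for cid in candidates}
    return sorted(mean_rank, key=lambda cid: (mean_rank[cid], cid))


def reciprocal_rank(ranking, true_duplicates):
    """Return 1 / position of the first true duplicate, or 0.0 if none is ranked."""
    for pos, cid in enumerate(ranking, start=1):
        if cid in true_duplicates:
            return 1.0 / pos
    return 0.0


# Hypothetical similarity of three candidate reports to one query report.
per_model_scores = {
    "tfidf":    {"BUG-1": 0.82, "BUG-2": 0.40, "BUG-3": 0.75},
    "word2vec": {"BUG-1": 0.60, "BUG-2": 0.55, "BUG-3": 0.90},
    "lda":      {"BUG-1": 0.70, "BUG-2": 0.20, "BUG-3": 0.65},
}

ranking = amalgamate(per_model_scores)
print(ranking)                              # ['BUG-1', 'BUG-3', 'BUG-2']
print(reciprocal_rank(ranking, {"BUG-3"}))  # 0.5
```

Aggregating ranks rather than raw scores avoids having to normalize similarity values that live on different scales across models; other aggregation rules (for example a weighted sum of ranks) would fit the same structure.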
