Performance evaluation of information retrieval models in bug localization on the method level

Mai Alduailij,Mona Al-Duailej

doi:10.1109/cts.2015.7210439

Abstract

This study uses statistical inference to compare the performance of three text models used for bug localization in collaboration systems: Vector Space Model (VSM), Latent Semantic Indexing (LSI), and Latent Dirichlet Analysis (LDA) on the method level. After the three models are compared we confirm that VSM is the superior model. We then, point out which external factors i.e. methods lengths, queries lengths, methods documentation comments, products names and components names mentioned in bug reports affect VSM performance. We conclude that VSM performance is positively correlated with most of the tested factors. We believe our results can be helpful to: (i) text models developers, to understand the strengths and limitations of VSM for future development; (ii) bug localization programmers using classical VSM, to understand improved ways to prepare methods extracted from big data collaboration systems and (iii) bug reporters, to follow the most efficient methods presented in this work in reporting bugs to enhance the information retrieval process.

Full Text