COPS: An improved information retrieval-based bug localization technique using context-aware program simplification

Yilin Yang, Ziyuan Wang, Zhenyu Chen, Baowen Xu

doi:10.1016/j.jss.2023.111868

Abstract

Information retrieval-based bug localization (IRBL) techniques are well suited to large-scale software debugging, with fewer external dependencies and lower execution costs. However, existing IRBL techniques face challenges in localization granularity and applicability. First, they have not yet achieved statement-level bug localization. Second, almost all studies are limited to Java-based projects, and their effectiveness for other popular programming languages (e.g., Python) is unknown. These deficiencies arise because existing IRBL techniques rely mainly on conventional NLP techniques to analyze bug reports and have not yet fully utilized the stack traces attached to them. To improve IRBL, we propose COPS, a context-aware program simplification technique that localizes defective statements in suspicious files by analyzing the stack traces in bug reports, enabling statement-level bug localization for Python-based projects. Our experiment is based on 948 bug reports, and the results show that COPS can effectively localize buggy statements. First, compared to the original stack traces, COPS improves Top@10 by 102.6%, MAP@10 by 56.2%, and MRR@10 by 95.6%. We also found that actual buggy code entities are more likely to appear in the first five frames of the stack trace. Second, COPS achieves localization performance on par with state-of-the-art statement-level bug localization techniques and reaches 92% buggy statement coverage with a full-scope search. Finally, our experiments show that the first two-thirds of the stack trace is more conducive to localizing buggy statements.
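To make the idea of using stack traces from bug reports concrete, the following minimal sketch extracts (file, line) pairs from a Python traceback embedded in report text and keeps only the first few frames. It is not the COPS algorithm, which the abstract does not specify. The names `Frame`, `extract_frames`, `candidate_statements`, and `SAMPLE_REPORT` are hypothetical, the sample report is invented for illustration, and "first" is assumed here to mean the order in which frames are printed.

```python
import re
from typing import List, NamedTuple, Tuple


class Frame(NamedTuple):
    """One stack frame as printed in a standard CPython traceback."""
    path: str
    line: int
    function: str


# Matches traceback lines of the form:  File "path", line N, in func
FRAME_RE = re.compile(
    r'^\s*File "(?P<path>.+?)", line (?P<line>\d+), in (?P<func>.+?)\s*$'
)


def extract_frames(report_text: str) -> List[Frame]:
    """Collect every traceback frame found in the report, in printed order."""
    frames = []
    for raw in report_text.splitlines():
        match = FRAME_RE.match(raw)
        if match:
            frames.append(
                Frame(match.group("path"), int(match.group("line")), match.group("func"))
            )
    return frames


def candidate_statements(report_text: str, max_frames: int = 5) -> List[Tuple[str, int]]:
    """Return (file, line) candidates from the first `max_frames` frames.

    The default of 5 is an assumption inspired by the abstract's observation
    about the first five frames; it is not a parameter taken from COPS.
    """
    return [(f.path, f.line) for f in extract_frames(report_text)[:max_frames]]


# Invented bug report used only to exercise the sketch.
SAMPLE_REPORT = '''Crash when loading config
Traceback (most recent call last):
  File "app/main.py", line 12, in <module>
    run()
  File "app/core.py", line 40, in run
    load(cfg)
  File "app/io.py", line 7, in load
    return json.loads(text)
ValueError: bad input
'''

if __name__ == "__main__":
    for path, line in candidate_statements(SAMPLE_REPORT):
        print(f"{path}:{line}")
```

A frame list like this only gives starting points for statement-level localization; ranking or simplifying the code around each candidate is where a technique such as COPS would do its actual work.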

Full Text