Background: Code-line-level bugginess identification (CLBI) is an important area of software quality assurance that aims to pinpoint potentially buggy source code lines in a given software product. Recently, two concurrent approaches, GLANCE and DeepLineDP, have shown strong performance relative to existing state-of-the-art (SOTA) approaches by leveraging syntactic and semantic features, respectively. Problem: The literature nevertheless lacks a thorough investigation of fusing these two types of features to enhance CLBI. Such fusion could substantially improve the identification of defective lines. Objective: We aim to advance CLBI by fusing syntactic and semantic features, thereby harnessing their respective strengths. Method: We propose a CLBI approach, SPLICE (booSting deePLinedp wIth syntaCtic fEatures), that fuses syntactic features from GLANCE with semantic features from DeepLineDP. SPLICE comprises three variants (SPLICE-S, SPLICE-G, and SPLICE-F), each using a different line-level sorting approach. We compare SPLICE comprehensively with existing SOTA approaches using six performance metrics. Result: In an analysis of nine open-source projects, our experimental results show that SPLICE is competitive with current SOTA CLBI approaches. Notably, SPLICE-F outperforms all SOTA CLBI approaches, including GLANCE and DeepLineDP, on all six metrics, a substantial improvement. Conclusion: This finding underscores the importance of fusing syntactic and semantic features in future CLBI research to build more effective bugginess identification approaches. The analysis was conducted on Java programs only; exploring similar methods for other programming languages is left to future research.