Identifying defects through manual testing is a resource-intensive part of software development. Software defect prediction alleviates this cost by using data-driven methods to identify code segments likely to contain faults. Traditional techniques rely on static code metrics, which often fail to capture the deeper syntactic and semantic features of the code. This paper introduces a framework that uses transformer-based networks with attention mechanisms to predict software defects. The framework encodes input vectors to learn meaningful representations of software modules. A bidirectional transformer encoder is first employed to model programming languages and is then fine-tuned on labeled data to detect defects. The framework is evaluated experimentally on several software projects and compared against baseline techniques. Statistical hypothesis testing and an ablation study are also performed to assess the impact of different parameter choices. The empirical findings indicate that, compared to traditional methods, the proposed approach can increase classification accuracy by an average of 15.93% and improve the F1 score by up to 44.26%.
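For readers less familiar with the two reported metrics, the following is a minimal sketch of how accuracy and the F1 score are conventionally computed for a binary defect classifier. The function name `accuracy_and_f1` and the label vectors are hypothetical and purely illustrative; they are not taken from the paper's experiments.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, where 1 marks a defective module."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean modules wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defects missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean modules correctly passed

    accuracy = (tp + tn) / len(pairs)
    # F1 is the harmonic mean of precision and recall; it simplifies to the
    # expression below. By convention it is set to 0 when undefined.
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return accuracy, f1


# Illustrative (made-up) labels for ten modules, three of them defective.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_model = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]     # a classifier that finds two defects
y_all_clean = [0] * 10                        # a trivial "nothing is defective" predictor

model_scores = accuracy_and_f1(y_true, y_model)
trivial_scores = accuracy_and_f1(y_true, y_all_clean)
print(model_scores)
print(trivial_scores)
```

On imbalanced data such as this, the trivial predictor still reaches a high accuracy while its F1 score is zero, which is one common reason defect prediction studies report both metrics.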