Automated defect identification via path analysis-based features with transfer learning

Yuwei Zhang, Dahai Jin, Ying Xing, Yunzhan Gong

doi:10.1016/j.jss.2020.110585

Abstract

Recently, artificial intelligence techniques have been widely applied to specialized tasks in software engineering, such as code generation, defect identification, and bug repair. Although static analysis tools are widely used to detect potential software defects automatically, developers consider the large number of reported alarms and the high cost of manual inspection to be key barriers to using them in practice. To automate defect identification, researchers use machine learning algorithms with a set of hand-engineered features to build classification models that identify alarms as actionable or unactionable. However, traditional features often fail to represent the deep syntactic structure of alarms. To bridge the gap between programs’ syntactic structure and defect identification features, this paper first extracts a set of novel fine-grained features at the variable level, called path-variable characteristic, by applying path analysis techniques in the feature extraction process. We then propose a two-stage transfer learning approach based on these features, called feature ranking-matching based transfer learning, to improve the performance of cross-project defect identification. Our experimental results on eight open-source projects show that the proposed variable-level features are promising and can yield significant improvements in both within-project and cross-project defect identification.

Full Text