Abstract
Software systems are now ubiquitous: they automate everyday tasks in personal and enterprise applications, and they are essential to many safety-critical and mission-critical systems, e.g., air traffic control systems, autonomous cars, and Supervisory Control And Data Acquisition (SCADA) systems. With the availability of massive storage, high-speed Internet, and the advent of Internet of Things devices, modern software systems are growing in both size and complexity. Maintaining the quality of such complex systems while keeping the error rate to a minimum through manual effort is a challenge. This paper proposes a heterogeneous defect prediction method that addresses the extreme class imbalance problem in real software datasets. In the first stage, the Sampling WIth the Majority (SWIM) method, based on the Mahalanobis distance, is used to balance the dataset, reducing the effect that the small number of minority samples in defect data has on prediction. Because uncorrelated features degrade the classification algorithm, the second stage uses ensemble learning and joint similarity measurement to select the most relevant and representative features between the source project and the target project. The third stage performs transfer learning from the source project to the target project in the Grassmann manifold space. To verify the effectiveness of the proposed method, we conducted experiments on nine projects from three public-domain software defect libraries and compared the method with four existing advanced methods. The experimental results indicate that the proposed method achieves a higher area under the curve (AUC) than the compared methods.
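The first stage relies on the Mahalanobis distance, which measures how far a sample lies from a distribution while accounting for the spread and correlation of its features. The following is a minimal sketch of that distance only, not of the SWIM procedure itself; the two-feature restriction, the function name `mahalanobis_2d`, and the toy `majority` data are assumptions made for illustration.

```python
import math

def mahalanobis_2d(point, samples):
    """Mahalanobis distance from `point` to the distribution of 2-feature `samples`."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    # Entries of the sample covariance matrix (n - 1 denominator).
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] * inv(S) * [dx dy]^T, using the closed-form 2x2 inverse.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Hypothetical majority-class (non-defect) modules described by two metrics.
majority = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
# Distance of a hypothetical minority-class (defect) module from that distribution.
distance = mahalanobis_2d((3.0, 1.0), majority)
```

Unlike the Euclidean distance, this value is scaled by the variance of each feature in the reference class, so it does not change when a metric is expressed in different units.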
Highlights
Software defect prediction (SDP) is important to identify defects in the early phases of the software development life cycle [1,2]
To investigate the performance of the proposed algorithm, GMOTDP is compared with four existing state-of-the-art defect prediction methods: TCA+ [23], CCA+ (Canonical Correlation Analysis) [8], KCCA+ [13], and KSETE [16]
One project was selected as the target project, and projects from different datasets were used as source projects for heterogeneous prediction
Summary
Software defect prediction (SDP) is important to identify defects in the early phases of the software development life cycle [1,2]. This early identification, and thereby removal, of software defects is crucial to yield a cost-effective, good-quality software product. SDP usually focuses on estimating the defect proneness of software modules, and helps software practitioners allocate limited testing resources to the parts that are most likely to contain defects. When defect data are class-imbalanced, however, the prediction model pays more attention to the non-defect samples, which biases it toward predicting modules as non-defective and leads it to ignore the cost of misclassification.
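The bias toward non-defect samples can be made concrete with a small sketch. It assumes a hypothetical dataset of 100 modules of which 5 are defective, and a degenerate model that gives every module the same score; the `auc` helper is a plain pairwise implementation written for illustration, not the evaluation code used in the paper.

```python
def auc(labels, scores):
    """AUC as the chance that a random defective module outscores a random clean one.

    Ties between a defective and a clean module count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced dataset: 5 defective modules (1), 95 clean modules (0).
labels = [1] * 5 + [0] * 95
# A degenerate model that scores every module as clean.
always_clean = [0.0] * 100

# Accuracy at a 0.5 threshold looks good although no defect is ever found.
accuracy = sum((s >= 0.5) == (y == 1) for y, s in zip(labels, always_clean)) / len(labels)
# AUC shows that the model cannot separate the two classes at all.
model_auc = auc(labels, always_clean)
```

Here the accuracy is 0.95 while the AUC is 0.5, the value expected from random ranking, which is one reason AUC is a more informative measure than accuracy on imbalanced defect data.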