Heterogeneous Defect Prediction Based on Transfer Learning to Handle Extreme Imbalance

Kaiyuan Jiang, Aili Wang, Haibin Wu, Yutong Zhang, Yuji Iwahori

doi:10.3390/app10010396

Abstract

Software systems are now ubiquitous and are used every day for automation in personal and enterprise applications; they are also essential to many safety-critical and mission-critical systems, e.g., air traffic control systems, autonomous cars, and Supervisory Control and Data Acquisition (SCADA) systems. With the availability of massive storage, high-speed Internet, and the advent of Internet of Things devices, modern software systems are growing in both size and complexity. Maintaining high quality in such complex systems while manually keeping the error rate to a minimum is a challenge. This paper proposes a heterogeneous defect prediction method that addresses the extreme class imbalance found in real software datasets. In the first stage, the Sampling With the Majority (SWIM) method, based on the Mahalanobis distance, is used to balance the dataset and reduce the bias caused by the very small number of minority (defective) samples in defect data. Because uncorrelated features degrade the performance of the classification algorithm, the second stage uses ensemble learning and a joint similarity measure to select the most relevant and representative features shared by the source project and the target project. The third stage performs transfer learning from the source project to the target project in Grassmann manifold space. Experiments on nine projects from three public software defect datasets, comparing the proposed method with four existing state-of-the-art methods, verify its effectiveness. The experimental results indicate that the proposed method performs better in terms of the Area Under the Curve (AUC).
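
The first stage can be illustrated with a minimal sketch of SWIM-style oversampling, assuming the common formulation in which the majority class's mean and covariance define the Mahalanobis geometry: each synthetic defective sample is a jittered copy of a real minority sample, rescaled so that it lies at the same Mahalanobis distance from the majority mean as that sample. The function names `majority_whitening` and `swim_oversample`, the `spread` jitter parameter, and the toy data are hypothetical, and the paper's exact variant may differ.

```python
import numpy as np


def majority_whitening(X_maj, ridge=1e-6):
    """Return the majority mean plus whitening / de-whitening matrices.

    In the whitened space, the Euclidean norm of (x - mu) @ W equals the
    Mahalanobis distance of x from the majority-class mean.
    """
    mu = X_maj.mean(axis=0)
    # Small ridge keeps the covariance invertible when features are collinear.
    cov = np.cov(X_maj, rowvar=False) + ridge * np.eye(X_maj.shape[1])
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T      # cov^(-1/2), symmetric
    W_inv = vecs @ np.diag(vals ** 0.5) @ vecs.T   # cov^(1/2), undoes W
    return mu, W, W_inv


def swim_oversample(X_maj, X_min, n_new, spread=0.25, seed=0):
    """Generate n_new synthetic minority samples (simplified SWIM-style sketch)."""
    rng = np.random.default_rng(seed)
    mu, W, W_inv = majority_whitening(X_maj)
    Z_min = (X_min - mu) @ W                       # minority samples in whitened space
    seeds = Z_min[rng.integers(0, len(Z_min), size=n_new)]
    # Jitter each whitened feature by a fraction of the minority spread in that feature.
    sd = Z_min.std(axis=0) if len(Z_min) > 1 else np.ones(Z_min.shape[1])
    Z_new = seeds + rng.normal(size=seeds.shape) * spread * sd
    # Rescale so every synthetic point keeps its seed's Mahalanobis distance.
    seed_norm = np.linalg.norm(seeds, axis=1, keepdims=True)
    new_norm = np.linalg.norm(Z_new, axis=1, keepdims=True)
    Z_new *= seed_norm / np.maximum(new_norm, 1e-12)
    return mu + Z_new @ W_inv                      # map back to the original feature space


# Usage on toy data: 200 non-defective modules, 8 defective ones, 3 metrics.
rng = np.random.default_rng(1)
X_maj = rng.normal(0.0, 1.0, size=(200, 3))
X_min = rng.normal(2.0, 0.5, size=(8, 3))
X_syn = swim_oversample(X_maj, X_min, n_new=192)
print(X_syn.shape)  # (192, 3)
```

Working in the whitened space lets the synthetic points inherit the majority class's correlation structure, which is what makes this style of oversampling usable when only a handful of defective samples exist.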

Highlights

Software defect prediction (SDP) is important to identify defects in the early phases of the software development life cycle [1,2].
To evaluate the proposed algorithm, GMOTDP, it is compared with existing state-of-the-art defect prediction methods: TCA+ [23], Canonical Correlation Analysis (CCA)+ [8], KCAA+ [13], and KSETE [16].
In each experiment, one project was selected as the target project, and projects from the other datasets were used as source projects for heterogeneous prediction.
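
The evaluation protocol in the last highlight can be sketched as an enumeration of (source, target) pairs, assuming that each source project is paired with the target individually and that projects in the same dataset share one metric set. The dataset and project names below are placeholders, because the nine projects are not named in this summary.

```python
# Placeholder layout: three defect datasets with three projects each
# (the real dataset and project names are not given in this summary).
DATASETS = {
    "dataset_A": ["A1", "A2", "A3"],
    "dataset_B": ["B1", "B2", "B3"],
    "dataset_C": ["C1", "C2", "C3"],
}


def heterogeneous_pairs(datasets):
    """Return (source, target) pairs whose source comes from a different dataset."""
    pairs = []
    for target_ds, targets in datasets.items():
        for target in targets:
            for source_ds, sources in datasets.items():
                if source_ds == target_ds:
                    continue  # assumed to share metrics, so not a heterogeneous pair
                pairs.extend((source, target) for source in sources)
    return pairs


pairs = heterogeneous_pairs(DATASETS)
print(len(pairs))   # 54: 9 targets x 6 sources from the other two datasets
print(pairs[:2])    # [('B1', 'A1'), ('B2', 'A1')]
```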

Summary

Introduction

Software defect prediction (SDP) is important to identify defects in the early phases of the software development life cycle [1,2]. Identifying, and thereby removing, software defects early is crucial to producing a cost-effective, high-quality software product. SDP usually focuses on estimating the defect proneness of software modules, and it helps software practitioners allocate limited testing resources to the parts most likely to contain defects. However, because defective samples are far fewer than non-defective ones in real defect data, a prediction model tends to pay more attention to the non-defective samples; it therefore becomes biased toward the non-defective class and ignores the cost of misclassifying defective modules.
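
A small numerical sketch shows why this bias matters and why a ranking metric such as AUC is more informative than accuracy here. The project below is hypothetical (5 defective modules out of 100), and `auc` is a hand-written helper computing AUC as the probability that a randomly chosen defective module receives a higher score than a randomly chosen non-defective one, with ties counted as one half.

```python
import numpy as np


def auc(y_true, scores):
    """AUC as P(score of a defective module > score of a clean one); ties count 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]   # every defective/clean score comparison
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Hypothetical, extremely imbalanced project: 5 defective modules out of 100.
y = np.array([1] * 5 + [0] * 95)
# A model fully biased toward the majority class scores every module as clean.
biased = np.zeros(100)
pred = (biased >= 0.5).astype(int)
accuracy = (pred == y).mean()    # looks good because 95% of modules are clean
recall = pred[y == 1].mean()     # every defective module is missed
print(accuracy, recall, auc(y, biased))  # 0.95 0.0 0.5
```

The biased model reaches 95% accuracy while finding no defects at all, whereas its AUC of 0.5 correctly reports that it ranks modules no better than chance.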

Journal: Applied Sciences
Publication Date: Jan 5, 2020
Citations: 6
License type: CC BY 4.0