On the Value of Oversampling for Deep Learning in Software Defect Prediction

Rahul Yedida,Tim Menzies

doi:10.1109/tse.2021.3079841

Open Access

Journal: IEEE Transactions on Software Engineering
Publication Date: Aug 1, 2022
Citations: 27

Affiliation: North Carolina State University

#Software Defect Prediction Datasets #Defect Data Sets

Abstract

One truism of deep learning is that the automatic feature engineering (seen in the first layers of those networks) excuses data scientists from performing tedious manual feature engineering prior to running DL. For the specific case of deep learning for defect prediction, we show that this truism is false. Specifically, when we pre-process data with a novel oversampling technique called fuzzy sampling, as part of a larger pipeline called GHOST (Goal-oriented Hyper-parameter Optimization for Scalable Training), we do significantly better than the prior DL state of the art in 14 of 20 defect data sets. Our approach yields state-of-the-art results with significantly faster deep learners. These results present a cogent case for the use of oversampling prior to applying deep learning on software defect prediction datasets.
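
To make the idea of oversampling before training concrete, here is a minimal sketch of plain random oversampling, which duplicates minority-class rows until every class is as large as the biggest one. This is not the fuzzy sampling technique named in the abstract, whose details are not given here; it is a simpler stand-in. The function name `random_oversample` and the toy data are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class.

    Plain random oversampling, used here as a stand-in; it is NOT the
    fuzzy sampling method described in the paper.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        # Rows belonging to this class in the original data.
        pool = [r for r, l in zip(rows, labels) if l == cls]
        # Add copies only for classes smaller than the largest one.
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (assumed): 8 clean modules (label 0), 2 defective ones (label 1).
X = [(float(i), float(i + 1)) for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

The balanced rows and labels would then be passed to the learner in place of the original, imbalanced training set. Only the training split should be oversampled, so that test data keeps its real class distribution.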
