Abstract
Background: Many statistical and machine learning techniques have been used to build fault prediction models. Traditional methods rely on supervised learning. Software metrics for each module, together with the corresponding fault information from previous projects, are used to train a fault prediction model. This approach requires a large training data set, which enables the development of effective fault prediction models. In practice, however, data collection costs and the lack of data from earlier projects or product versions may make a large training data set unattainable. A small training set, such as one drawn from the current project, is known to degrade the performance of the fault prediction model. Semi-supervised learning approaches can train on software modules whose fault content is either known or unknown.

Aims: To implement and evaluate a semi-supervised learning approach to software fault prediction.

Methods: We investigate an iterative semi-supervised approach to software quality prediction in which a base supervised learner is used within a semi-supervised procedure.

Results: We varied the proportion of labeled software modules from 2% to 50% of all modules in the project. Tracking performance across the iterations of the semi-supervised algorithm, we observe that semi-supervised learning improves fault prediction when more than 5% of the modules are initially labeled.

Conclusion: The semi-supervised approach outperforms the corresponding supervised learning approach when both use random forest as the base classification algorithm.
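The abstract describes an iterative scheme in which a base supervised learner is repeatedly retrained as unlabeled modules receive predicted labels, but it does not spell out the exact procedure. The sketch below assumes classic self-training. It fits the learner on the labeled modules, pseudo-labels every unlabeled module whose top class probability reaches a confidence threshold, refits, and repeats until no module qualifies. The paper used random forest as the base learner. To keep the example dependency-free, this sketch swaps in a simple nearest-centroid classifier. The names `CentroidClassifier`, `self_train`, `threshold` and `max_iter`, and the two-metric toy data (think lines of code and cyclomatic complexity), are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


class CentroidClassifier:
    """Hypothetical stand-in base learner (the paper used random forest).

    Each class is represented by the mean of its training vectors; class
    probabilities are a softmax over negative Euclidean distances, so the
    confidence depends on the scale of the metrics.
    """

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidClassifier":
        self.classes: List[int] = sorted(set(y))
        self.centroids: Dict[int, List[float]] = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            # Column-wise mean of all vectors labeled c.
            self.centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x: Vector) -> List[float]:
        scores = [-math.dist(x, self.centroids[c]) for c in self.classes]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]


def self_train(X_lab: List[Vector], y_lab: List[int], X_unlab: List[Vector],
               threshold: float = 0.9,
               max_iter: int = 10) -> Tuple[CentroidClassifier, List[int]]:
    """Self-training: pseudo-label confident unlabeled modules and refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = CentroidClassifier().fit(X_lab, y_lab)
    for _ in range(max_iter):
        if not pool:
            break
        remaining, added = [], 0
        for x in pool:
            probs = model.predict_proba(x)
            best = max(range(len(probs)), key=probs.__getitem__)
            if probs[best] >= threshold:
                # Confident prediction: move the module into the labeled set.
                X_lab.append(x)
                y_lab.append(model.classes[best])
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:
            break  # nothing new to learn from; stop iterating
        model = CentroidClassifier().fit(X_lab, y_lab)
    return model, y_lab


# Toy usage: label 0 = not fault-prone, 1 = fault-prone (hypothetical data).
model, labels = self_train([[10, 2], [200, 30]], [0, 1],
                           [[12, 3], [190, 28], [105, 16]])
print(labels)  # [0, 1, 0, 1, 1]
```

On this toy data, the two unlabeled modules that lie close to a labeled one are pseudo-labeled in the first pass. The module (105, 16) is exactly equidistant from both initial centroids, so it is skipped at first. It is pseudo-labeled as fault-prone only after the refit shifts the centroids. A random forest, such as scikit-learn's `RandomForestClassifier` (whose `predict_proba` takes a 2-D array), could replace the stand-in with a small adapter.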