Abstract

Background: Feature selection techniques are important factors for improving machine learning models because they increase prediction accuracy and decrease the time to create a model. Recently, feature selection techniques have been employed on software quality prediction problems with different results and no clear indication of which techniques are frequently used.Objective: This study aims to conduct a systematic review of the application of feature selection techniques in software quality prediction and answers eight research questions.Method: The review evaluates 15 papers in 9 journals and 6 conference proceedings from 2007 to 2017 using the standard systematic literature review method.Results: The results obtained from this study reveal that the filter feature selection method was the most commonly used in the studies (60%) and RELIEF was the most employed among this method, and a limited number of studies employed an ensemble method. Several studies used public datasets available in the PROMISE software project repository (60%). Most studies focused on software defect prediction (classification problem) using area under curve (AUC) as a primary evaluation measure, whereas only two studies focused on software maintainability prediction (regression problem) using mean magnitude of relative error (MMRE) as a primary evaluation measure. All selected studies performed k-fold cross-validation to evaluate model accuracy. Individual prediction models were mostly employed and ensemble models appeared only in three studies. Naive Bayes was the most investigated among individual models, whereas Random forest was the most investigated among ensemble models.Conclusion: Feature selection techniques used by selected primary studies have a positive impact on the performance of the prediction models. Further, both ensemble feature selection method and ensemble models have the ability for increasing prediction accuracy over single methods or individual models and have reported improvement in the prediction accuracy; however, the application of these techniques in software quality prediction is still limited.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.