Abstract

Data preprocessing remains an important step in machine learning studies. Proper preprocessing of imbalanced data enables researchers to reduce defects in existing data sets as far as possible and, in some cases, to eliminate them. Despite the remarkable achievements of machine learning research, systematic literature reviews of imbalanced data preprocessing techniques remain scarce. In this study, the authors assess the existing literature to identify the key issues related to data quality and handling and to provide a convenient collection of the techniques used to address these issues during data preprocessing. They applied a systematic literature review method involving a manual search to select articles published from January 2010 to September 2018. The quality of the existing studies was assessed using a set of quality assessment criteria. Of the 118 relevant studies found, only 2% were identified as having followed systematic literature review guidelines. This study therefore calls for more systematic literature reviews on data preprocessing to improve the quality of the data used in machine learning studies.

