Abstract
The Synthetic Minority Oversampling Technique (SMOTE) is the baseline method for handling unbalanced data. SMOTE generates new synthetic samples by linear interpolation between a minority class sample and one of its k-nearest minority neighbors. However, SMOTE has two weaknesses: overgeneralization, because noisy samples are oversampled along with clean ones, and increased overlap between classes near the decision boundary, where the synthetic samples may themselves be noise. Motivated by these weaknesses, this study conducts a systematic literature review of approaches that modify SMOTE to handle unbalanced data. The review procedure comprises keyword identification, article search, definition of selection criteria, and selection of articles against those criteria. The results show that SMOTE modifications are based on filtering, clustering, and distance modification, all aimed at reducing the noise that SMOTE produces. The filtering approach removes noisy data before SMOTE is applied, which improves the handling of unbalanced data. The clustering approach, in turn, can minimize overlapping synthetic minority data that is potentially noisy. The most frequently used datasets are Pima (60%) and Haberman (50%). The most frequently used performance metrics on unbalanced data are F1-measure (57%), accuracy (55%), recall (43%), and AUC (27%). These results open opportunities for further research on modifying SMOTE to address imbalance in health data, especially the handling of noisy and overlapping data. The thoroughness of our literature review should instill confidence in the research community.
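The interpolation step described in the abstract can be illustrated with a minimal sketch. This is not the implementation used in any of the reviewed studies: the function name `smote_sketch`, its parameters (`n_new`, `k`, `seed`), and the brute-force neighbor search are assumptions made for illustration, and only NumPy is used.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Illustrative SMOTE-style oversampling (hypothetical helper).

    X_min : array of shape (n_samples, n_features), minority class only.
    n_new : number of synthetic samples to generate.
    k     : number of nearest minority neighbors to interpolate toward.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbors than other samples

    # Brute-force pairwise Euclidean distances between minority samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices of the k nearest minority neighbors of each sample.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its k neighbors per new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]

    # Linear interpolation: x_new = x + gap * (x_neighbor - x), gap in [0, 1).
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


if __name__ == "__main__":
    X_minority = np.array(
        [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    )
    synthetic = smote_sketch(X_minority, n_new=10, k=2)
    print(synthetic.shape)
```

Because every synthetic point lies on a segment between two existing minority samples, a noisy minority sample inside the majority region yields further synthetic points in that region. This is the behavior that the filtering, clustering, and distance-based modifications surveyed here try to limit.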
More From: JOIV : International Journal on Informatics Visualization