Abstract

Objective: Machine learning algorithms generally assume that the classes in the data are balanced, so they tend to perform poorly on imbalanced datasets. This study aimed to explain methods for fitting a highly accurate model that better classifies the class of interest in imbalanced datasets, where that class has fewer samples than the others.

Material and Methods: The study was designed as methodological research. Several weighting methods exist for calculating class weights, and this study included the four most frequently used: inverse of number of samples, inverse of square root of number of samples, effective number of examples, and sample-based class weight. These four class weighting methods were applied to random forest and support vector machine models, and their effects on class-based performance and overall performance were examined.

Results: On simulated datasets, the inverse of square root of number of samples method gave the best performance with both random forest and support vector machine. On the real dataset, the best performance was achieved by the sample-based class weight method with support vector machine.

Conclusion: With both machine learning methods, every class weighting method increased performance on the class in which recurrence was observed, and thereby increased overall performance. These results show that class weighting is an effective way of dealing with the class imbalance problem.
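Three of the four weighting methods named above have standard published formulas, and a minimal Python sketch of them follows. It is an illustration under stated assumptions, not the study's implementation: the function names, the normalization (weights rescaled to sum to the number of classes), and the default beta = 0.999 for the effective number of examples (defined as E = (1 - beta^n) / (1 - beta) by Cui et al., 2019) are assumptions. The abstract does not define the sample-based class weight method, so it is not sketched here.

```python
import math
from collections import Counter


def normalize(weights):
    """Rescale weights so they sum to the number of classes (assumed convention)."""
    total = sum(weights.values())
    k = len(weights)
    return {c: w * k / total for c, w in weights.items()}


def inverse_number_of_samples(counts):
    """Weight for class c is proportional to 1 / n_c."""
    return normalize({c: 1.0 / n for c, n in counts.items()})


def inverse_sqrt_number_of_samples(counts):
    """Weight for class c is proportional to 1 / sqrt(n_c), a milder correction than 1 / n_c."""
    return normalize({c: 1.0 / math.sqrt(n) for c, n in counts.items()})


def effective_number_of_samples(counts, beta=0.999):
    """Weight for class c is proportional to 1 / E_c, with E_c = (1 - beta**n_c) / (1 - beta).

    beta = 0 gives equal weights; beta close to 1 approaches 1 / n_c.
    The default beta = 0.999 is an assumption for illustration.
    """
    return normalize({c: (1.0 - beta) / (1.0 - beta ** n) for c, n in counts.items()})


if __name__ == "__main__":
    # Hypothetical labels: 90 "no recurrence" samples vs 10 "recurrence" samples.
    y = ["no_recurrence"] * 90 + ["recurrence"] * 10
    counts = Counter(y)
    for method in (inverse_number_of_samples,
                   inverse_sqrt_number_of_samples,
                   effective_number_of_samples):
        print(method.__name__, method(counts))
```

With this hypothetical 90/10 split, inverse of number of samples gives the minority class 9 times the weight of the majority class, inverse of square root gives 3 times, and the effective number of examples with beta = 0.999 gives roughly 8.65 times. The resulting dictionaries have the form scikit-learn accepts for the `class_weight` parameter of `RandomForestClassifier` and `SVC`, although the abstract does not state which software the study used.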