Abstract

Imbalanced data is common in real-world applications such as text categorization, face recognition for gender classification, medical diagnosis, fraud detection, and oil-spill detection in satellite images. Most machine learning algorithms focus on classifying the majority class while ignoring or misclassifying minority samples, which occur rarely but are very important. It is commonly agreed that standard classifiers such as neural networks, support vector machines, and C4.5 are heavily biased toward the majority class, since they are built to maximize overall accuracy, to which the minority class contributes very little. In this study, we demonstrate how the synthetic minority over-sampling technique (SMOTE) can significantly reduce the imbalance problem in gender classification from the data-level perspective. Hu’s moments of the face images were generated as numerical descriptors at different imbalance ratios and classified using a supervised decision tree (J48) algorithm. The results show that, without SMOTE preprocessing, the minority group was severely misclassified as the majority group. Applying SMOTE to reduce the imbalance effects before inducing the decision tree confirms our claims.
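
To illustrate the data-level idea behind SMOTE, the following is a minimal sketch in plain Python and not the implementation used in this study. It creates each synthetic minority sample by interpolating between a randomly chosen minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy two-dimensional feature vectors (standing in for Hu’s moment descriptors) are assumptions made purely for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples
    and their k nearest minority neighbours (the core idea of SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors; real Hu moment descriptors have 7 values.
minority = [[0.10, 0.20], [0.15, 0.25], [0.20, 0.22], [0.12, 0.30]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies on a line segment between two existing minority samples, the new samples stay inside the region the minority class already occupies. In practice, such oversampling would be applied to the training data only, before the decision tree is induced.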
