Abstract

Data classification is one of the most widely applied branches of pattern recognition and data mining, and its applications are easy to find in everyday life. In the last few years, data classification technology has changed considerably. As the range of applications has grown, so has the volume of data. Classification has become difficult because of the ever-growing size and imbalanced nature of the data. Imbalanced class distributions significantly degrade the performance of standard classification learning algorithms, which assume that the class distribution is relatively balanced. This paper presents a simple and effective sampling method based on Fuzzy C-means Clustering (FCM) and SMOTE (Synthetic Minority Oversampling Technique) that prevents noise generation and effectively resolves the imbalance between classes. The experimental evaluation shows that the proposed technique effectively reduces noise production. Across different datasets, the accuracy of the proposed method improves by an average of two percent over the base paper.
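
For readers unfamiliar with SMOTE, the interpolation step it relies on can be illustrated with a minimal sketch. This is only the generic SMOTE idea (a synthetic point placed on the segment between a minority sample and one of its nearest minority neighbours); it does not reproduce the FCM-based part of the proposed method, which the abstract does not detail. The function name `smote_sketch`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (minority-class samples).
    The name, k and seed are illustrative assumptions, not from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (hypothetical data).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point is an interpolation between two existing minority samples, it stays inside the region those samples span. If a chosen sample or neighbour is itself noisy, the new point inherits that noise, which is the kind of problem a clustering step such as FCM could plausibly be used to limit.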
