Classification of Imbalanced Big Data using SMOTE with Rough Random Forest

Tanuja Das, Abhinandan Khan, Goutam Saha

doi:10.35940/ijeat.b4096.129219

Abstract

Learning from datasets is an important research topic today. Of the various data mining tools available for this purpose, none works satisfactorily on imbalanced data, mainly because such data contain minority classes, which may impair the learning process. In addition to large volume, Big Data is characterised by velocity and variety. The Synthetic Minority Oversampling Technique (SMOTE) is a widely used technique for balancing imbalanced data. Here, we focus on extending SMOTE to the Big Data environment by combining it with the rough random forest (RRF). This hybrid approach, comprising the SMOTE and RRF algorithms for learning from imbalanced datasets, has been applied to various benchmark datasets from the KEEL Dataset Repository. The results obtained are satisfactory. The method has also been applied to a dynamic stock market dataset to handle the velocity aspect of Big Data. The results obtained have been verified against popular stock market websites.
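
For readers unfamiliar with SMOTE, the basic oversampling step can be sketched as follows. This is a minimal illustration of the standard interpolation idea only, not the Big Data extension or the RRF classifier described in this paper. The function name `smote`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and one of their k nearest minority-class neighbours (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment x -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

# Toy data (hypothetical): 4 minority samples against 12 majority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority = [[5.0, 5.0]] * 12

# Generate enough synthetic minority points to match the majority count.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
```

Because each synthetic point lies on a line segment between two existing minority samples, the oversampled class stays within the region already occupied by the minority data rather than duplicating samples exactly.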

Full Text