Abstract

Imbalanced learning is the classification problem in which the number of observations in one class far surpasses the number of observations in another. Different sampling approaches have been proposed for binary (paired) and multi-class imbalanced classification. Binary imbalanced classification involves two classes: a majority class and a minority class. Multi-class imbalanced classification involves more than two classes. Among conventional approaches, under-sampling is generally regarded as the better sampling technique. However, existing approaches may not work in a Big Data environment, because considering all the features can compromise the performance of the system. This work presents a novel method that takes into account only the essential features and handles data at the scale found in a Big Data environment. The proposed system uses a Feature Selection Under-Sampling technique to resample the data. Feature selection is the vital step: it reduces the dimensionality of the data, helps the classifier run faster, and can also improve accuracy. A support vector machine (SVM) classifier is then used to construct the model and test the data. The proposed system is implemented on the MapReduce framework, integrated with the statistical analysis tool R.
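The abstract does not give the details of the Feature Selection Under-Sampling step, so the following is only a minimal single-machine sketch of the general idea: drop uninformative features, then randomly under-sample the majority class. It is written in plain Python rather than R on MapReduce, and it leaves out the SVM stage. The function names (`select_features`, `undersample`, `make_toy_data`), the separation score, and the toy data are all assumptions made for illustration, not the authors' method.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pstdev


def select_features(rows, labels, k):
    """Keep the k features that best separate two classes.

    Score = |mean_a - mean_b| / (std_a + std_b): an illustrative filter,
    assumed here; it is not the selection criterion used in the paper.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles the two-class case only")
    a, b = classes
    scores = []
    for j in range(len(rows[0])):
        col_a = [r[j] for r, y in zip(rows, labels) if y == a]
        col_b = [r[j] for r, y in zip(rows, labels) if y == b]
        spread = pstdev(col_a) + pstdev(col_b) + 1e-12  # avoid division by zero
        scores.append((abs(mean(col_a) - mean(col_b)) / spread, j))
    # Highest-scoring features first; return their column indices in order.
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def undersample(rows, labels, seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r, y in zip(rows, labels):
        by_class[y].append(r)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        for r in rng.sample(members, n_min):
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels


def make_toy_data(n_major=200, n_minor=20, seed=1):
    """Hypothetical data: only feature 0 differs between the classes."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_major):
        rows.append([rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
        labels.append("majority")
    for _ in range(n_minor):
        rows.append([rng.gauss(5, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
        labels.append("minority")
    return rows, labels


rows, labels = make_toy_data()
keep = select_features(rows, labels, k=1)            # feature selection first
reduced = [[r[j] for j in keep] for r in rows]       # project onto kept features
bal_rows, bal_labels = undersample(reduced, labels)  # then balance the classes
print(keep, Counter(bal_labels))
```

On this toy data the balanced set contains 20 observations of each class, since every class is cut down to the minority size. In a pipeline like the one the abstract describes, the balanced, reduced data would then be passed to the classifier. Selecting features before resampling means the under-sampling step touches fewer columns, which matters more as the data grows.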
