Abstract

The sheer variety of NoSQL Big Data has created a need for new ways to store, process and analyze it. The volume of data generated is enormous, and it arrives with uncertain veracity and varied demands on visualization. New frameworks help to extract substantial, previously unidentified value from massive data sets, and they have added a new dimension to the pre-processing and contextual conversion of data sets for analysis. In addition, the handling of imbalanced data sets has become a cause for concern. Traditional classifiers are unable to address the specific classification needs of such data sets. Over_sampling of the minority classes helps to improve performance. Updated Class Purity Maximization Over_Sampling Technique (UCPMOT) is a technique proposed to handle imbalanced data sets using exclusively safe-level based synthetic sample creation. It addresses the multi-class problem with a newly introduced method, namely lowest versus highest. The proposed technique is evaluated on several data sets from the UCI repository. The experiments run in a MapReduce environment, using distributed processing on the Apache Hadoop framework. Several classifiers are used to validate the classification results, using measures such as F-measure and AUC. The experimental results indicate that UCPMOT outperforms the benchmark techniques.
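To make the idea of over_sampling a minority class concrete, the following is a minimal sketch, not the UCPMOT algorithm itself. It assumes one plausible reading of "lowest versus highest": the smallest class is grown until it matches the largest class. New samples are created by plain SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours. The safe-level weighting and the MapReduce distribution described in the abstract are omitted, and the function name and parameters are hypothetical.

```python
import random
from collections import Counter


def oversample_lowest_vs_highest(samples, labels, k=3, seed=0):
    """Grow the smallest class to the size of the largest by interpolation.

    Illustrative SMOTE-style sketch under stated assumptions; it does not
    reproduce the safe-level logic of UCPMOT.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    lowest = min(counts, key=counts.get)    # smallest class
    highest = max(counts, key=counts.get)   # largest class
    minority = [x for x, y in zip(samples, labels) if y == lowest]
    needed = counts[highest] - counts[lowest]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    synthetic = []
    for _ in range(needed):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sq_dist(base, m),
        )[:k]
        if not neighbours:
            # a single minority sample can only be duplicated
            synthetic.append(tuple(base))
            continue
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))

    return list(samples) + synthetic, list(labels) + [lowest] * needed
```

Called on a data set with six majority and two minority samples, the function returns twelve samples with six in each class, and every synthetic point lies on the line segment between two existing minority points.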

Highlights

  • The sheer variety of NoSQL Big Data has created a need for new ways to store, process and analyze it

  • An advanced cluster-based technique (UCPMOT) for binary-class and multi-class imbalanced Big Data sets is presented in this paper

  • The Updated Class Purity Maximization Over_Sampling Technique (UCPMOT) works with MEre Mean Minority Over_Sampling Technique (MEMMOT), Minority Majority Mix mean Over_Sampling Technique (MMMmOT), NF_N + Nearest Farthest Neighbor_Mid Over_Sampling Technique (MOT) and Clustering Minority Examples Over_Sampling Technique (CMEOT), using synthetic sample creation (SSS), to achieve improved F-measure and AUC values
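The two evaluation measures named above can be computed as in the sketch below. It is a minimal, pure-Python illustration for the binary case. It assumes F-measure means the F1 score (the harmonic mean of precision and recall) and computes AUC by its rank interpretation: the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. The function names are hypothetical and the source does not specify how its values were computed.

```python
def f_measure(y_true, y_pred, positive=1):
    """F1 score for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples, which is fine for illustration; a sort-based rank formula would be the usual choice on large data sets.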


Summary

Introduction

The sheer variety of NoSQL Big Data has created a need for new ways to store, process and analyze it. The introduction sets out the category of over_sampling techniques used to balance binary and multi-class data sets.

Experimental context

The objective of the experimental work is to validate the efficiency of the proposed techniques for dealing with the class imbalance problem in Big Data sets.

