Abstract

Most real-world data sets have a skewed class distribution, unlike well-established benchmark data sets: the number of instances of one class far exceeds that of the other classes. This uneven distribution of classes produces imbalanced data sets, which pose a serious difficulty for learning procedures. In addition, the intrinsically complex features of such data have made their analysis an active avenue of research. Imbalanced class distributions can be handled effectively by oversampling the minority class, an approach that is usually independent of the classifier. An oversampling technique, the Clustering Minority Samples Oversampling Technique (CMSOT), is proposed to improve the classification of imbalanced data sets. The proposed technique is implemented on Apache Hadoop in a MapReduce environment. The data sets are drawn mainly from the UCI repository. The effect on true positive rates in relation to the imbalance ratio is studied, along with the improvement in classification obtained from the generated pool of samples. The experimental results, together with the corresponding statistical analysis of the oversampled data sets, indicate that the proposed technique outperforms the selected benchmark techniques.
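To make the quantities named in the abstract concrete, the following is a minimal sketch of the imbalance ratio, the true positive rate, and a naive form of minority oversampling. It is not CMSOT: the abstract does not describe the clustering step or the MapReduce implementation, so the sketch substitutes plain interpolation between random pairs of minority samples. The function names `imbalance_ratio`, `true_positive_rate`, and `oversample_minority` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def true_positive_rate(y_true, y_pred, positive):
    """TP / (TP + FN) for the class treated as positive (here, the minority)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def oversample_minority(samples, labels, minority, seed=0):
    """Assumption: a stand-in for CMSOT, not the paper's method.

    Adds synthetic minority points by interpolating between random pairs of
    existing minority samples until the minority count matches the largest class.
    """
    rng = random.Random(seed)
    pool = [s for s, l in zip(samples, labels) if l == minority]
    target = max(Counter(labels).values())
    new_samples, new_labels = list(samples), list(labels)
    while new_labels.count(minority) < target:
        a, b = rng.choice(pool), rng.choice(pool)
        t = rng.random()
        # Each synthetic point lies on the segment between a and b.
        new_samples.append([x + t * (y - x) for x, y in zip(a, b)])
        new_labels.append(minority)
    return new_samples, new_labels
```

With eight majority samples and two minority samples, `imbalance_ratio` returns 4.0 before oversampling and 1.0 afterwards. A clustering-based variant would presumably restrict each interpolation to samples from the same minority cluster, but that detail is an assumption and is not stated in the abstract.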
