Abstract
Big Data, generated at a rate of exabytes per year, has become a watchword of today's research. Such data far exceed the capability of commonly used software tools and lie beyond what a single-machine architecture can handle. This challenge has made it necessary to reexamine data management options. Compared with traditional forms, the new avenues of NoSQL Big Data call for adapted experimental testbeds that help uncover valuable, previously unknown information in enormous data sets. Outmoded management systems and statistical packages also have trouble handling Big Data. In numerous real applications, handling imbalanced data sets is a priority. Classifying data sets with an imbalanced class distribution causes a notable drop in the performance of most standard classifier learning algorithms, because assuming a balanced class distribution and equal misclassification costs leads to poor results. In real-world domains, classification methods for the multi-class imbalance problem need more attention than those for the two-class problem. A methodology is presented for binary and multi-class imbalanced data sets that uses improved over-sampling (O. S.) techniques to enhance classification. Compared with prior work on O. S. techniques, the proposed methods fall into two broad categories: non-clustered and cluster-based advanced approaches. The balanced data are subsequently classified using various classifiers. The proposed techniques are implemented in a MapReduce environment on Apache Hadoop and evaluated on various data sets from the UCI/KEEL repository. F-measure and ROC area are used to measure classification performance.
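To make the over-sampling step concrete, the following is a minimal Python sketch of a SMOTE-style baseline. It assumes numeric feature vectors held in memory on a single machine. It is not the authors' improved non-clustered or cluster-based method, and it is not the MapReduce implementation. The function names `smote_like_oversample` and `balance_classes` are hypothetical. Each synthetic point is placed at a random position on the segment between a class sample and one of its k nearest same-class neighbours, and every class is grown to the size of the largest class, which covers both the binary and the multi-class case.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def smote_like_oversample(samples: List[List[float]], n_new: int,
                          k: int = 3, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic points by interpolating between a random sample
    and one of its k nearest neighbours from the same class (SMOTE-style)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # k nearest neighbours of `base` among the other samples of its class
        others = [s for j, s in enumerate(samples) if j != i]
        neighbours = sorted(others, key=lambda s: euclidean(base, s))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance_classes(X: List[List[float]], y: List[Hashable],
                    k: int = 3, seed: int = 0) -> Tuple[List[List[float]], List[Hashable]]:
    """Over-sample every class up to the size of the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out = [list(x) for x in X]
    y_out = list(y)
    for offset, (label, count) in enumerate(counts.items()):
        need = target - count
        if need == 0:
            continue
        members = [list(x) for x, lab in zip(X, y) if lab == label]
        if len(members) < 2:
            # A single sample cannot be interpolated: duplicate it instead.
            new = [list(members[0]) for _ in range(need)]
        else:
            new = smote_like_oversample(members, need, k, seed + offset)
        X_out.extend(new)
        y_out.extend([label] * need)
    return X_out, y_out


if __name__ == "__main__":
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [0.2, 0.8],
         [5, 5], [5, 6], [6, 5],
         [9, 9]]
    y = ["a"] * 6 + ["b"] * 3 + ["c"]
    Xb, yb = balance_classes(X, y, k=2)
    print(Counter(yb))
```

Classes with a single sample cannot be interpolated, so the sketch falls back to duplicating that sample, which is plain random over-sampling. The balanced output would then be passed to a classifier and scored with F-measure and ROC area.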