Abstract

The purpose of this study is to assess the effectiveness of several algorithms for big data classification, namely partial least squares discriminant analysis (PLS-DA), Naive Bayes (NBC) and K-Nearest Neighbor (KNN), based on the Hadoop MapReduce approach. The approaches are compared on the classification of big data sets of average shot lengths (CSV). It is shown that as the data set size grows, the PLS-DA classification accuracy increases, reaching 82%, while the computation time rises to 45 seconds. The analysis of the classifiers showed that the high accuracy of PLS-DA results from a high percentage of correctly classified positive and negative cases, whereas the lower accuracy of KNN and Naive Bayes is explained by a high percentage of false-positive and false-negative results. It is concluded that PLS-DA is the optimal classifier among those tested, since it classifies a large amount of data with high accuracy in a short time.
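The abstract attributes the accuracy differences to how many cases each classifier labels correctly (true positives and true negatives) versus incorrectly (false positives and false negatives). As an illustrative sketch only, assuming a binary labelling with a hypothetical positive label `1`, the snippet below shows how those four counts determine accuracy, (TP + TN) / (TP + TN + FP + FN). The names `ConfusionCounts` and `confusion_counts` are hypothetical helpers, not part of the study or of any library.

```python
from collections.abc import Hashable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper, not from the study)."""
    tp: int  # positive cases correctly classified as positive
    tn: int  # negative cases correctly classified as negative
    fp: int  # negative cases wrongly classified as positive
    fn: int  # positive cases wrongly classified as negative

    def accuracy(self) -> float:
        # Accuracy = (TP + TN) / (TP + TN + FP + FN)
        total = self.tp + self.tn + self.fp + self.fn
        if total == 0:
            raise ValueError("cannot compute accuracy with no predictions")
        return (self.tp + self.tn) / total


def confusion_counts(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable = 1,
) -> ConfusionCounts:
    """Tally TP/TN/FP/FN; any label other than `positive` counts as negative."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    return ConfusionCounts(tp=tp, tn=tn, fp=fp, fn=fn)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    counts = confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
    print(counts, counts.accuracy())
```

For the toy labels above, the tally is two true positives, two true negatives, one false positive and one false negative, so accuracy is 4/6 (about 0.67). Every additional false positive or false negative enlarges the denominator without adding to the numerator, which is the mechanism the abstract cites for the lower accuracy of KNN and Naive Bayes.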
