Abstract

Broadly, three types of machine learning algorithms are used to discover correlations, hidden patterns, and other useful information in the large and varied data sets known as big data. Today, Twitter, Facebook, Instagram, and many other social media networks serve as sources of unstructured data. Converting unstructured data into structured data, or meaningful information, is a tedious task, and machine learning classification algorithms are used to perform this conversion. In this paper, the authors first collect unstructured research data from a frequently used social media network (Twitter) through the Twitter application programming interface (API) stream. Second, they apply different machine learning classification algorithms (supervised, unsupervised, and reinforcement), such as decision trees (DT), neural networks (NN), support vector machines (SVM), naive Bayes (NB), linear regression (LR), and k-nearest neighbor (K-NN), to the collected research data set. The paper concludes with a comparison of these classification algorithms.

Highlights

  • In the current digital era, data is growing exponentially

  • The comparison of True Positive Rate (TPR) curves in Figure 5 shows that the TPR values 0.3, 0.5, 0.7, 0.9, and 0.95, at data set sizes of 5000, 10000, 30000, 50000, and 60000 respectively, are highest for the Support Vector Machine (SVM) classification algorithm among all five classification algorithms

  • The comparison of True Negative Rate (TNR) curves in Figure 6 shows that the TNR values 0.3, 0.6, 0.9, 0.75, 0.75, and 0.8, at data set sizes of 5000, 20000, 30000, 40000, 50000, and 60000 respectively, are highest for the Naive Bayes (NB) classification algorithm among the five classification algorithms
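
The two rates quoted in the highlights above are standard confusion-matrix ratios: TPR = TP / (TP + FN) and TNR = TN / (TN + FP). The following is a minimal sketch of that arithmetic, assuming binary labels encoded as 1 (positive) and 0 (negative). The function name `tpr_tnr` and the toy labels are illustrative assumptions and are not taken from the paper.

```python
def tpr_tnr(y_true, y_pred):
    """Return (TPR, TNR) for binary labels where 1 = positive and 0 = negative."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    # Guard against a class that never occurs in y_true.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


# Toy labels (hypothetical, for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
tpr, tnr = tpr_tnr(y_true, y_pred)
print(f"TPR = {tpr:.2f}, TNR = {tnr:.2f}")
```

Computing both rates per classifier and per data set size would give the kind of curves the highlights describe, although the paper's own evaluation code is not shown here.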

Summary

INTRODUCTION

In the current digital era, data is growing exponentially. This growing volume of data, known as Big Data, marks the beginning of a revolution in many areas of human life. The five main characteristics of Big Data are (i) volume, (ii) variety, (iii) velocity, (iv) veracity, and (v) value.

1.1 Classification of Techniques

In this paper, we used five different classification algorithms for big data analysis, namely (i) Decision Trees (DT), (ii) Neural Networks (NN), (iii) Support Vector Machines (SVM), (iv) Naive Bayes (NB), and (v) k-Nearest Neighbor (K-NN). Bhardwaj et al. (2019) explained that Naive Bayes (NB) classification algorithms are based on Bayes' theorem. Naive Bayes is a probabilistic machine learning model used for classification tasks, and its performance depends on the real-life case to which it is applied. In real-life scenarios, K-NN is widely applicable because it is non-parametric, i.e., it makes no assumptions about the underlying data distribution.
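
To make the Bayes' theorem connection concrete: a Naive Bayes classifier scores each class c for an input with tokens x_1, ..., x_n as P(c) multiplied by the product of P(x_i | c), treating the tokens as conditionally independent given the class, and picks the highest-scoring class. The sketch below is a minimal multinomial variant with add-one (Laplace) smoothing written in plain Python. It is not the authors' implementation; the function names `train_nb` and `predict_nb`, the class labels, and the toy tweets are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label) pairs. Returns (class counts, word counts, vocabulary)."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)  # label -> token -> count
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior score for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(c)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # skip tokens never seen during training
            # Add-one smoothing keeps unseen (token, class) pairs from zeroing the score.
            likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy "tweets" (hypothetical data, for illustration only).
toy = [
    ("great paper on deep learning".split(), "research"),
    ("new dataset released for nlp research".split(), "research"),
    ("lunch was great today".split(), "other"),
    ("watching the game tonight".split(), "other"),
]
model = train_nb(toy)
label = predict_nb(model, "new deep learning dataset".split())
print(label)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which matters once inputs are longer than a few tokens.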

RELATED WORKS
EXPERIMENTAL SETUP
RESULT
Accuracy
Findings
CONCLUSION AND FUTURE ASPECTS