Performance Analysis of Machine Learning Algorithms for Big Data Classification

Sanjeev Kumar Punia, Rizwan Patan, Thompson Stephan, Ganesh Gopal Deverajan, Manoj Kumar

doi:10.4018/ijehmc.20210701.oa4

Open Access

Abstract

Broadly, three types of machine learning algorithms are used to discover correlations, hidden patterns, and other useful information in the varied data sets known as big data. Today, Twitter, Facebook, Instagram, and many other social media networks are used to collect unstructured data. Converting unstructured data into structured data or meaningful information is a very tedious task, and different machine learning classification algorithms are used for this conversion. In this paper, the authors first collect unstructured research data from a frequently used social media network (i.e., Twitter) by using a Twitter application program interface (API) stream. Second, they apply different machine learning classification algorithms (supervised, unsupervised, and reinforcement), such as decision trees (DT), neural networks (NN), support vector machines (SVM), naive Bayes (NB), linear regression (LR), and k-nearest neighbor (K-NN), to the collected research data set. Finally, they compare the performance of the different machine learning classification algorithms.

Highlights

In the current digital era, data is growing exponentially
The True Positive Rate (TPR) curves in Figure 5 show that the Support Vector Machine (SVM) classification algorithm achieves the highest TPR among the five classification algorithms: 0.3, 0.5, 0.7, 0.9, and 0.95 at data set sizes of 5000, 10000, 30000, 50000, and 60000, respectively.
The True Negative Rate (TNR) curves in Figure 6 show that the Naive Bayes (NB) classification algorithm achieves the highest TNR among the five classification algorithms: 0.3, 0.6, 0.9, 0.75, 0.75, and 0.8 at data set sizes of 5000, 20000, 30000, 40000, 50000, and 60000, respectively.
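
For reference, the two rates quoted in the highlights follow from confusion-matrix counts: TPR = TP / (TP + FN) and TNR = TN / (TN + FP). The Python sketch below shows that calculation on a small made-up label list. The function name `tpr_tnr` and the sample labels are illustrative assumptions; they are not the authors' code or data.

```python
def tpr_tnr(y_true, y_pred, positive=1):
    """Return (TPR, TNR) for a binary problem.

    y_true and y_pred are equal-length sequences of labels;
    `positive` marks which label counts as the positive class.
    """
    tp = fn = tn = fp = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            if guess == positive:
                tp += 1  # positive sample, correctly flagged
            else:
                fn += 1  # positive sample, missed
        else:
            if guess == positive:
                fp += 1  # negative sample, wrongly flagged
            else:
                tn += 1  # negative sample, correctly rejected
    # Guard against division by zero when a class is absent.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr, tnr


# Hypothetical labels, for illustration only (not from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
tpr, tnr = tpr_tnr(y_true, y_pred)
print(f"TPR={tpr:.2f} TNR={tnr:.2f}")
```

Plotting these two values for each classifier at each data set size would give curves of the kind the highlights describe for Figures 5 and 6.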

Summary

INTRODUCTION

In the current digital era, data is growing exponentially. This growing volume of data, known as Big Data, marks the beginning of a revolution in many fields of human life. The five main characteristics of Big Data are (i) volume, (ii) variety, (iii) velocity, (iv) veracity, and (v) value.

1.1 Classification of Techniques

In this paper, we used five different classification algorithms for big data analysis, namely (i) Decision Trees (DT), (ii) Neural Networks (NN), (iii) Support Vector Machines (SVM), (iv) Naive Bayes (NB), and (v) k-Nearest Neighbor (K-NN). Bhardwaj et al. (2019) explained that the Naive Bayes (NB) classification algorithm is based on Bayes’ Theorem. It is a probabilistic machine learning model used for classification tasks, and its performance varies across real-life cases. The k-Nearest Neighbor (K-NN) algorithm is widely applicable in real-life scenarios because it is non-parametric; i.e., it does not assume any underlying data distribution.
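
To make the Bayes' Theorem basis of Naive Bayes concrete, here is a minimal Python sketch of a multinomial Naive Bayes text classifier. It scores each class by log P(class) plus the sum of log P(word | class), with add-one (Laplace) smoothing. The function names `train_nb` and `predict_nb`, the whitespace tokenization, and the four toy sentences are all assumptions made for illustration; the paper's actual implementation and Twitter data are not shown in this summary.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()  # naive whitespace tokenizer
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest posterior log-score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in doc.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids log(0) for unseen class/word pairs.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set, invented for illustration (not the paper's data).
docs = [
    "great service love it",
    "love this phone",
    "terrible service hate it",
    "hate this delay",
]
labels = ["pos", "pos", "neg", "neg"]
model = train_nb(docs, labels)
print(predict_nb(model, "love the service"))
print(predict_nb(model, "hate the delay"))
```

The "naive" part is the assumption that words are conditionally independent given the class, which is what lets the per-word log-likelihoods simply be added.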

RELATED WORKS

EXPERIMENTAL SETUP

RESULT

Accuracy

Findings

CONCLUSION AND FUTURE ASPECTS

Journal: International Journal of E-Health and Medical Communications	Publication Date: Jul 1, 2021
Citations: 49	License type: CC BY 3.0
