Novel Study of Machine Learning Algorithms with Supervised Learning Approach to Solve Real Time Big Data Applications Using Hadoop and Spark

Gopi Krishna T, Endalew Alamir

doi:10.2139/ssrn.3882668

Journal: SSRN Electronic Journal

Publication Date: Jan 1, 2021

Abstract

This review surveys machine learning algorithms, describing each along with its advantages and disadvantages. It also covers the tasks of machine learning, the supervised learning process, and the tools, techniques, and programming languages used to build machine learning models. Machine learning is the construction of computational artifacts that learn over time from experience. It learns from data and turns input data into information, mapping input values to output values using labeled data. The objective of machine learning is to build algorithms that can learn from past data without the help of human experts. A learning problem is defined by a task T, a performance measure P, and training experience E. Machine learning algorithms are categorized as supervised, unsupervised, and reinforcement learning. Supervised machine learning has two tasks: classification and regression. Hadoop with Spark and the Python programming language are most commonly used to build machine learning models. Information gain, gain ratio, the Gini index, and the random forest algorithm are used to measure feature importance. In the studies reviewed, most supervised learning algorithms used k-fold cross-validation to split the data set into training and testing sets. The standard value of k is five or ten, but it is not fixed, because it depends on the size of the data set. Model performance is mainly evaluated using classification accuracy, precision, recall, area under the curve (AUC), and F-measure (F-score). In general, the performance of a learning algorithm depends on the nature of the data set. It is therefore impossible to say that the prediction accuracy of one algorithm is better than that of all others. Because each machine learning algorithm has pros and cons, designing hybrid algorithms might overcome this problem.
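As an illustration of the k-fold splitting and the evaluation measures named in the abstract, the following is a minimal sketch in plain Python. It is not taken from the paper: the function names `k_fold_indices` and `binary_metrics` are hypothetical, the split is unshuffled, and the metrics assume binary labels where 1 is the positive class. Area under the curve is omitted because it needs predicted scores rather than hard labels.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs over 0..n_samples-1.

    Hypothetical helper: contiguous, unshuffled folds. Each sample
    appears in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

In practice one would train a model on each training fold, compute these measures on the matching test fold, and average the results across the k folds. Shuffling or stratifying the indices before splitting is usually advisable when the data set is ordered by class.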

Full Text