Analysis of Dimensionality Reduction Techniques on Big Data

G Thippa Reddy, Dharmendra Singh Rajput, Kuruva Lakshmanna, M Praveen Kumar Reddy, Rajesh Kaluri, Gautam Srivastava, Thar Baker

doi:10.1109/access.2020.2980942

Abstract

Due to digitization, a huge volume of data is being generated across several sectors such as healthcare, production, sales, IoT devices, the Web, and organizations. Machine learning algorithms are used to uncover patterns among the attributes of this data. Hence, they can be used to make predictions that help medical practitioners and people at managerial level make executive decisions. Not all the attributes in the generated datasets are important for training the machine learning algorithms. Some attributes might be irrelevant and some might not affect the outcome of the prediction. Ignoring or removing these irrelevant or less important attributes reduces the burden on machine learning algorithms. In this work, two prominent dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), are investigated on four popular Machine Learning (ML) algorithms, Decision Tree Induction, Support Vector Machine (SVM), Naive Bayes Classifier and Random Forest Classifier, using the publicly available Cardiotocography (CTG) dataset from the University of California, Irvine (UCI) Machine Learning Repository. The experimental results show that PCA outperforms LDA in all the measures. Also, the performance of the Decision Tree and Random Forest classifiers is not affected much by using PCA and LDA. To further analyze the performance of PCA and LDA, the experimentation is carried out on Diabetic Retinopathy (DR) and Intrusion Detection System (IDS) datasets. The results show that ML algorithms with PCA produce better results when the dimensionality of the datasets is high. When the dimensionality of the datasets is low, the ML algorithms without dimensionality reduction yield better results.

Highlights

Machine Learning (ML) is one of the most rapidly growing technologies of the past 15 years
The main motivation behind this paper is to study the impact of dimensionality reduction techniques on the performance of the ML algorithms
The Cardiotocography dataset, taken from the UCI Machine Learning Repository, has 2126 instances and 23 attributes

Summary

INTRODUCTION

Machine Learning (ML) is one of the most rapidly growing technologies of the past 15 years. It has numerous applications in fields such as computer vision, bioinformatics, business analytics, healthcare, banking, fraud detection, and trend prediction. Feature engineering is an important pre-processing step that extracts transformed features from the raw data, which simplifies the ML model and improves the quality of its results. Two popular dimensionality reduction techniques, Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), are investigated on four widely used ML algorithms, Decision Tree, Naive Bayes, Random Forest and Support Vector Machine, using the publicly available Cardiotocography (CTG) dataset from the UCI Machine Learning Repository [9]. PCA and LDA are applied individually to the CTG dataset to extract the most important attributes. The impact of feature engineering and dimensionality reduction on the performance of the ML algorithms is analysed in detail.
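To make the difference between the two techniques concrete, the following is a minimal sketch in plain Python, not the authors' pipeline. It uses a small synthetic 2-D dataset (an assumption, not the CTG data) and hypothetical helper names such as `pca_direction` and `lda_direction`. PCA picks the direction of largest variance without looking at class labels, while two-class LDA (the Fisher discriminant) uses the labels to pick the direction that best separates the class means relative to the within-class scatter. The toy data is constructed so that the two directions differ; it says nothing about which technique performs better on CTG or any other real dataset.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length points."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scatter_matrix(rows, center):
    """Sum of outer products of (row - center)."""
    d = len(center)
    s = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - c for x, c in zip(row, center)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def pca_direction(rows, iterations=200):
    """First principal component via power iteration (labels are ignored)."""
    s = scatter_matrix(rows, mean_vector(rows))
    d = len(s)
    v = unit([1.0] * d)
    for _ in range(iterations):
        v = unit([sum(s[i][j] * v[j] for j in range(d)) for i in range(d)])
    return v

def lda_direction(class_a, class_b):
    """Two-class Fisher discriminant for 2-D data: w = Sw^-1 (mean_a - mean_b)."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, ma), scatter_matrix(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return unit([inv[0][0] * diff[0] + inv[0][1] * diff[1],
                 inv[1][0] * diff[0] + inv[1][1] * diff[1]])

# Synthetic data (assumption): both classes spread widely along x,
# but are separated along y, with a small deterministic jitter in y.
class_a = [(float(i - 5), 1.0 + 0.2 * ((i % 3) - 1)) for i in range(11)]
class_b = [(float(i - 5), -1.0 + 0.2 * ((i % 3) - 1)) for i in range(11)]

pca_w = pca_direction(class_a + class_b)
lda_w = lda_direction(class_a, class_b)
print("PCA direction:", pca_w)   # dominated by the x component
print("LDA direction:", lda_w)   # dominated by the y component
```

On this toy data, projecting onto the PCA direction keeps most of the variance but mixes the two classes, whereas projecting onto the LDA direction keeps them apart. On real data the two directions can be much closer, and which reduction helps a given classifier is an empirical question of the kind the paper investigates.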

LITERATURE REVIEW

RESULTS AND DISCUSSIONS

CONCLUSION AND FUTURE WORK

Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 514	License type: CC BY 4.0
