Abstract

The objective of comparing various dimensionality reduction techniques is to reduce feature sets so that attributes can be grouped effectively with less processing time and memory use. Reduction algorithms can decrease the dimensionality of a dataset consisting of a huge number of interrelated variables while retaining as much as possible of the dissimilarity present in the dataset. In this paper, we apply the Standard Deviation, Variance, Principal Component Analysis, Linear Discriminant Analysis, Factor Analysis, Positive Region, Information Entropy, and Independent Component Analysis reduction algorithms to massive patient datasets stored on the Hadoop Distributed File System, in order to achieve lossless data reduction and acquire the required knowledge. The experimental results demonstrate that the ICA technique operates efficiently on massive datasets, eliminates irrelevant data without loss of accuracy, and reduces both the storage space for the data and the computation time compared with the other techniques.

Highlights

  • Hospital data has grown day by day over the last ten years, making it difficult to store, manage, and analyze, and therefore difficult to use in deciding on the right treatment for patients

  • We considered various dimensionality reduction techniques such as Standard Deviation (SD), Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Factor Analysis (FA), Positive Region (PR), Information Entropy (IE) and Independent Component Analysis (ICA)

  • To evaluate the performance of the dimensionality reduction techniques, we measured the reduction in data size, which affects processing speed and memory utilization


Summary

Introduction

Hospital data has grown day by day over the last ten years, making it difficult to store, manage, and analyze, and therefore difficult to use in deciding on the right treatment for patients. Dimensionality reduction is a method for converting a dataset with many dimensions into one with fewer dimensions. In this process, important information is retained, while redundant features and unwanted data are eliminated. Dimensionality reduction is important for decisions about a patient's treatment because it identifies the set of features that alone shows the most variability. Revised Manuscript received on September 30, 2021.
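As an illustration of the idea of keeping the features that show the most variability, the following is a minimal sketch of variance-based feature selection in plain Python. It is not the implementation evaluated in the paper and does not involve Hadoop; the function name `select_high_variance`, the column names, and the toy records are all hypothetical and chosen only for this example.

```python
from statistics import pvariance


def select_high_variance(rows, names, keep):
    """Return the `keep` column names with the largest population variance.

    Illustrative helper only (hypothetical name, not from the paper).
    """
    columns = list(zip(*rows))  # transpose records into per-feature columns
    variances = {name: pvariance(col) for name, col in zip(names, columns)}
    ranked = sorted(names, key=lambda n: variances[n], reverse=True)
    return ranked[:keep], variances


# Hypothetical toy patient records: (age, systolic_bp, ward_code)
names = ["age", "systolic_bp", "ward_code"]
rows = [
    (34, 118, 2),
    (61, 142, 2),
    (47, 131, 2),
    (29, 110, 2),
]

selected, variances = select_high_variance(rows, names, keep=2)
print(selected)   # the two features with the highest variance
print(variances)  # per-feature variance; a constant column has variance 0
```

In this toy data, `ward_code` is constant, so it carries no variability and is the feature dropped. Raw variance depends on each feature's scale, so in practice features would likely need to be standardized before being compared this way.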

