Abstract

Principal Component Analysis (PCA) based on Pearson correlation remains widely used on high-dimensional data to reduce dimensionality and to improve the effectiveness of cluster partitions. Besides being sensitive to non-Gaussian data, this approach assigns equal weight to every pair of observations, which in high-dimensional analysis can distort the cluster partitions and produce highly imbalanced clusters. To address the unbalanced clusters that arise in hydrological studies from the skewed character of the dataset, this study proposes a robust PCA (RPCA), robust in terms of the correlation used, as an alternative to classical PCA for reducing a high-dimensional dataset to a lower dimension while obtaining balanced clustering results. RPCA downweights outliers that lie far from the data center and thereby improves the cluster partitions. The two methods are compared in terms of the number of components and clusters obtained and in terms of clustering validity. Internal and stability validation criteria are used to assess the quality of the clusters obtained by each method. The findings show that the number of clusters improved significantly with RPCA compared with classical PCA. This indicates that the proposed approach is more resistant to outliers than classical PCA, because it assesses every observation and downweights those that are distant from the data center.

Highlights

  • Hydrological extreme events are situations in which hydrological conditions become highly extreme, such as a sudden increase in the magnitude and frequency of high-volume rainfall, and they are likely to cause catastrophic damage to society, the economy and the environment

  • The results show that, at a cumulative percentage of more than 75% of the variation, Robust PCA (RPCA) retained a large number of components for the rainfall dataset

  • Based on the findings of this research, RPCA performed better in clustering than Principal Component Analysis (PCA) based on Pearson correlation


Summary

Introduction

Environment and Ecology Research 9(3): 114-118, 2021

Hydrological extreme events are situations in which hydrological conditions become highly extreme, such as a sudden increase in the magnitude and frequency of high-volume rainfall, and they are likely to cause catastrophic damage to society, the economy and the environment. Numerous studies have examined hydrological extreme events using statistical approaches, because hydrological processes such as extreme events exhibit non-linear and non-stationary characteristics. To address this, previous research has applied various approaches such as frequency-analysis methods [1], [2], stochastic models [3], covariate-based models [4] and many more. Principal Component Analysis (PCA), known as a dimensionality-reduction tool, is regularly used as a pre-processing method to reduce the dimensionality of a data set comprising many interrelated variables while retaining as much of the variation in the data set as possible [5]. PCA has often been used to guide the clustering of patterns, improving the effectiveness and accuracy of the cluster solutions [6].
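The pre-processing role of PCA described above can be illustrated with a minimal sketch. It assumes synthetic data in place of the rainfall dataset, uses a hypothetical helper name (`components_for_variance`), and takes the 75% cumulative-variation threshold mentioned in the highlights. It performs classical PCA on a Pearson correlation matrix only; it does not implement the study's RPCA, although a robust correlation matrix could in principle be passed to the same helper.

```python
import numpy as np

def components_for_variance(corr, threshold=0.75):
    """Hypothetical helper: eigendecompose a correlation matrix and keep the
    fewest leading components whose cumulative share of variation reaches
    `threshold`."""
    eigvals, eigvecs = np.linalg.eigh(corr)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None)   # guard tiny negative values
    eigvecs = eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First index at which the cumulative share reaches the threshold.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    return k, cumulative, eigvecs[:, :k]

# Synthetic stand-in for a rainfall matrix: 200 observations, 6 variables
# driven by 2 latent factors plus noise (an assumption for illustration).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.5 * rng.normal(size=(200, 6))

# Classical PCA based on the Pearson correlation matrix.
corr = np.corrcoef(X, rowvar=False)
k, cumulative, vectors = components_for_variance(corr)

# Component scores of the standardized data, ready for a clustering step.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scores = Z @ vectors

print("components kept:", k)
print("cumulative share:", np.round(cumulative, 3))
```

The `scores` array is the reduced representation that would be handed to a clustering algorithm; swapping `corr` for a robust correlation estimate changes only the input to the helper.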

