Abstract

This study was conducted to identify the spatiotemporal torrential rainfall patterns of the East Coast of Peninsular Malaysia, the region most affected by the torrential rainfall of the Northeast Monsoon season. Dimension reduction, such as the classical Principal Components Analysis (PCA), is often coupled with a clustering approach to reduce the dimension of the data before partitioning it into clusters. However, the classical PCA is highly sensitive to outliers, as it assigns equal weights to each set of observations. Hence, applying the classical PCA could distort the cluster partitions of the rainfall patterns. Furthermore, traditional clustering algorithms only allow each element to belong to exactly one cluster, so observations that fall within overlapping clusters of the torrential rainfall datasets might not be captured effectively. In this study, a statistical model of torrential rainfall pattern recognition was proposed to alleviate these issues. A Robust PCA (RPCA) based on Tukey’s biweight correlation was introduced, and the optimum breakdown point for extracting the number of components was identified. A breakdown point of 0.4 at 85% cumulative variance percentage efficiently extracted the number of components while avoiding low-frequency variations or insignificant clusters on a spatial scale. Based on the extracted components, the rainfall patterns were further characterized using cluster solutions from Fuzzy C-means clustering (FCM), which allows data elements to belong to more than one cluster, as the rainfall data structure permits this. Lastly, data generated using a Monte Carlo simulation were used to evaluate the performance of the proposed statistical modeling. The proposed RPCA-FCM performed better than the classical PCA coupled with FCM in identifying the torrential rainfall patterns of Peninsular Malaysia’s East Coast.
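To make the soft-assignment idea concrete, the following is a minimal sketch of the standard Fuzzy C-means update rules in NumPy. It is an illustration only, not the study's implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the Euclidean distance, and the random initialization are all assumptions chosen for the example.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Illustrative FCM (hypothetical helper, not the paper's code).

    X : (n, p) data matrix, c : number of clusters, m : fuzzifier (> 1).
    Returns the cluster centers (c, p) and membership matrix U (n, c).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the observations.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every observation to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Each row of `U` holds the degrees of membership of one observation across all clusters, so an observation lying between two rainfall patterns can carry a substantial membership in both rather than being forced into one.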

Highlights

  • Rainfall is undoubtedly one of the most significant natural phenomena that plays a key role in the natural life and habitat of the earth

  • Robust PCA (RPCA)-Fuzzy C-means clustering (FCM) obtained the largest values of the fuzzy silhouette index (FSI), partition coefficient (PC), and modified partition coefficient (MPC), together with the lowest value of partition entropy (PE), whereas the combination of classical Principal Components Analysis (PCA) with FCM obtained poor results. This result shows that the proposed statistical modeling, RPCA-FCM, clustered the torrential rainfall patterns of the East Coast of Peninsular Malaysia better than the classical procedure

  • A Robust PCA using Tukey’s biweight correlation, combined with FCM and referred to as RPCA-FCM, was introduced
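
Three of the validity measures named in the highlights can be computed directly from an FCM membership matrix. The sketch below uses the textbook definitions of PC, MPC, and PE; the function names are hypothetical, the fuzzy silhouette index is left out for brevity, and nothing here reproduces the study's reported values.

```python
import numpy as np

def partition_coefficient(U):
    """PC = (1/n) * sum of squared memberships; closer to 1 is crisper."""
    return float(np.sum(U ** 2) / U.shape[0])

def modified_partition_coefficient(U):
    """MPC rescales PC to [0, 1] so it does not depend on the cluster count."""
    c = U.shape[1]
    return 1.0 - c / (c - 1.0) * (1.0 - partition_coefficient(U))

def partition_entropy(U, eps=1e-12):
    """PE = -(1/n) * sum of u * log(u); lower means less fuzzy overlap."""
    logs = np.log(np.clip(U, eps, 1.0))  # clip avoids log(0)
    return float(-np.sum(U * logs) / U.shape[0])
```

A perfectly crisp partition gives PC = MPC = 1 and PE = 0, while a completely uniform membership matrix gives PC = 1/c, MPC = 0, and PE = log(c). This is why larger PC and MPC with smaller PE are read as a better partition.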


Summary

Introduction

Rainfall is undoubtedly one of the most significant natural phenomena and plays a key role in the natural life and habitat of the earth. The hardest-hit areas were again along the East Coast of Peninsular Malaysia, in the states of Kelantan, Terengganu, and Pahang. In hydrological studies, these events are known as torrential rainfall. Previous studies [9,10,11] used regression-based modeling that generally aims to characterize the rainfall distribution patterns. With this approach, identifying spatial and temporal patterns of rainfall focuses on detecting trends rather than describing the regional characteristics of each pattern. Studies that use cluster-based approaches to identify spatial and temporal rainfall patterns aim to quantify the characteristics of a set of observations that fit into the same group, meaning that the patterns are highly structured [12]. This study tested the performance of the biweight correlation under various point changes to determine the number of components to extract from the PCA and thereby identify the pattern of torrential rainfall.
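
As a rough illustration of how a biweight correlation can replace the Pearson correlation in PCA, the sketch below builds a robust correlation matrix and counts the components needed to reach a cumulative variance threshold. It rests on several assumptions: it uses the common median/MAD formulation of the biweight midcorrelation with a fixed tuning constant of 9, not the breakdown-point parametrization tested in the study, and the function names and the 0.85 default are chosen only to mirror the 85% threshold mentioned in the abstract.

```python
import numpy as np

def biweight_correlation(X, c=9.0):
    """Biweight midcorrelation matrix of the columns of X (n, p).

    Assumption: median/MAD form with fixed constant c, which is not the
    breakdown-point tuning described in the study. Requires nonzero MAD.
    """
    med = np.median(X, axis=0)
    dev = X - med
    mad = np.median(np.abs(dev), axis=0)
    u = dev / (c * mad)
    # Tukey biweight: observations far from the median get weight 0.
    w = np.where(np.abs(u) < 1.0, (1.0 - u ** 2) ** 2, 0.0)
    Z = dev * w
    Z = Z / np.sqrt(np.sum(Z ** 2, axis=0))  # unit-length weighted columns
    return Z.T @ Z

def n_components_for_variance(R, threshold=0.85):
    """Smallest number of leading eigenvalues reaching the threshold."""
    eigvals = np.linalg.eigvalsh(R)[::-1]  # sort in descending order
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    k = int(np.searchsorted(ratio, threshold) + 1)
    return k, eigvals
```

Because extreme observations receive zero weight, a handful of outlying rainfall values should have far less influence on the extracted components than they would through the Pearson correlation.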

K-Means
Monte Carlo Simulation
Evaluating Performance of Classical PCA against RPCA
Validity Measures of RPCA-FCM
Methods
Evaluating the Performance of Proposed Models Based on Simulation Results
Description of Clustering of Rainfall Patterns
Findings
Conclusions
