Abstract

Clustering is an essential task in functional data analysis. In this study, we propose a framework for a clustering procedure based on functional rankings or depth. Our methods naturally combine various types of between-cluster variation equally, which accommodates the various discriminative sources of functional data; for example, they combine raw data with transformed data, or various components of multivariate functional data with their covariance. Our methods also enhance the clustering results with a visualization tool that allows intrinsic graphical interpretation. Finally, our methods are model-free and nonparametric and hence are robust to heavy-tailed distributions and potential outliers. The implementation and performance of the proposed methods are illustrated with a simulation study and three real-world applications.

Highlights

  • Cluster analysis is a critical step in exploratory data analysis intended to identify homogeneous subgroups among observations

  • The filtering-based methods involve the approximation of the curves with linear combinations of finite basis functions, such as splines and functional principal components, and the cluster analysis is conducted based on the coefficients or scores of finite dimensions [5–7]

  • We introduce a new class of functional cluster analysis methods based on functional orderings

Summary

Introduction

Cluster analysis is a critical step in exploratory data analysis intended to identify homogeneous subgroups among observations. By “standardization”, we mean that the marginal empirical distributions are standardized so that they have zero mean and unit variance. This approach is used in the simulation study to compare the performance of existing methods with that of the proposed methods. Since the proposed procedure applies a functional ordering in which every part of the function is treated equally, the different sources of variation are combined in an equal manner. For univariate cases, it may combine the raw curves and their derivatives to measure magnitude and shape variation simultaneously. The proposed method provides a reasonable graphical interpretation of the clustering result. It inherits the robustness of functional orderings and can stably recover the clusters when abnormal observations contaminate the data. The proposed methods will be available soon in the R package GET.
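The standardization and the combination of raw curves with derivatives described above can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation and not the API of the R package GET. It assumes that curves are sampled on a common equally spaced grid, that "marginal" refers to each evaluation point across curves, that the population standard deviation is used, and that derivatives are approximated by first differences. The function names `standardize_pointwise`, `first_differences`, and `combine_raw_and_derivative` are hypothetical.

```python
from statistics import fmean, pstdev


def standardize_pointwise(curves):
    """Scale each evaluation point to zero mean and unit variance across curves.

    Assumption: 'marginal' means one grid point, taken over all curves.
    Grid points with zero spread are mapped to 0.0 to avoid division by zero.
    """
    n_points = len(curves[0])
    columns = [[c[j] for c in curves] for j in range(n_points)]
    means = [fmean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [
        [(c[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(n_points)]
        for c in curves
    ]


def first_differences(curves, step=1.0):
    """Approximate the derivative of each curve by first differences."""
    return [[(c[j + 1] - c[j]) / step for j in range(len(c) - 1)] for c in curves]


def combine_raw_and_derivative(curves, step=1.0):
    """Concatenate standardized raw values and standardized derivatives.

    After standardization both blocks are on the same scale, so magnitude
    and shape information enter the combined vector on an equal footing.
    """
    raw = standardize_pointwise(curves)
    deriv = standardize_pointwise(first_differences(curves, step))
    return [r + d for r, d in zip(raw, deriv)]


# Toy data (illustrative only): two curves differ in level, one in slope.
curves = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 2.0, 4.0, 6.0],
]
combined = combine_raw_and_derivative(curves)
```

Each row of `combined` holds the four standardized raw values followed by the three standardized differences of one curve. A functional ordering or dissimilarity could then be computed on these combined vectors; that step is left out here because the section does not specify it.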

Dissimilarity matrix
Combined functional ordering
Functional ordering with intrinsic graphical interpretation
Extreme rank length ordering
Global continuous rank ordering
Global area rank ordering
Studentized maximum ordering
Dissimilarity matrix based on the combined ordering
Simulation study
Clustering of insurance penetration
Clustering of population growth data
Findings
Conclusions
