Abstract

TCLUST is a statistical clustering technique based on a modification of the trimmed k-means clustering algorithm. It is called a “crisp” clustering approach because each observation is either eliminated (trimmed) or assigned to a group. TCLUST strengthens the group assignment by placing constraints on the cluster scatter matrices. The emphasis in this paper is on restricting the eigenvalues, λ, of the scatter matrices. The constraints are imposed in order to maximize the log-likelihood function of the spurious-outlier model. A review of different robust clustering approaches is presented for comparison with TCLUST. This paper discusses the nature of the TCLUST algorithm, how to determine the number of clusters properly, and how to measure the strength of group assignment. Finally, the R package for TCLUST implements the different types of scatter restriction, making the algorithm more flexible when choosing the number of clusters and the trimming proportion.

Highlights

  • CHARACTERISTICS of TCLUST: The presence of outlying observations is a common problem in most statistical analyses

  • 2.1 TCLUST with other robust methods. Another robust alternative to k-means is Partitioning Around Medoids (PAM)

  • Compared to TCLUST, which is based on k-means, PAM did not handle outlying data well; Fritz et al. 2011 [2] found that a small number of outlying observations did not affect the clustering result very much


Summary

INTRODUCTION

The presence of outlying observations is a common problem in most statistical analyses. When robust and non-robust clustering procedures are compared, non-robust methods fail to analyse the data accurately even when only a small fraction of outlying data is present (Fritz et al. 2011 [1]). In this case, robust clustering methods serve better for clustering correctly in the presence of outliers. TCLUST methods are statistical clustering techniques based on a modification of the trimmed k-means clustering algorithm. By maximizing the spurious log-likelihood function with constraints on the eigenvalues, H is partitioned into the desired number of clusters, k (Garcia-Escudero et al. 2010 [3]).
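To make the trimming idea concrete, the following is a minimal sketch of plain trimmed k-means, the algorithm that TCLUST modifies. It is not TCLUST itself and not the R package's implementation: it uses spherical clusters only and applies no eigenvalue constraint to scatter matrices. The function names (`trimmed_kmeans`, `_one_start`, `sq_dist`) and the parameters `alpha` (trimming proportion), `n_start` and `max_iter` are assumptions chosen for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _one_start(points, centers, n_trim, max_iter):
    """Run trimmed k-means from one set of initial centres (hypothetical helper)."""
    n, k = len(points), len(centers)
    labels, objective = None, float("inf")
    for _ in range(max_iter):
        # (squared distance, index) of the nearest centre for every point.
        nearest = [min((sq_dist(p, c), j) for j, c in enumerate(centers))
                   for p in points]
        # Trim the n_trim points that lie farthest from their nearest centre.
        order = sorted(range(n), key=lambda i: nearest[i][0])
        kept = order[: n - n_trim]
        new_labels = [-1] * n  # -1 marks a trimmed observation
        for i in kept:
            new_labels[i] = nearest[i][1]
        objective = sum(nearest[i][0] for i in kept)
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels
        # Recompute each centre as the mean of its untrimmed members.
        for j in range(k):
            members = [points[i] for i in kept if labels[i] == j]
            if members:
                centers[j] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return objective, labels, centers


def trimmed_kmeans(points, k, alpha, n_start=50, max_iter=100, seed=0):
    """Return (objective, labels, centers) of the best of n_start random starts."""
    rng = random.Random(seed)
    n_trim = int(alpha * len(points))  # number of observations to discard
    best = None
    for _ in range(n_start):
        result = _one_start(points, rng.sample(points, k), n_trim, max_iter)
        if best is None or result[0] < best[0]:
            best = result
    return best


if __name__ == "__main__":
    group_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
    group_b = [(x + 10.0, y + 10.0) for x, y in group_a]
    data = group_a + group_b + [(100.0, 100.0)]  # last point is a gross outlier
    objective, labels, centers = trimmed_kmeans(data, k=2, alpha=0.1)
    print(labels, centers, objective)
```

Several random starts are used because a single start can converge to a poor local solution, for example one in which the outlier keeps a cluster to itself. The assignment is "crisp" in the sense used above: every observation is either trimmed (label -1) or assigned to exactly one cluster.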

TCLUST with other robust methods
TCLUST output
TCLUST and PAM
Simulation study
CONCLUSION
