Partition clustering of high dimensional low sample size data based on p-values

George Von Borries, Haiyan Wang

doi:10.1016/j.csda.2009.06.012

Abstract

Clustering techniques play an important role in analyzing high dimensional data, which are common in high-throughput screening such as microarray and mass spectrometry studies. Effective use of the high dimensionality and some replications can help to increase clustering accuracy and stability. In this article, a new partitioning algorithm with a robust distance measure is introduced to cluster variables in high dimensional low sample size (HDLSS) data, which contain a large number of independent variables with a small number of replications per variable. The proposed clustering algorithm, PPCLUST, treats the data as coming from a mixture distribution and uses p-values from nonparametric rank tests of homogeneity of distributions as a measure of similarity to separate the mixture components. PPCLUST can efficiently cluster a large number of variables in the presence of very few replications. Inheriting the robustness of rank procedures, the new algorithm is robust to outliers and invariant to monotone transformations of the data. Numerical studies and an application to microarray gene expression data from a colorectal cancer study are discussed.
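
The idea of using rank-test p-values as a similarity measure can be illustrated with a small sketch. This is not the PPCLUST algorithm itself, whose details are not given in the abstract. It is a simplified greedy partition under stated assumptions: variables are ordered by their lower median, each variable is compared with the pooled replications of the current cluster using a two-sided Wilcoxon rank-sum test (normal approximation, no tie correction), and a new cluster is started whenever the p-value falls below a hypothetical threshold `alpha`. The function names and the toy data are illustrative only.

```python
import math
import statistics


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation.

    Midranks are used for ties; the variance is not tie-corrected
    (a simplification made for this sketch).
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n, m = len(x), len(y)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1..j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n * (n + m + 1) / 2
    var = n * m * (n + m + 1) / 12
    z = (rank_sum_x - mean) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def partition_by_pvalue(data, alpha=0.05):
    """Greedy partition of variables (illustrative, not PPCLUST).

    data maps a variable name to its list of replications.
    """
    # median_low always returns an observed value, so the ordering is
    # unchanged by any increasing transformation of the data.
    order = sorted(data, key=lambda name: statistics.median_low(data[name]))
    clusters, pooled = [], []
    for name in order:
        if clusters and rank_sum_pvalue(data[name], pooled) >= alpha:
            clusters[-1].append(name)   # homogeneity not rejected: same cluster
            pooled.extend(data[name])
        else:
            clusters.append([name])     # rejected (or first variable): new cluster
            pooled = list(data[name])
    return clusters


# Toy data: four variables with five replications each (made-up numbers).
toy = {
    "g1": [0.10, 0.40, 0.20, 0.50, 0.30],
    "g2": [0.15, 0.45, 0.25, 0.55, 0.35],
    "g3": [10.10, 10.40, 10.20, 10.50, 10.30],
    "g4": [10.15, 10.45, 10.25, 10.55, 10.35],
}
clusters = partition_by_pvalue(toy)

# The same partition is obtained after a monotone transformation (exp),
# because only the ranks of the observations enter the test.
transformed = {name: [math.exp(v) for v in vals] for name, vals in toy.items()}
transformed_clusters = partition_by_pvalue(transformed)
print(clusters, transformed_clusters)
```

Because the test depends only on ranks, a single extreme replication cannot dominate the comparison, which is the kind of robustness the abstract attributes to rank procedures. With very few replications the normal approximation is crude, so an exact or permutation-based p-value would be a reasonable substitute.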

Journal: Computational Statistics & Data Analysis
Publication Date: Jun 26, 2009
Citations: 7

Similar Papers

An effective feature selection method based on pair-wise feature proximity for high dimensional low sample size data
S L Happy ... Aurobinda Routray
01 Aug 2017

High dimensional low sample size activity recognition using geometric classifiers
Muhammad Shahzad Cheema ... Christian Bauckhage
Digital Signal Processing | VOL. 42
22 Apr 2015

An Empirical Study of Several Information Theoretic Based Feature Extraction Methods for Classifying High Dimensional Low Sample Size Data
Sheena Leeza Verghese ... Tomas H Maul
IEEE Access | VOL. 9
01 Jan 2020

Survival analysis for HDLSS data with time dependent variables: Lessons from predictive maintenance at a mining service provider
Axel Hochstein ... Hyung-Il Ahn
01 Jul 2013