Normalized EM algorithm for tumor clustering using gene expression data

Nguyen Minh Phuong, Nguyen Xuan Vinh

doi:10.1109/bibe.2008.4696683

Abstract

Most proposed clustering approaches are heuristic in nature, which makes the resulting clusters difficult to interpret from a statistical standpoint. Mixture model-based clustering has received much attention from the gene expression community because of its sound statistical foundation and its flexibility in data modeling. However, current clustering algorithms that follow the model-based framework suffer from two serious drawbacks. First, their performance depends critically on the starting values of their iterative clustering procedures. Second, they cannot work directly with very high dimensional data sets, whose dimension may reach the thousands. We propose a novel normalized Expectation-Maximization (EM) algorithm to tackle these two challenges. The normalized EM is stable even with random initializations of its iterative procedure. Its stability is demonstrated through performance comparisons with related clustering algorithms such as the unnormalized EM (the conventional EM algorithm for Gaussian mixture model-based clustering) and spherical k-means. Furthermore, the normalized EM is the first mixture model-based clustering algorithm shown to be stable when working directly with very high dimensional microarray data sets in the sample clustering problem, where the number of genes is much larger than the number of samples. In addition, we uncover an interesting property of the convergence speed of the normalized EM with respect to the squared radius of the hypersphere in its corresponding statistical model.
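
For orientation, the spherical k-means baseline named in the abstract can be sketched as follows. This is a generic textbook version with random initialization, not the authors' normalized EM, whose update equations are not given in the abstract. The function names (normalize_rows, spherical_kmeans), the projection onto a hypersphere of radius 1, and the iteration cap are assumptions made for illustration only.

```python
import numpy as np

def normalize_rows(X, eps=1e-12):
    """Project each sample (row) onto the unit hypersphere (assumed radius 1)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Generic spherical k-means; rows of X are samples, columns are genes."""
    rng = np.random.default_rng(seed)
    Z = normalize_rows(np.asarray(X, dtype=float))
    # Random initialization: pick k distinct samples as starting directions.
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    labels = np.full(len(Z), -1)
    for _ in range(n_iter):
        # Assign each sample to the center with the highest cosine similarity.
        new_labels = np.argmax(Z @ centers.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = Z[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                # New center: the summed member direction, renormalized.
                centers[j] = normalize_rows(members.sum(axis=0, keepdims=True))[0]
    return labels, centers
```

In the sample clustering setting described above, X would have one row per tumor sample and one column per gene, so the number of columns far exceeds the number of rows. Because every sample is rescaled to unit length, only the direction of each expression profile influences the assignment.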

Similar Papers

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data
Tahir Mehmood ... Zahid Rasheed
Communications for Statistical Applications and Methods | VOL. 22
30 Nov 2015

Occam's razor in dimension reduction: Using reduced row Echelon form for finding linear independent features in high dimensional microarray datasets
Mohammad Kazem Ebrahimpour ... Gholamreza Aghamolaei
Engineering Applications of Artificial Intelligence | VOL. 62
22 Apr 2017

Subspace Clustering of High Dimensional Spatial Data with Noises
Chih-Ming Hsu ... Ming-Syan Chen
01 Jan 2004

Features Selection in Statistical Classification of High Dimensional Image Derived Maize (Zea Mays L.) Phenomic Data
Peter Gachoki ... Moses Muraya
American Journal of Applied Mathematics and Statistics | VOL. 10
07 Jun 2022
