Abstract

Clustering gene expression data is an important research topic in bioinformatics, because knowing which genes behave similarly can lead to the discovery of important biological information. Many clustering algorithms have been applied to gene clustering. The multivariate Gaussian distribution is frequently used as the component density of finite mixture models for clustering; however, clusters in real datasets are not necessarily normally distributed. To make the clustering algorithm more adaptable, this paper proposes a new scheme for clustering gene expression data based on multivariate elliptical contoured mixture models (MECMMs). To reduce the over-reliance on initialization, we propose an improved expectation maximization (EM) algorithm that adds and deletes initial values in the classical EM algorithm; the number of clusters is treated as a parameter and inferred with the QAIC criterion. The improved EM algorithm based on the MECMMs is tested and extensively compared with other clustering algorithms on several simulated and real gene expression datasets. Our results indicate that the improved EM clustering algorithm is superior to the classical EM algorithm and the support vector machines (SVMs) algorithm, and that it can be widely used for gene clustering.
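As a rough illustration of the general idea summarized above (fit a finite mixture by EM, then choose the number of clusters with an information criterion), the following minimal sketch may help. It is not the method proposed in the paper: it assumes univariate Gaussian components instead of multivariate elliptical contoured ones, plain AIC instead of QAIC, and a simple quantile initialization instead of the add-and-delete scheme. All function names (`fit_mixture_em`, `select_num_clusters`, `simulate_two_groups`) are hypothetical and chosen only for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the data under the fitted mixture."""
    total = 0.0
    for x in data:
        dens = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(dens, 1e-300))  # guard against log(0)
    return total


def fit_mixture_em(data, k, n_iter=200):
    """Classical EM for a k-component 1-D Gaussian mixture (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    grand_mean = sum(data) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / n, 1e-6)
    # Assumption: start the means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, 1e-6)  # floor avoids degenerate components
    return weights, means, variances


def aic(data, k):
    """AIC = 2p - 2 log L, with p = 3k - 1 free parameters in one dimension."""
    weights, means, variances = fit_mixture_em(data, k)
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return 2.0 * n_params - 2.0 * log_likelihood(data, weights, means, variances)


def select_num_clusters(data, max_k=4):
    """Return the k with the lowest AIC, plus the score for every candidate k."""
    scores = {k: aic(data, k) for k in range(1, max_k + 1)}
    return min(scores, key=scores.get), scores


def simulate_two_groups(n_per_group=150, seed=1):
    """Synthetic stand-in for expression values: two well-separated groups."""
    rng = random.Random(seed)
    low = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    high = [rng.gauss(8.0, 1.0) for _ in range(n_per_group)]
    return low + high
```

Calling `select_num_clusters(simulate_two_groups())` returns the candidate number of clusters with the lowest AIC together with the score of each candidate. A QAIC variant would additionally divide the log-likelihood term by an estimated variance inflation factor, which this sketch does not attempt.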
