Abstract

Background: Non-negative matrix factorization (NMF) has been shown to be a powerful tool for clustering gene expression data, which are widely used to classify cancers. NMF aims to find two non-negative matrices whose product closely approximates the original matrix. Traditional NMF methods minimize either the l2 norm or the Kullback-Leibler divergence between the product of the two matrices and the original matrix. Correntropy was recently shown to be an effective similarity measure because of its robustness to outliers and noise.

Results: We propose a maximum correntropy criterion (MCC)-based NMF method (NMF-MCC) for cancer clustering based on gene expression data. Instead of minimizing the l2 norm or the Kullback-Leibler divergence, NMF-MCC maximizes the correntropy between the product of the two matrices and the original matrix. The optimization problem can be solved with an expectation conditional maximization algorithm.

Conclusions: Extensive experiments on six cancer benchmark sets demonstrate that the proposed method is significantly more accurate than state-of-the-art methods in cancer clustering.
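To make the contrast between an l2-norm objective and a correntropy objective concrete, here is a minimal NumPy sketch, assuming a Gaussian kernel of width `sigma` and entrywise weights. It maximizes correntropy by half-quadratic reweighting: an expectation-like step recomputes weights from the current residual, and a maximization-like step applies one weighted multiplicative update to each factor. The names `correntropy` and `nmf_mcc`, the entrywise weighting, and the update rules are illustrative assumptions; they are not the update rules derived for NMF-MCC in the paper.

```python
import numpy as np


def correntropy(X, Y, sigma=1.0):
    """Empirical correntropy between X and Y: the mean Gaussian-kernel
    similarity exp(-e**2 / (2 * sigma**2)) over all entries, e = X - Y.
    (The kernel's normalizing constant is omitted, so identical inputs give 1.)"""
    E = X - Y
    return float(np.mean(np.exp(-(E ** 2) / (2.0 * sigma ** 2))))


def nmf_mcc(X, k, sigma=1.0, n_iter=300, eps=1e-10, seed=0):
    """Sketch: find non-negative W (m x k) and H (k x n) whose product is close
    to X in the correntropy sense, via half-quadratic reweighting (assumed scheme)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Expectation-like step: entrywise weights from the current residual.
        # Large residuals (e.g. outliers) receive weights close to 0.
        Q = np.exp(-((X - W @ H) ** 2) / (2.0 * sigma ** 2))
        # Maximization-like step: Lee-Seung-style weighted multiplicative updates
        # for sum(Q * (X - W @ H) ** 2) with Q held fixed; factors stay non-negative.
        W *= ((Q * X) @ H.T) / ((Q * (W @ H)) @ H.T + eps)
        H *= (W.T @ (Q * X)) / (W.T @ (Q * (W @ H)) + eps)
    return W, H
```

If every weight is fixed at 1 (equivalently, `sigma` grows very large), the updates reduce to the standard l2-norm NMF multiplicative updates. With a finite `sigma`, entries with large residuals, such as outliers, are down-weighted and barely influence the factors.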

Highlights

  • Non-negative matrix factorization (NMF) has been shown to be a powerful tool for clustering gene expression data, which are widely used to classify cancers

  • There have been a variety of studies on analyzing DNA microarray data for cancer class discovery [3,4,5]

  • Zheng et al. [5,7] applied penalized matrix decomposition (PMD) to extract meta-samples from gene expression data, which could capture the inherent structures of samples that belonged to the same class

Summary

Introduction

Non-negative matrix factorization (NMF) has been shown to be a powerful tool for clustering gene expression data, which are widely used to classify cancers. There have been a variety of studies on analyzing DNA microarray data for cancer class discovery [3,4,5]. Such methods have been shown to outperform traditional cancer classification methods based on morphological appearance. In these studies, different cancer classes are discriminated by their corresponding gene expression profiles [1]. NMF was recently introduced to analyze gene expression data and demonstrated superior performance in terms of both accuracy and stability [6,7,8]. Zheng et al. [5,7] applied penalized matrix decomposition (PMD) to extract meta-samples from gene expression data, which could capture the inherent structures of samples that belonged to the same class.
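How NMF can be used to cluster samples from gene expression data can be sketched as follows, assuming a non-negative genes × samples matrix X and a common assignment rule: factor X ≈ WH with k factors, then assign each sample to the factor with the largest coefficient in its column of H. This sketch uses the traditional l2-norm (Frobenius) multiplicative updates rather than a correntropy objective; the helper names `nmf_frobenius` and `cluster_samples` and the toy data are hypothetical.

```python
import numpy as np


def nmf_frobenius(X, k, n_iter=500, eps=1e-10, seed=0):
    """Standard NMF: Lee-Seung multiplicative updates that minimize
    ||X - W @ H||_F^2 subject to W >= 0 and H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def cluster_samples(X, k, **kwargs):
    """Cluster the columns (samples) of a non-negative genes x samples matrix X
    into k groups by the largest coefficient in each column of H."""
    W, H = nmf_frobenius(X, k, **kwargs)
    # W @ H is unchanged if W's columns are scaled by c and H's rows by 1/c,
    # so fix the scale (unit-norm columns of W) before comparing rows of H.
    norms = np.linalg.norm(W, axis=0) + 1e-12
    H = H * norms[:, None]
    return np.argmax(H, axis=0)


# Hypothetical toy data: genes 0-4 are high in samples 0-3, genes 5-9 in samples 4-7.
X = np.full((10, 8), 0.1)
X[:5, :4] += 5.0
X[5:, 4:] += 5.0
labels = cluster_samples(X, k=2)
print(labels)
```

Normalizing the columns of W before taking the argmax removes the arbitrary scaling between W and H, which could otherwise change which factor appears largest for a sample.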
