Abstract

Nonnegative Matrix Factorization (NMF) has proved to be an effective method for unsupervised clustering analysis of gene expression data. Under the nonnegativity constraint, NMF decomposes the data matrix into two factor matrices that are then used for clustering. However, the decomposition is not unique, so different clustering results, and hence different interpretations of the decomposition, can be obtained from the same data. To alleviate this problem, some existing methods directly enforce uniqueness to some extent by adding regularization terms to the NMF objective function. Alternatively, various normalization methods have been applied to the factor matrices; however, the effects of the choice of normalization have not been carefully investigated. Here we investigate the performance of NMF for the task of cancer class discovery under a wide range of normalization choices. After extensive evaluations, we observe that the maximum norm gives the best performance, although it has not previously been used for NMF. Matlab code is freely available from: http://maths.nuigalway.ie/~haixuanyang/pNMF/pNMF.htm.
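
The non-uniqueness described above can be illustrated with a small sketch. It is not the authors' Matlab implementation: the toy matrices W and H, the helper names, and the rule of assigning each sample to the factor with the largest coefficient in H are all assumptions made for illustration. The sketch rescales the columns of W to unit norm and compensates in the rows of H, so the product W·H is unchanged while the cluster labels read off H can change with the norm.

```python
# Illustrative sketch only. W, H and all helper names are hypothetical and
# are not taken from the authors' Matlab code.

def matmul(W, H):
    """Plain-Python product of W (m x k) and H (k x n)."""
    return [[sum(W[i][r] * H[r][j] for r in range(len(H)))
             for j in range(len(H[0]))] for i in range(len(W))]

def max_norm(column):
    # Entries are nonnegative, so the maximum norm is simply the largest entry.
    return max(column)

def l1_norm(column):
    # Entries are nonnegative, so the L1 norm is simply the sum.
    return sum(column)

def normalize(W, H, norm):
    """Scale each column of W to unit `norm` and multiply the matching row
    of H by the same factor, leaving the product W * H unchanged."""
    k = len(H)
    scales = [norm([row[r] for row in W]) for r in range(k)]
    W2 = [[row[r] / scales[r] for r in range(k)] for row in W]
    H2 = [[scales[r] * value for value in H[r]] for r in range(k)]
    return W2, H2

def assign_clusters(H):
    """Assumed rule: label each sample (column of H) by its largest coefficient."""
    return [max(range(len(H)), key=lambda r: H[r][j]) for j in range(len(H[0]))]

# Toy factorization: 3 genes x 2 factors, and 2 factors x 4 samples.
W = [[4.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
H = [[1.0, 0.5, 0.3, 0.1], [0.2, 0.6, 1.0, 1.2]]

W_max, H_max = normalize(W, H, max_norm)
W_l1, H_l1 = normalize(W, H, l1_norm)

raw_labels = assign_clusters(H)
max_labels = assign_clusters(H_max)
l1_labels = assign_clusters(H_l1)
print(raw_labels, max_labels, l1_labels)
```

On this toy input the three labelings differ even though all three factor pairs reproduce the same data matrix, which is why the choice of normalization is worth evaluating.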

Highlights

  • Accurate clustering of tumor samples with unknown cancer types or subtypes is of great importance for advances in cancer treatment and for better understanding of biological processes and the mechanisms of cancers

  • We investigated the effects of different normalization methods on the basic Nonnegative Matrix Factorization (NMF) of [2] (NMF-Brunet), which does not itself employ normalization, on 9 datasets with 13 settings

  • We have provided an interpretation to justify the use of the maximum norm

Introduction

Accurate clustering of tumor samples with unknown cancer types or subtypes is of great importance for advances in cancer treatment and for better understanding of the biological processes and mechanisms of cancers. Cancers from within a group discovered by such a procedure may follow significantly different clinical courses and show different responses to therapy [1]. This happens because of a lack of internal features that reveal intrinsic biological activities. With the advent of genomic technologies, the expression levels of thousands of genes can serve as internal features, and it has become possible to design such an approach. This is the objective of molecular cancer class discovery [1, 2]: to cluster cancer types based on global gene expression data in an unsupervised setting.

