Abstract

Determining the number of clusters in high-dimensional real-life datasets and interpreting the final outcome are among the challenging problems in data science. Discovering the number of classes in cancer and microarray data plays a vital role in the diagnosis and treatment of cancers and other related diseases. Nonnegative matrix factorization (NMF) is an efficient exploratory tool for extracting the basis features inherent in massive data. Several algorithms that incorporate sparsity constraints into the nonconvex NMF optimization problem have been applied in the past to analyze microarray datasets. However, to the best of our knowledge, none of these algorithms uses the block coordinate descent method, which is known for providing closed-form solutions. In this paper, we apply an algorithm based on columnwise partitioning and rank-one matrix approximation. We test this algorithm on two well-known cancer datasets: leukemia and multiple myeloma. The numerical results indicate that the proposed algorithm performs significantly better than related state-of-the-art methods. In particular, the method is shown to be capable of robust clustering and of discovering larger numbers of cancer classes for which the cluster splits are stable.

Highlights

  • Analyzing and interpreting microarray data which represent biological processes of cancers and other related diseases are among the big challenges in data science [1,2,3]

  • The authors reported that molecular classification based on gene profiles, together with hierarchical clustering (HC), helps to identify subtypes of cancer, which plays a vital role in clinical diagnosis

  • Perou et al. [6] showed experimentally that HC is very useful for classifying molecular portraits of breast tumors into subtypes distinguished by differences in the corresponding gene expression patterns


Summary

Introduction

Analyzing and interpreting microarray data which represent biological processes of cancers and other related diseases are among the big challenges in data science [1,2,3]. The authors applied this decomposition method to an elutriation yeast dataset and found that it is capable of extracting gene expression patterns correlated with the original samples in the data. It has been reported that the above matrix decomposition methods have some drawbacks, including being unable to capture the full structures and local behaviors hidden in high-dimensional data [1, 7, 8]. Frigyesi and Hoglund [7] used the divergence-based NMF algorithm to analyze some cancer and tumor data. Their experimental results showed that NMF facilitates the extraction of biologically relevant structure from microarray data and plays a vital role in understanding the properties of tumor and cancer-related diseases. Instead of using multiplicative update rules, like most of the existing methods, we apply an algorithm which makes use of the block coordinate descent optimization method.
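To make the contrast with multiplicative updates concrete, the following is a minimal sketch of a generic block coordinate descent scheme for NMF in which each block is a single column of W or a single row of H, so that every update is a rank-one approximation problem with a closed-form solution. It is a textbook HALS-style iteration under the Frobenius norm and is not claimed to be the algorithm applied in this paper; the function name nmf_hals, its parameters, and the random test matrix are assumptions introduced only for illustration.

```python
import numpy as np


def nmf_hals(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a nonnegative matrix X (m x n) by W @ H with W, H >= 0.

    Hypothetical helper: generic HALS-style block coordinate descent,
    updating one row of H or one column of W at a time in closed form.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # Update each row of H with W held fixed.
        WtX = W.T @ X
        WtW = W.T @ W
        for k in range(rank):
            # w_k^T (X - sum_{j != k} w_j h_j): residual with block k left out.
            num = WtX[k] - WtW[k] @ H + WtW[k, k] * H[k]
            # Exact minimizer over h_k >= 0: scale, then clip at zero.
            H[k] = np.maximum(num / max(WtW[k, k], eps), 0.0)
        # Update each column of W with H held fixed (same idea, transposed).
        XHt = X @ H.T
        HHt = H @ H.T
        for k in range(rank):
            num = XHt[:, k] - W @ HHt[:, k] + HHt[k, k] * W[:, k]
            W[:, k] = np.maximum(num / max(HHt[k, k], eps), 0.0)
    return W, H


if __name__ == "__main__":
    # Synthetic stand-in for a genes x samples expression matrix.
    rng = np.random.default_rng(1)
    X = rng.random((20, 3)) @ rng.random((3, 15))
    W, H = nmf_hals(X, rank=3)
    # A common convention: assign each sample to its dominant factor.
    labels = H.argmax(axis=0)
    print("relative error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
    print("cluster labels:", labels)
```

Because each block update solves its subproblem exactly, the reconstruction error cannot increase from one sweep to the next, which is the practical appeal of closed-form block updates over multiplicative rules. Assigning samples by the largest entry in each column of H is one common way to read clusters from an NMF factorization; the choice of rank, which determines the number of clusters, is left to the caller here.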

Methods
Numerical Results and Discussion
GSM613799
Conclusive Remarks