Abstract

Gene expression data are critical for disease diagnosis and classification. However, gene expression data are usually high-dimensional and highly noisy. Many matrix factorization methods are currently used for dimensionality reduction and data preprocessing in bioinformatics. In particular, nonnegative matrix factorization (NMF) offers outstanding interpretability when analyzing gene expression data because of its nonnegativity constraints. In this paper, a new NMF algorithm named sparse orthogonal nonnegative matrix factorization (SONMF) is proposed and applied to identify differentially expressed genes and to cluster tumor samples. SONMF incorporates L1-norm regularization and an orthogonality constraint into the traditional NMF model to obtain a more powerful data analysis tool. An iterative algorithm is proposed to optimize the new objective function. To demonstrate the effectiveness of the algorithm, SONMF is tested on four public gene expression datasets and compared with four other NMF methods. The experimental results on these four real tumor datasets confirm the effectiveness of SONMF for identifying differentially expressed genes and clustering tumor samples.
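The abstract does not state the SONMF objective or its update rules, so the following is only a rough illustration of the general idea, not the authors' method. It assumes a penalized objective ||X - WH||_F^2 + lam * sum(H) + alpha * ||H H^T - I||_F^2, with X a nonnegative genes-by-samples matrix, the L1 penalty and a soft orthogonality penalty both placed on H, and multiplicative updates obtained by splitting the gradient into positive and negative parts. The function name `sparse_orthogonal_nmf`, the parameters `lam` and `alpha`, and the argmax clustering rule are hypothetical choices made for this sketch.

```python
import numpy as np


def sparse_orthogonal_nmf(X, k, lam=0.1, alpha=0.1, n_iter=500, eps=1e-10, seed=0):
    """Hypothetical sketch: factor X (genes x samples) ~= W @ H with W, H >= 0.

    Assumed objective (not necessarily the paper's):
        ||X - W H||_F^2 + lam * sum(H) + alpha * ||H H^T - I||_F^2
    minimized with gradient-splitting multiplicative updates.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("X must be nonnegative")
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # W update: the standard Lee-Seung rule for the Frobenius loss.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # H update: the gradient's negative part goes in the numerator and its
        # positive part in the denominator. The L1 term adds lam/2 below; the
        # orthogonality penalty adds 2*alpha*H above and 2*alpha*H H^T H below.
        numer = W.T @ X + 2 * alpha * H
        denom = W.T @ W @ H + lam / 2 + 2 * alpha * (H @ H.T @ H) + eps
        H *= numer / denom
    return W, H


# Toy data: 20 genes x 10 samples, two sample groups with distinct gene blocks.
rng = np.random.default_rng(1)
X = rng.random((20, 10)) * 0.1
X[:10, :5] += 5.0   # genes 0-9 high in samples 0-4
X[10:, 5:] += 5.0   # genes 10-19 high in samples 5-9

W, H = sparse_orthogonal_nmf(X, k=2)
labels = H.argmax(axis=0)  # assign each sample to its dominant component
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(labels, round(float(rel_err), 4))
```

Treating orthogonality as a soft penalty keeps every update a simple elementwise rescaling, which preserves nonnegativity automatically. A hard constraint H H^T = I would instead need a constrained update rule. Under the same assumptions, the rows of W could be used to score genes for each component.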
