Abstract
Tumor recognition has become a hot topic in image processing and pattern recognition. Gene expression data are an important resource for studying tumors. Because such data are high-dimensional with small sample sizes, extracting discriminative gene features is a key step in distinguishing different tumor types. Nonnegative matrix factorization (NMF) is an unsupervised feature representation method that does not depend on label information. NMF can achieve nonlinear dimension reduction and is widely used in tumor recognition. However, some data may carry label information, and a model's feature representation capability can be improved if that information is used effectively. Therefore, this paper proposes a semisupervised NMF model based on label consistency (LC-NMF), which uses both labeled and unlabeled data to obtain feature representations. Furthermore, to alleviate the sensitivity of NMF to initial values and to extract deep features of the data, a label consistency-based deep semisupervised NMF model (LC-DNMF) is constructed, which combines LC-NMF with the layer-by-layer pretraining and multilayer representation strategies of deep learning. The proposed models (LC-NMF and LC-DNMF) are evaluated on tumor recognition tasks. Experimental results on seven datasets show that both models achieve recognition accuracies competitive with state-of-the-art methods. Moreover, LC-DNMF outperforms LC-NMF, which verifies the effectiveness of introducing layer-by-layer pretraining and multilayer representation.
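For readers unfamiliar with the building blocks named above, the following is a minimal sketch of plain unsupervised NMF and of greedy layer-by-layer pretraining. It is not the LC-NMF or LC-DNMF model: the abstract does not give the label-consistency objective, so that term is omitted. The multiplicative update rule (the standard Lee-Seung updates for the Frobenius loss), the function names `nmf` and `layerwise_pretrain`, the ranks, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def nmf(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Plain NMF, X ~ W @ H, using multiplicative updates (Frobenius loss).

    Illustrative only: no label-consistency term is included.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape
    # Random nonnegative initialization; NMF is sensitive to this choice.
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Elementwise updates keep W and H nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def layerwise_pretrain(X, ranks, **kwargs):
    """Greedy pretraining: factorize X, then factorize each resulting H in turn."""
    Ws, H = [], X
    for r in ranks:
        W, H = nmf(H, r, **kwargs)
        Ws.append(W)
    return Ws, H


# Synthetic stand-in for a gene expression matrix: 50 genes x 12 samples.
X = np.random.default_rng(42).random((50, 12))

W, H = nmf(X, rank=4)                              # single-layer features
Ws, H_deep = layerwise_pretrain(X, ranks=[8, 4])   # two-layer features
```

Here each column of `H` (or `H_deep`) is the low-dimensional representation of one sample, which a downstream classifier could use for recognition. In a deep model, the factors obtained this way would typically serve only as initial values before joint fine-tuning.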