Abstract
Feature selection and sample clustering play an important role in bioinformatics. Traditional feature selection methods treat sparse regression and embedding learning as separate steps. To identify the significant features of genomic data more effectively, Joint Embedding Learning and Sparse Regression (JELSR) was later proposed. However, because genomic data contain much redundancy and noise, the sparseness achieved by this method is insufficient. In this paper, we propose a strengthened version of JELSR, called LJELSR, that adds an L1-norm constraint to the regularization term of the previous model to further improve sparseness. We then provide a new iterative algorithm to obtain a convergent solution. The experimental results show that, compared with previous methods, our method achieves state-of-the-art performance in both identifying differentially expressed genes and sample clustering on different genomic data. Additionally, the selected differentially expressed genes may be of great value in medical research.
Highlights
With the emergence of deep sequencing technologies, considerable genomic data have become available
To validate the effectiveness of our method, the LJELSR, Joint Embedding Learning and Sparse Regression (JELSR), ReDac, and SMART methods are run on three datasets: ALL_AML, colon cancer, and esophageal carcinoma (ESCA)
The ALL_AML dataset includes acute lymphoblastic leukemia (ALL) and acute myelogenous leukemia (AML) [13], and ALL has been divided into T cell subtypes and B cell subtypes
Summary
With the emergence of deep sequencing technologies, considerable genomic data have become available. Genomic data are usually high-dimensional, small-sample data: the number of genes is large while the number of samples is small. This makes feature selection prone to interference and makes the samples difficult to interpret directly [1]. How to identify the key genes in massive high-dimensional genomic data is an active and difficult research problem. Studies have shown that these key genes can be extracted effectively by embedding learning [3]. Cluster analysis groups samples or genes according to the similarity between data points, which helps determine the cancer subtype accurately. Some studies have demonstrated that embedding learning and sparse regression are effective for cluster analysis and feature selection [4,5].