Abstract

Motivation: Previous studies have demonstrated that machine-learning-based molecular cancer classification using gene expression profiling (GEP) data is promising for the clinical diagnosis and treatment of cancer. Novel classification methods with high efficiency and prediction accuracy are still needed to deal with the high dimensionality and small sample size of typical GEP data. Recently, the sparse representation (SR) method has been successfully applied to cancer classification. Nevertheless, its efficiency needs to be improved when analyzing large-scale GEP data.

Results: In this paper we present meta-sample-based regularized robust coding classification (MRRCC), a novel and effective cancer classification technique that combines the idea of the meta-sample-based cluster method with the regularized robust coding (RRC) method. It assumes that the coding residuals and the coding coefficients are each independent and identically distributed. Similar to meta-sample-based SR classification (MSRC), MRRCC extracts a set of meta-samples from the training samples and then encodes a testing sample as a sparse linear combination of these meta-samples. The representation fidelity is measured by the l2-norm or l1-norm of the coding residual.

Conclusions: Extensive experiments on publicly available GEP datasets demonstrate that the proposed method is more efficient, while its prediction accuracy is equivalent to that of existing MSRC-based methods and better than that of other state-of-the-art dimension-reduction-based methods.
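
The general scheme described in the abstract (extract meta-samples per class, encode a testing sample over them, classify by coding residual) can be illustrated with a minimal sketch. This is not the authors' MRRCC implementation: it assumes meta-samples are taken as the leading left singular vectors of each class's training matrix, and it swaps in a plain ridge-regularized (l2) least-squares coding for the RRC step. The function names `extract_meta_samples` and `classify`, the parameter values, and the synthetic data are all hypothetical.

```python
import numpy as np

def extract_meta_samples(X, y, k):
    """Build a dictionary of meta-samples (assumption: per-class truncated SVD).

    X is a genes-by-samples matrix and y holds one class label per sample.
    Returns the dictionary D (genes by atoms) and the class label of each atom.
    """
    atoms, atom_labels = [], []
    for c in np.unique(y):
        Xc = X[:, y == c]                          # training samples of class c
        U, _, _ = np.linalg.svd(Xc, full_matrices=False)
        r = min(k, U.shape[1])                     # cannot exceed available rank
        atoms.append(U[:, :r])                     # leading singular vectors
        atom_labels.extend([c] * r)
    return np.hstack(atoms), np.array(atom_labels)

def classify(D, atom_labels, x, lam=0.1):
    """Encode x over D with ridge regularization, then pick the class whose
    atoms give the smallest l2 coding residual (a stand-in for RRC)."""
    G = D.T @ D + lam * np.eye(D.shape[1])
    alpha = np.linalg.solve(G, D.T @ x)            # coding coefficients
    classes = np.unique(atom_labels)
    residuals = [
        np.linalg.norm(x - D[:, atom_labels == c] @ alpha[atom_labels == c])
        for c in classes
    ]
    return classes[int(np.argmin(residuals))]

# Synthetic two-class data: each class lies near its own 2-dimensional subspace.
rng = np.random.default_rng(0)
B0, B1 = rng.normal(size=(50, 2)), rng.normal(size=(50, 2))
X = np.hstack([B0 @ rng.normal(size=(2, 10)), B1 @ rng.normal(size=(2, 10))])
X += 0.01 * rng.normal(size=X.shape)               # small measurement noise
y = np.array([0] * 10 + [1] * 10)

D, atom_labels = extract_meta_samples(X, y, k=2)
x_test = B1 @ rng.normal(size=2)                   # new sample drawn from class 1
pred = classify(D, atom_labels, x_test)
```

Because the dictionary holds only a few meta-samples per class rather than every training sample, the linear system solved for each testing sample stays small, which is the efficiency argument the abstract makes for meta-sample-based coding.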

Introduction

With the advance of DNA microarray and next-generation sequencing (NGS) technology [1], a large amount of gene expression profile (GEP) data has rapidly accumulated. Novel analysis methods are required to mine these data and gain insight into the mechanisms of tumor development. Other feature extraction methods, such as principal component analysis (PCA) [14], linear discriminant analysis (LDA) [15], locally linear discriminant embedding (LLDE) [16], and partial least squares (PLS) [17], are extensively applied to the dimensionality reduction of GEP data. These methods can generally achieve satisfactory classification performance with the minimum dimension reduction. Both feature selection and feature extraction methods have their own advantages and disadvantages. It is difficult to precisely interpret the biomedical meanings of derived features.
