Abstract

Molecular subtyping of cancer is a critical step towards more individualized therapy and provides important biological insights into cancer heterogeneity. Although gene expression signature-based classification has been widely demonstrated to be an effective approach over the last decade, its widespread implementation has long been limited by platform differences, batch effects, and the difficulty of classifying individual patient samples. Here, we describe a novel supervised cancer classification framework, deep cancer subtype classification (DeepCC), based on deep learning of functional spectra that quantify the activities of biological pathways. In two case studies on colorectal and breast cancer classification, DeepCC classifiers and DeepCC single sample predictors both achieved overall higher sensitivity, specificity, and accuracy than other widely used classification methods such as random forests (RF), support vector machines (SVM), gradient boosting machines (GBM), and multinomial logistic regression. Simulation analysis based on random subsampling of genes demonstrated the robustness of DeepCC to missing data. Moreover, deep features learned by DeepCC captured biological characteristics associated with distinct molecular subtypes, enabling more compact within-subtype distribution and clearer between-subtype separation of patient samples, and thereby greatly reducing the number of previously unclassifiable samples. In summary, DeepCC provides a novel cancer classification framework that is platform independent, robust to missing data, and usable for single sample prediction, facilitating clinical implementation of cancer molecular subtyping.
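The abstract compares classifiers by sensitivity, specificity, and accuracy across several subtypes. As a minimal sketch of how such metrics are commonly computed in a multi-class setting (one-vs-rest per subtype), and not necessarily the evaluation code used in the study, the example below uses a hypothetical helper `one_vs_rest_metrics` and toy labels invented purely for illustration.

```python
def one_vs_rest_metrics(y_true, y_pred):
    """Return per-subtype sensitivity/specificity and overall accuracy.

    Hypothetical helper for illustration; each subtype is scored
    one-vs-rest, i.e. "this subtype" against "all other subtypes".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        tn = n - tp - fn - fp
        per_class[label] = {
            # Sensitivity: fraction of true members of the subtype recovered.
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            # Specificity: fraction of non-members correctly kept out.
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        }
    accuracy = sum(t == p for t, p in pairs) / n
    return per_class, accuracy


# Toy labels (invented, not data from the study).
y_true = ["CMS1", "CMS1", "CMS2", "CMS2", "CMS3", "CMS4"]
y_pred = ["CMS1", "CMS2", "CMS2", "CMS2", "CMS3", "CMS4"]

per_class, accuracy = one_vs_rest_metrics(y_true, y_pred)
for label, scores in per_class.items():
    print(label, scores)
print("accuracy:", round(accuracy, 3))
```

Reporting sensitivity and specificity per subtype alongside overall accuracy matters when subtypes are imbalanced, since a classifier can reach high accuracy while missing most samples of a rare subtype.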

Highlights

  • Cancer subtyping is important for selection of patients that benefit most from specified therapies and design of novel targeted agents

  • To train a deep cancer subtype classification (DeepCC) classifier, we highly recommend employing a widely adopted molecular subtyping system, so that the deep features learned by the artificial neural network (ANN) can capture the most relevant biological properties associated with each molecular subtype

  • We used the consensus molecular subtyping (CMS) system[30] for colorectal cancer (CRC) and the intrinsic subtyping system for breast cancer, both of which are widely adopted in their respective fields

Introduction

Cancer subtyping is important for selection of patients that benefit most from specified therapies and design of novel targeted agents. Cancer classification is largely based on histopathological and clinical characteristics, which makes it difficult to implement uniformly, as the individual expertise of the clinicians is often a [...] (CRC), genetic features, such as KRAS mutation and microsatellite instability (MSI) status[3], have proven predictive power regarding anti-EGFR and 5-FU efficacy, respectively. Classifications based on these molecular characteristics still leave much additional cancer heterogeneity unaccounted for[4]. In recent years, whole transcriptome-based cancer subtyping has been widely demonstrated to be an efficient approach for dissecting cancer heterogeneity[5]. A widely implemented strategy involves consensus clustering to determine an optimal number of cancer subgroups, followed by classification with feature selection, i.e., selection of a list of signature genes[6].

