Abstract

Background: The classification of cancer subtypes is of great importance to cancer diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially deep learning based approaches. Recently, the deep forest model has been proposed as an alternative to deep neural networks; it learns hyper-representations by cascading ensembles of decision trees. The deep forest model has been shown to perform competitively with, or in some cases better than, deep neural networks. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small-sample, high-dimensional biological data.

Results: In this paper, we propose a deep learning model, called BCDForest, to address cancer subtype classification on small-scale biological datasets. It can be viewed as a modification of the standard deep forest model. BCDForest differs from the standard deep forest model in two main contributions. First, a method named multi-class-grained scanning is proposed, which trains multiple binary classifiers to encourage ensemble diversity; the fitting quality of each classifier is also taken into account in representation learning. Second, we propose a boosting strategy that emphasizes the more important features in the cascade forests, so that the benefits of discriminative features propagate across cascade layers and improve classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms state-of-the-art methods in cancer subtype classification.

Conclusions: The multi-class-grained scanning and boosting strategy in our model provide an effective way to ease the overfitting challenge and improve the robustness of the deep forest model on small-scale data. Our model provides a useful approach to the classification of cancer subtypes by using deep learning on high-dimensional, small-scale biological data.
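The two contributions named in the abstract can be illustrated as a data-flow sketch. The Python below is an illustration under stated assumptions, not the authors' implementation: the function names, the `top_k` selection rule, and the way selected features are appended to the class vector are all hypothetical, and forest training is left out entirely. It only shows how a multi-class problem might be split into binary tasks over sliding windows, and how high-importance features might be carried forward to the next cascade layer.

```python
from typing import List, Sequence


def sliding_windows(x: Sequence[float], width: int, stride: int = 1) -> List[List[float]]:
    """Cut one feature vector into overlapping windows (grained scanning)."""
    if width <= 0 or stride <= 0:
        raise ValueError("width and stride must be positive")
    return [list(x[i:i + width]) for i in range(0, len(x) - width + 1, stride)]


def one_vs_rest_labels(y: Sequence[int], positive_class: int) -> List[int]:
    """Turn multi-class labels into a binary task for a single class.

    Assumption: one binary classifier per class is trained on these labels.
    """
    return [1 if label == positive_class else 0 for label in y]


def top_k_features(x: Sequence[float], importances: Sequence[float], k: int) -> List[float]:
    """Keep the k features with the highest importance, in original order.

    Assumption: `importances` comes from an already trained forest.
    """
    ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    keep = sorted(ranked[:k])
    return [x[j] for j in keep]


def next_layer_input(class_vector: Sequence[float], x: Sequence[float],
                     importances: Sequence[float], k: int) -> List[float]:
    """Build the input for the next cascade layer.

    Hypothetical rule: class probabilities followed by the top-k features.
    """
    return list(class_vector) + top_k_features(x, importances, k)


if __name__ == "__main__":
    sample = [5.0, 6.0, 7.0, 8.0]
    print(sliding_windows(sample, width=2))
    print(one_vs_rest_labels([0, 1, 2, 1], positive_class=1))
    print(next_layer_input([0.2, 0.8], sample, [0.1, 0.7, 0.2, 0.0], k=2))
```

In a full pipeline, each binary label set would be used to train a forest on the windows, and the importance scores would be read from that forest. How the fitting quality of each classifier enters the learned representation is not specified in the abstract, so it is not modelled here.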

Highlights

  • The classification of cancer subtypes is of great importance to cancer disease diagnosis and therapy

  • Although many modified models have been proposed to ease these challenges in the past few years [5, 22], the options available for small-scale biological data are still limited, and more accurate and robust methods are needed for cancer subtype classification

  • BCDForest consistently outperformed the standard gcForest on most of the cancer datasets. This illustrates that our boosting strategies effectively improve the classification ability of the standard deep forest model on small-scale cancer datasets, and that BCDForest provides a robust model for the classification of cancer subtypes

Summary

Introduction

The classification of cancer subtypes is of great importance to cancer diagnosis and therapy. The standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small-sample, high-dimensional biological data. In the past few years, various types of large-scale genomic data have been used for cancer prognosis, integrating gene function studies [6, 7, 8] and subtype outcome prediction [1, 4, 9, 10, 11], and numerous cancer subtype classification methods have been proposed [5, 12, 13, 14]. Although many modified models have been proposed to ease these challenges in the past few years [5, 22], the options available for small-scale biological data are still limited, and more accurate and robust methods are needed for cancer subtype classification.
