A dropout-regularized classifier development approach optimized for precision medicine test discovery from omics data

Joanna Roder, Lelia Net, Maxim Tsypin, Benjamin Linstid, Carlos Oliveira, Heinrich Roder

doi:10.1186/s12859-019-2922-2


Open Access

https://doi.org/10.1186/s12859-019-2922-2


Journal: BMC Bioinformatics	Publication Date: Jun 13, 2019
Citations: 14	License type: open-access

Affiliation: Helix Biomedix (United States)

Abstract

Background: Modern genomic and proteomic profiling methods produce large amounts of data from tissue and blood-based samples that are of potential utility for improving patient care. However, the design of precision medicine tests for unmet clinical needs from this information in the small cohorts available for test discovery remains a challenging task. Obtaining reliable performance assessments at the earliest stages of test development can also be problematic. We describe a novel approach to classifier development designed to create clinically useful tests together with reliable estimates of their performance. The method incorporates elements of traditional and modern machine learning to facilitate the use of cohorts where the number of samples is less than the number of measured patient attributes. It is based on a hierarchy of classification and information abstraction and combines boosting, bagging, and strong dropout regularization.

Results: We apply this dropout-regularized combination approach to two clinical problems in oncology using mRNA expression and associated clinical data and compare performance with other methods of classifier generation, including Random Forest. Performance of the new method is similar to or better than the Random Forest in the two classification tasks used for comparison. The dropout-regularized combination method also generates an effective classifier in a classification task with a known confounding variable. Most importantly, it provides a reliable estimate of test performance from a relatively small development set of samples.

Conclusions: The flexible dropout-regularized combination approach is able to produce tests tailored to particular clinical questions and mitigate known confounding effects. It allows the design of molecular diagnostic tests addressing particular clinical questions together with reliable assessment of whether test performance is likely to be fit-for-purpose in independent validation at the earliest stages of development.
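As a rough illustration of how bagging, combination of many weak classifiers, and dropout can work together when samples are fewer than features, the following is a minimal sketch and not the authors' implementation. Everything specific in it is an assumption made for illustration: the one-feature nearest-centroid "atomic" classifiers, the plain gradient-descent logistic regression used to combine them, the function names (`drc_fit`, `drc_predict`, `make_toy_data`), and all parameter values. Each bag trains on a random two-thirds of the samples, and a sample's out-of-bag score averages only the bags that never saw it in training, which is one common way to obtain a performance estimate from a small development set.

```python
import math
import random

def sigmoid(s):
    s = max(-30.0, min(30.0, s))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-s))

def fit_atomic(X, y, j):
    """Hypothetical atomic classifier: one-feature nearest-centroid rule."""
    ones = [x[j] for x, t in zip(X, y) if t == 1]
    zeros = [x[j] for x, t in zip(X, y) if t == 0]
    m1, m0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
    return j, (m0 + m1) / 2.0, (1.0 if m1 >= m0 else -1.0)

def atomic_output(clf, x):
    j, mid, sign = clf
    return 1.0 if sign * (x[j] - mid) > 0 else -1.0

def fit_logistic(Z, y, steps=100, lr=0.5):
    """Plain batch gradient descent on the logistic loss."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(y)
    for _ in range(steps):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(b + sum(wk * zk for wk, zk in zip(w, z))) - t
            gb += err
            for k, zk in enumerate(z):
                gw[k] += err * zk
        w = [wk - lr * g / n for wk, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def drc_fit(X, y, n_bags=25, n_dropout=20, keep=3, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    masters = []
    for _ in range(n_bags):
        idx = list(range(n))
        rng.shuffle(idx)
        train = idx[: (2 * n) // 3]          # bagging: random training split
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        if len(set(yt)) < 2:
            continue                          # need both classes to train
        atoms = [fit_atomic(Xt, yt, j) for j in range(d)]
        Zt = [[atomic_output(a, x) for a in atoms] for x in Xt]
        w_avg, b_avg = [0.0] * d, 0.0
        for _ in range(n_dropout):
            kept = rng.sample(range(d), keep)  # dropout: keep only a few atoms
            w, b = fit_logistic([[z[k] for k in kept] for z in Zt], yt)
            for k, wk in zip(kept, w):
                w_avg[k] += wk / n_dropout     # average over dropout iterations
            b_avg += b / n_dropout
        masters.append((set(train), atoms, w_avg, b_avg))
    return masters

def drc_predict(masters, x, exclude=None):
    """Mean probability over bags; skip bags that trained on sample `exclude`."""
    probs = []
    for train, atoms, w, b in masters:
        if exclude is not None and exclude in train:
            continue
        s = b + sum(wk * atomic_output(a, x) for wk, a in zip(w, atoms))
        probs.append(sigmoid(s))
    return sum(probs) / len(probs) if probs else None

def make_toy_data(n=40, d=50, informative=8, shift=3.0, seed=1):
    """Synthetic data with fewer samples than features (illustration only)."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0.0, 1.0) + (shift if t == 1 and j < informative else 0.0)
          for j in range(d)] for t in y]
    return X, y
```

A typical use would call `drc_fit(X, y)` once and then `drc_predict(masters, X[i], exclude=i)` for each development sample to collect out-of-bag scores, while `drc_predict(masters, x_new)` averages all bags for a new sample.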

Highlights

Modern genomic and proteomic profiling methods produce large amounts of data from tissue and blood-based samples that are of potential utility for improving patient care.
Ten-year survival for prostate cancer: testing the ability of the classifier development method to work well with small datasets. The classification task was to differentiate patients with prostate cancer still alive after 10 years of follow-up from those dying within the 10-year period. Messenger RNA (mRNA) expression data for 343 genes were available for a development cohort (GSE16560) and a validation cohort (GSE10645).
Parameters defining the dropout-regularized combination (DRC) approach were held fixed throughout this investigation, with no tuning to improve performance.

Summary

Introduction

Modern genomic and proteomic profiling methods produce large amounts of data from tissue and blood-based samples that are of potential utility for improving patient care. The method incorporates elements of traditional and modern machine learning to facilitate the use of cohorts where the number of samples is less than the number of measured patient attributes. It is based on a hierarchy of classification and information abstraction and combines boosting, bagging, and strong dropout regularization. Prospective studies designed to collect specimens from large cohorts of subjects in which the test is intended to be used are expensive and hard to justify when the probability of successful test generation may be low. It is often necessary, at least in a feasibility or pilot stage, to make use of retrospectively collected sample sets.

