Abstract

For an increasing number of preclinical samples, both detailed molecular profiles and their responses to various drugs are becoming available. Efforts to understand and predict drug responses in a data-driven manner have led to a proliferation of machine learning (ML) methods, with the longer-term ambition of predicting clinical drug responses. Here, we provide a uniquely wide and deep systematic review of the rapidly evolving literature on monotherapy drug response prediction, with a systematic characterization and classification that covers more than 70 ML methods in 13 subclasses, their input and output data types, modes of evaluation, and code and software availability. ML experts are given a basic understanding of the biological problem and of how ML methods are configured to address it. Biologists and biomedical researchers are introduced to the basic principles of applicable ML methods and to their application to the problem of drug response prediction. We also provide systematic overviews of the data sources commonly used for training and evaluating these methods.

Highlights

  • The continued development of new high-throughput molecular profiling (‘omics’) technologies and their concomitant increased accessibility for biomedical applications make it increasingly attractive to study complex biological phenomena using data-driven and machine learning (ML) approaches [1]

  • When k-fold cross-validation would yield training sets that are too small, leave-one-out cross-validation (LOOCV) is typically used, in which each test set consists of a single sample

  • Machine learning is a subfield of artificial intelligence whose classification and prediction algorithms learn patterns underlying the data from examples
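The trade-off between k-fold cross-validation and LOOCV can be made concrete with a short sketch. It is an illustrative assumption, not code from the review: folds are contiguous and unshuffled for readability, and the "model" is a hypothetical baseline that predicts the mean response of its training set. The helper names (`kfold_splits`, `loocv_splits`, `cv_mse`) and the toy response values are likewise hypothetical.

```python
from statistics import mean


def kfold_splits(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Folds are contiguous and unshuffled to keep the example readable;
    in practice samples are usually shuffled first.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread samples evenly: the first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def loocv_splits(n_samples):
    """LOOCV is k-fold cross-validation with k equal to the number of samples."""
    return kfold_splits(n_samples, n_samples)


def cv_mse(y, splits):
    """Cross-validated mean squared error of a hypothetical baseline model
    that predicts the mean response of its training set."""
    squared_errors = []
    for train_idx, test_idx in splits:
        prediction = mean(y[i] for i in train_idx)
        squared_errors.extend((y[i] - prediction) ** 2 for i in test_idx)
    return mean(squared_errors)


if __name__ == "__main__":
    # Hypothetical normalized drug responses for six samples.
    y = [0.2, 0.5, 0.9, 0.4, 0.7, 0.1]
    for name, splits in [("3-fold", kfold_splits(len(y), 3)), ("LOOCV", loocv_splits(len(y)))]:
        splits = list(splits)
        train_size = len(splits[0][0])
        print(f"{name}: {len(splits)} splits, training-set size {train_size}, "
              f"MSE {cv_mse(y, splits):.3f}")
```

With six samples, 3-fold cross-validation trains on four samples per split, whereas LOOCV trains on five; the price is one model fit per sample instead of one per fold.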

Summary

Introduction

The continued development of new high-throughput molecular profiling (‘omics’) technologies and their concomitant increased accessibility for biomedical applications make it increasingly attractive to study complex biological phenomena using data-driven and machine learning (ML) approaches [1].

The NCI-60 panel of measurements includes gene expression (GE), copy number variation (CNV), single nucleotide polymorphism (SNP), DNA methylation (DM) and proteomics profiles of each cell line. Analysis of these data led to the discovery of a number of gene mutations that are highly predictive of clinical responses [27] and provided new insights into drug activity modulators and drug-target interactions.

Logistic regression is a common classification approach that can be derived, in a Bayesian framework, from a simple probabilistic model that expresses drug response as a function of molecular profiles [30, 44, 55, 56]. Classifiers derived from these models have linear decision boundaries, i.e. lines or hyperplanes that separate the classes in the input space [42, 57].

Artificial neural networks (ANNs) are powerful ML models that can approximate arbitrary input–output relationships, given a
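To make the linear decision boundary of logistic regression concrete, the sketch below fits an L2-regularized logistic regression by batch gradient descent in plain Python. It is an illustrative assumption, not a method from the review: the two features stand in for hypothetical molecular measurements (for example, expression of two genes), the labels mark hypothetical sensitive (1) versus resistant (0) samples, and the L2 penalty gives the maximum a posteriori estimate under a zero-mean Gaussian prior on the weights, which is one common way to read logistic regression in a Bayesian framework. All function names and hyperparameters (`lr`, `l2`, `epochs`) are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_score(w, b, x):
    """Return w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def fit_logistic(X, y, lr=0.1, l2=0.01, epochs=2000):
    """Fit weights w and bias b by batch gradient descent.

    Minimizes mean negative log-likelihood + (l2 / 2) * ||w||^2, which is the
    MAP estimate under a zero-mean Gaussian prior on w (b is not penalized).
    X: list of feature vectors; y: list of 0/1 labels.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one sample is (p - y) * x.
            err = sigmoid(linear_score(w, b, xi)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that sample x belongs to class 1."""
    return sigmoid(linear_score(w, b, x))


def predict(w, b, x):
    # p >= 0.5 exactly when w . x + b >= 0, so the decision boundary is the
    # hyperplane w . x + b = 0: the classifier is linear in the features.
    return int(linear_score(w, b, x) >= 0)


if __name__ == "__main__":
    # Hypothetical toy data: two molecular features per sample,
    # label 1 = sensitive, 0 = resistant.
    X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.9, 0.8], [0.8, 1.0], [1.0, 0.7]]
    y = [0, 0, 0, 1, 1, 1]
    w, b = fit_logistic(X, y)
    print("weights:", [round(v, 3) for v in w], "bias:", round(b, 3))
    print("predictions:", [predict(w, b, x) for x in X])
```

Because `predict` thresholds w · x + b at zero, any model of this form partitions the input space with a single hyperplane; curved boundaries require nonlinear feature transformations or a nonlinear model such as an ANN.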

Evaluation
Evaluation of DRP models
Discussion and outlook