Abstract
Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. Critically, the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Here, we compare standard, data-driven feature selection approaches to feature selection driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, evaluating 2484 unique models. For 23 drugs, better predictive performance is achieved when the features are selected according to prior knowledge of drug targets and pathways. The best correlation between observed and predicted response on the test set is achieved for Linifanib (r = 0.75). Extending the drug-dependent features with gene expression signatures yields the most predictive models for 60 drugs, with Dabrafenib as the best-performing example. For many compounds, even a very small subset of drug-related features is highly predictive of drug sensitivity. Small feature sets selected using prior knowledge are more predictive for drugs targeting specific genes and pathways, while models with wider feature sets perform better for drugs affecting general cellular mechanisms. Appropriate feature selection strategies facilitate the development of interpretable models that can inform therapy design.
Highlights
Drug sensitivity prediction constitutes one of the main challenges in personalized medicine
We employed feature selection approaches from two categories: biologically driven selection and automatic, data-driven selection
We considered the union of the direct target genes and the drug’s target pathway genes
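As a minimal sketch of the knowledge-driven selection described in the last highlight, the feature set for a drug can be built as the union of its direct target genes and its target pathway genes, restricted to the genes actually measured. The dictionaries, gene identifiers, and the function name `knowledge_driven_features` are hypothetical placeholders for illustration and are not taken from the study.

```python
# Hypothetical prior-knowledge annotations; the drug and gene identifiers
# are placeholders, not data from the study.
DRUG_TARGETS = {"drug_x": {"GENE_A"}}
PATHWAY_GENES = {"drug_x": {"GENE_A", "GENE_B", "GENE_C"}}


def knowledge_driven_features(drug, measured_genes):
    """Return the sorted union of target and pathway genes that were measured."""
    selected = DRUG_TARGETS.get(drug, set()) | PATHWAY_GENES.get(drug, set())
    # Keep only genes that are present in the expression matrix.
    return sorted(selected & set(measured_genes))


measured = ["GENE_A", "GENE_C", "GENE_D", "GENE_E"]
features = knowledge_driven_features("drug_x", measured)
print(features)  # ['GENE_A', 'GENE_C']
```

A drug without annotations yields an empty feature list here, so a real pipeline would need a fallback (for example, a data-driven selection) for such cases.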
Summary
Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. We compare standard, data-driven feature selection approaches to feature selection driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, evaluating 2484 unique models. A multi-task learning approach based on a Bayesian model for collaborative filtering was proposed[23], which allows general interactions between features of the drugs and features of the cell lines to be identified. It gives insights of the form "activation of pathway Y will confer sensitivity to any drug targeting protein X". Stability selection was proposed to make feature selection more robust when regularized regression is applied[27], but it still does not guarantee that the most biologically relevant predictive features are chosen.
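The idea behind stability selection can be illustrated with a simplified sketch: fit a lasso regression on many random half-samples of the data and keep the features that are selected in most of them. This is not the implementation used in the cited work. The pure-Python coordinate-descent lasso, the synthetic data, the penalty `alpha=0.5`, and the 0.8 frequency threshold are all assumptions chosen for illustration.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value towards zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, sweeps=30):
    """Minimise (1/2n)||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef, with coef starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n
            rho += col_sq[j] * coef[j]
            new = soft_threshold(rho, alpha) / col_sq[j]
            delta = new - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                coef[j] = new
    return coef


def stability_selection(X, y, alpha, n_subsamples=30, seed=0):
    """Return, per feature, the fraction of half-samples in which lasso selects it."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        coef = lasso_coordinate_descent(
            [X[i] for i in idx], [y[i] for i in idx], alpha
        )
        for j in range(p):
            if coef[j] != 0.0:
                counts[j] += 1
    return [c / n_subsamples for c in counts]


# Synthetic example: only features 0 and 1 influence the response.
data_rng = random.Random(1)
n_samples, n_features = 120, 8
X = [[data_rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [3.0 * row[0] - 2.0 * row[1] + data_rng.gauss(0, 0.5) for row in X]

freq = stability_selection(X, y, alpha=0.5)
stable = [j for j, f in enumerate(freq) if f >= 0.8]
print(stable)
```

The selection frequencies only measure how consistently a feature is chosen across subsamples. As the summary notes, a stable feature is not necessarily the most biologically relevant one.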