Stratification of patients diagnosed with cancer has become a major goal in personalized oncology. One important aspect is the accurate prediction of response to various drugs. The molecular characteristics of cancer cells are expected to contain enough information to retrieve specific signatures, allowing accurate predictions based solely on these multi-omic data. Ideally, these predictions should be explainable to clinicians so that they can be integrated into patient care. We propose a machine-learning framework based on ensemble learning that integrates multi-omic data and predicts sensitivity to an array of commonly used and experimental compounds, including cytotoxic compounds and targeted kinase inhibitors. We first trained a separate set of classifiers on each omic layer of our dataset to produce omic-specific signatures, then trained a random forest classifier on these signatures to predict drug responsiveness. We used the Cancer Cell Line Encyclopedia dataset, which comprises multi-omic and drug sensitivity measurements for hundreds of cell lines, to build the predictive models, and validated the results using nested cross-validation. Our results show good performance for several compounds (area under the receiver operating characteristic curve, AUC > 0.79) across the most frequent cancer types. Furthermore, the simplicity of our approach allows us to examine which omic layers contribute most to the models and to identify new putative markers of drug responsiveness. We propose several models based on small subsets of transcriptional markers that could become useful tools in personalized oncology, paving the way for clinicians to use the molecular characteristics of tumors to predict sensitivity to therapeutic compounds.
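The evaluation protocol mentioned above (nested cross-validation scored by AUC) can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the helper names `roc_auc`, `kfold_indices` and `nested_cv_splits`, the fold counts and the random seeds are hypothetical choices for illustration, and the model-fitting steps (omic-specific classifiers and the random forest) are deliberately left out. AUC is computed here as the probability that a randomly chosen sensitive sample receives a higher score than a randomly chosen resistant one, so an AUC above 0.79 means that pairs are ranked correctly roughly four times out of five.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win, matching the Mann-Whitney U formulation.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


Split = Tuple[List[int], List[int]]


def nested_cv_splits(
    n: int, k_outer: int = 5, k_inner: int = 3, seed: int = 0
) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    Inner (fit, validation) splits are drawn only from the outer training
    indices, so the outer test fold never influences model selection.
    """
    for i, test in enumerate(kfold_indices(n, k_outer, seed)):
        test_set = set(test)
        train = [j for j in range(n) if j not in test_set]
        inner: List[Split] = []
        for fold in kfold_indices(len(train), k_inner, seed + i + 1):
            val = [train[j] for j in fold]  # map local positions back to sample ids
            val_set = set(val)
            fit = [t for t in train if t not in val_set]
            inner.append((fit, val))
        yield train, test, inner


if __name__ == "__main__":
    # Two resistant (0) and two sensitive (1) samples: 3 of 4 pairs ranked correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    for train, test, inner in nested_cv_splits(20):
        print(len(train), len(test), [len(v) for _, v in inner])
```

In a full pipeline, hyperparameters for the omic-specific classifiers and the random forest would be tuned on the inner splits only, and the reported AUC would be averaged over the outer test folds, which keeps the performance estimate free of selection bias.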