The application of artificial intelligence to drug sensitivity prediction

Lifan Chen, Feisheng Zhong, Yingjia Chen, Xiaozhe Wan, Kaixian Chen, Chen Cui, Xutong Li, Xiaolong Wu, Mingyue Zheng, Hualiang Jiang

doi:10.1360/tb-2020-0557

Abstract

Developing computational methods that predict effective therapeutic strategies from patients' genomic information is a central challenge of precision medicine. Since the beginning of the 21st century, next-generation sequencing (NGS) has opened up new possibilities for personalized medicine, and organizations and agencies around the world have made extensive molecular characterizations of hundreds of cancer cell lines publicly available. For example, the National Cancer Institute 60 Human Cancer Cell Line Screen (NCI-60), the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC) provide large-scale omics data, such as genomic, transcriptomic and epigenomic profiles of cancer cell lines, and The Cancer Genome Atlas (TCGA) has molecularly characterized more than 20,000 primary patient cancers. Combined with drug response data for cancer cell lines, these multiomics data can be used to analyse the mechanisms of action of anticancer drugs, and the resulting insights can be incorporated into precision medicine strategies. Over the past several decades, artificial intelligence (AI) technologies based on big data have revolutionized bioinformatics. AI has built a bridge between genomics and drug sensitivity by driving the development of models that predict the drug response of cancer cell lines. The 2012 NCI-DREAM drug sensitivity prediction challenge was particularly influential: the innovative machine learning approaches that emerged from it laid the groundwork for subsequent studies. However, classic machine learning models still have limited predictive power because they cannot systematically integrate high-dimensional multiomics data. Network-based approaches, including link prediction and network representation, have therefore become mainstream methods for drug response prediction.
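As a rough illustration of the link-prediction idea, the sketch below scores missing cell line–drug links using a cell-line similarity network: each unmeasured response is estimated as the similarity-weighted mean of the responses of the k most similar cell lines screened against the same drug. The neighborhood scheme, the use of cosine similarity, the function names and the toy data are all assumptions chosen for brevity. This is not a specific published method; real pipelines typically fuse several similarity networks and use more elaborate propagation or embedding.

```python
import numpy as np


def cosine_similarity(X):
    """Pairwise cosine similarity between the rows (cell lines) of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)
    return Xn @ Xn.T


def predict_missing_responses(omics, response, k=2):
    """Fill NaN entries of a (cell line x drug) response matrix.

    Hypothetical helper: each missing response is the similarity-weighted mean
    of the responses of the k most similar cell lines screened with that drug.
    """
    sim = cosine_similarity(omics)  # edge weights of the cell-line similarity network
    filled = response.copy()
    n_cells, n_drugs = response.shape
    for i in range(n_cells):
        for j in range(n_drugs):
            if not np.isnan(response[i, j]):
                continue  # observed link, keep as is
            observed = np.flatnonzero(~np.isnan(response[:, j]))
            if observed.size == 0:
                continue  # drug never screened, nothing to propagate
            top = observed[np.argsort(sim[i, observed])[::-1][:k]]
            w = np.clip(sim[i, top], 0.0, None)  # ignore negative similarities
            if w.sum() > 0:
                filled[i, j] = np.dot(w, response[top, j]) / w.sum()
            else:
                filled[i, j] = response[top, j].mean()
    return filled


# Toy data: 4 cell lines x 3 omics features, 2 drugs (e.g. normalised AUC).
omics = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.9, 0.1]])
response = np.array([[0.2, np.nan],
                     [np.nan, 0.8],
                     [0.9, 0.1],
                     [0.85, 0.15]])
filled = predict_missing_responses(omics, response)
print(np.round(filled, 2))
```

Cell line 1 has almost the same profile as cell line 0, so its missing response to the first drug is pulled toward 0.2 (about 0.27 here) rather than toward the 0.9 of the dissimilar lines.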
On the one hand, network-based approaches avoid the “small n, large p” problem (far fewer samples than features), because the multiomics features are either mapped onto a gene/protein network or embedded in similarity networks between cell lines. On the other hand, introducing gene regulatory networks (GRNs) and protein-protein interactions (PPIs) into the predictive model provides a functional context for integrating genomic data and thereby improves drug response prediction. Beyond network-based approaches, multimodal deep learning models can systematically integrate multiomics data by treating each omics type as a separate modality. Deep neural networks generally use one of three feature fusion strategies: input-level fusion (early fusion), intermediate fusion and decision-level fusion (late fusion). Intermediate fusion predominates in drug response prediction studies: features are learned separately for each type of omics data and then integrated into a single unified representation that serves as the input to a classifier or regressor. Moreover, drug structural features can be incorporated into the model to further improve performance. In this review, we summarize the characteristics of publicly accessible genomic databases and discuss trends in the application of artificial intelligence, including machine learning, network-based methods and multimodal deep neural networks, to drug sensitivity prediction for cancer cell lines.
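A minimal sketch of the intermediate fusion scheme described above, assuming NumPy: each modality (here hypothetical expression, mutation and drug-fingerprint inputs) gets its own encoder, the encoded vectors are concatenated into one representation, and a shared linear head regresses a response value. The weights are random and untrained, so the predictions themselves are meaningless; the point is the data flow. Layer sizes, input names and the choice of a single ReLU layer per branch are illustrative assumptions, not taken from any particular model discussed in the review.

```python
import numpy as np


def dense(rng, in_dim, out_dim):
    """He-initialised weights and zero bias for one fully connected layer."""
    W = rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return W, b


def relu(x):
    return np.maximum(x, 0.0)


class IntermediateFusionRegressor:
    """Forward pass of an untrained, hypothetical intermediate-fusion network."""

    def __init__(self, input_dims, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # One encoder per modality, e.g. expression, mutation, drug fingerprint.
        self.encoders = {name: dense(rng, d, hidden_dim) for name, d in input_dims.items()}
        # Shared regression head on the concatenated (fused) representation.
        self.head = dense(rng, hidden_dim * len(input_dims), 1)

    def forward(self, inputs):
        # 1) Encode each modality separately.
        encoded = [relu(inputs[name] @ W + b) for name, (W, b) in self.encoders.items()]
        # 2) Fuse the per-modality features into one unified representation.
        fused = np.concatenate(encoded, axis=1)
        # 3) Predict one response value per (cell line, drug) pair.
        W, b = self.head
        return (fused @ W + b).ravel()


# Toy batch of 8 cell line-drug pairs with hypothetical feature sizes.
rng = np.random.default_rng(42)
inputs = {
    "expression": rng.normal(size=(8, 1000)),
    "mutation": rng.integers(0, 2, size=(8, 300)).astype(float),
    "drug_fingerprint": rng.integers(0, 2, size=(8, 512)).astype(float),
}
model = IntermediateFusionRegressor({name: x.shape[1] for name, x in inputs.items()})
pred = model.forward(inputs)
print(pred.shape)  # (8,)
```

By contrast, early fusion would concatenate the raw inputs before a single encoder, and late fusion would train one predictor per modality and combine their outputs.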
