Abstract

Background: High-dimensional genomic data are indispensable in drug research and development. Genomic data-based biomarker discovery provides comprehensive insights into drug mechanism of action and supports efficacy prediction. However, it requires selecting suitable signature genes from several thousand or more genes whose expression patterns fluctuate widely and are often tightly correlated. This study compares the performance of three recent signature gene selection algorithms in several large in vitro cell assays.

Methods: Drug response datasets for irinotecan, cetuximab, pelitinib, gefitinib, erlotinib and KRAS (G12C) inhibitor-12 were obtained from the Genomics of Drug Sensitivity in Cancer (GDSC). Cell line gene expression data were gathered from our internal dataset of 20,432 genes in 1,068 cell lines across 23 tissues. The three algorithms evaluated were Stable Iterative Variable Selection (SIVS), Precision Lasso (PL) and Whitening Lasso (WL). To evaluate the results, we calculated the variance in efficacy between cancer types (ANOVA p-value) and the accuracy of drug efficacy prediction.

Results: The accuracy of PL predictions was more positively correlated with the number of genes than that of SIVS and WL. The number of signature genes was restricted to between 5 and 50, and for all drugs the most precise results were obtained with composite biomarkers of 30 to 50 genes. For irinotecan, PL, WL and SIVS selected 30, 17 and 6 genes, respectively, as composite biomarkers, with training-set accuracies of 85.7%, 78.2% and 87.3%. In the validation set, the SIVS panel had the highest prediction accuracy, at 83.1%. For erlotinib, SIVS, WL and PL selected 22, 23 and 32 genes as the best composite biomarkers, with accuracies of 73%, 83% and 76.8%, respectively. Overall, the composite biomarkers selected by SIVS were the superior choice for four of the six drugs. In most cases, the three algorithms took less time than 100 iterations of Boruta under the same conditions.

Conclusions: Compared with PL and WL, SIVS appeared to achieve higher prediction accuracy with a relatively small number of genes in this study. PL tended to select many more signature genes than SIVS and WL to yield comparable performance. The overlap in signature genes among the three algorithms was generally low, including for genes belonging to the same molecular pathways. The cost and practical constraints of in vivo experiments make it difficult to cover a large number of cancer types together with drug response and gene expression data. Therefore, selecting the best predictive composite biomarker by in vitro screening before in vivo validation can significantly reduce costs and accelerate early drug development.

No conflict of interest.
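The two comparisons reported above, prediction accuracy per panel and signature gene overlap between algorithms, can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: accuracy is taken to be the fraction of cell lines whose predicted response class matches the observed class, and overlap is measured with the Jaccard index. The function names, gene symbols and response labels are hypothetical placeholders, not data or code from the study.

```python
from itertools import combinations


def accuracy(predicted, observed):
    """Fraction of cell lines whose predicted class equals the observed class.

    Assumption: the study's exact accuracy definition is not given in the
    abstract; this is the plain classification accuracy.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("at least one cell line is required")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(observed)


def jaccard(panel_a, panel_b):
    """Jaccard index of two gene panels: |A and B| / |A or B|."""
    a, b = set(panel_a), set(panel_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(panels):
    """Jaccard index for every pair of methods, keyed by sorted method names."""
    return {
        (m1, m2): jaccard(panels[m1], panels[m2])
        for m1, m2 in combinations(sorted(panels), 2)
    }


# Hypothetical panels: placeholder gene symbols, not results from the study.
panels = {
    "SIVS": ["GENE_A", "GENE_B", "GENE_C"],
    "WL": ["GENE_B", "GENE_D", "GENE_E", "GENE_F"],
    "PL": ["GENE_C", "GENE_G", "GENE_H", "GENE_I", "GENE_J"],
}
overlap = pairwise_overlap(panels)

# Hypothetical response calls for four cell lines.
predicted = ["sensitive", "resistant", "sensitive", "resistant"]
observed = ["sensitive", "resistant", "resistant", "resistant"]
panel_accuracy = accuracy(predicted, observed)

for pair, score in overlap.items():
    print(pair, round(score, 3))
print("accuracy:", panel_accuracy)
```

The Jaccard index is used here only because it normalizes for panel size, which matters when one method (PL in this abstract) tends to select many more genes than the others; a raw count of shared genes would be an equally valid choice.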
