Abstract

Background: High-throughput sequencing (HTS) has generated an enormous volume of sequencing data that is difficult to interpret. Adequate data mining strategies are needed to process this volume of data and derive clinically meaningful information from it. The positional distribution of somatic mutations along a gene is a strong indicator of its oncogenic involvement. We propose a framework for cancer gene prioritization, with the aim of identifying candidate cancer genes by leveraging a large number of cancer samples.

Methods: We retrospectively analyzed 13160 clinically derived tumor samples spanning five cancer types. These samples had previously undergone capture-based sequencing covering either the entire exome or a panel of major cancer-related genes. To predict genes with oncogenic properties, we applied a modified version of the 20/20 rule adapted from Bert's study. After the pan-cancer analysis, samples of each cancer type were analyzed separately to obtain cancer-type-specific candidate genes. Candidate genes uniquely predicted by our study were annotated by molecular function and thoroughly assessed through literature curation.

Results: Using the modified 20/20 rule, the pan-cancer analysis reported a total of 186 candidate genes. Of these, 29 were uniquely reported by our analysis (absent from Bert's list, the CGC list, and our internal curation list), including 1 oncogene and 28 tumor suppressors. Their oncogenic involvement was further confirmed using the random forest classifier 20/20+. The 29 novel candidate cancer genes were significantly enriched in the well-established cancer pathways described by Sanchez-Vega et al. (p < 0.01). For instance, Ras GTPase-activating protein 1 (RASA1) is an inhibitory regulator of the Ras-cyclic AMP pathway. Our data revealed that RASA1 is enriched with inactivating mutations across its coding region, a typical mutational pattern of a tumor suppressor.
However, literature mining revealed only limited understanding of its oncogenic role. In the cancer-type-specific analysis, 14 additional candidate cancer genes were uniquely reported by our analysis, including known driver genes such as HDAC4 in gastric cancer and KLF5 in intestinal cancer. Some of the reported candidate genes remain less studied, such as the oncogene WNT10A in liver cancer, probably because of its low mutation frequency in Western populations (<0.5% according to TCGA). By contrast, WNT10A was mutated in 5.29% (59/1115) of our liver cancer cohort, with highly recurrent mutations identified at R171C and G213S.

Citation Format: Cheng Yan, Hongwei Wu, Junhui Yang, Weihua Guo, Yufei Yang. Retrospective analysis of 13160 clinically sequenced tumor samples reveals potential cancer drivers [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 2142.
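
The 20/20 rule named in the Methods can be illustrated with a minimal Python sketch. It implements only the classic form of the rule as commonly described: a gene is called a tumor suppressor if more than 20% of its mutations are inactivating, and an oncogene if more than 20% are missense mutations at recurrent positions. The modifications used in this study are not specified in the abstract, so the set of inactivating mutation classes, the `min_recurrence` cutoff, the choice to test the tumor-suppressor criterion first, and the names `Mutation` and `classify_20_20` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: mutation classes treated as inactivating (truncating) in this sketch.
INACTIVATING = {"nonsense", "frameshift", "splice_site"}


@dataclass(frozen=True)
class Mutation:
    gene: str
    position: int   # amino-acid position of the change
    mut_class: str  # e.g. "missense", "nonsense", "frameshift", "splice_site"


def classify_20_20(mutations, min_recurrence=2, threshold=0.20):
    """Sketch of the classic 20/20 rule: label each gene 'tsg', 'oncogene', or None.

    min_recurrence (hypothetical): a missense position counts as recurrent when it
    is hit at least this many times across samples.
    """
    by_gene = {}
    for m in mutations:
        by_gene.setdefault(m.gene, []).append(m)

    labels = {}
    for gene, muts in by_gene.items():
        total = len(muts)
        # Tumor-suppressor signal: fraction of inactivating mutations.
        inactivating = sum(m.mut_class in INACTIVATING for m in muts)
        # Oncogene signal: missense mutations falling on recurrently hit positions.
        missense_positions = Counter(m.position for m in muts if m.mut_class == "missense")
        recurrent = sum(n for n in missense_positions.values() if n >= min_recurrence)

        # Assumption: the tumor-suppressor test takes precedence if both criteria pass.
        if inactivating / total > threshold:
            labels[gene] = "tsg"
        elif recurrent / total > threshold:
            labels[gene] = "oncogene"
        else:
            labels[gene] = None
    return labels


if __name__ == "__main__":
    # Toy, made-up mutation records for three hypothetical genes.
    toy = [
        Mutation("GENE_A", 12, "missense"),
        Mutation("GENE_A", 12, "missense"),
        Mutation("GENE_A", 12, "missense"),
        Mutation("GENE_A", 40, "missense"),
        Mutation("GENE_A", 99, "missense"),
        Mutation("GENE_B", 5, "nonsense"),
        Mutation("GENE_B", 80, "nonsense"),
        Mutation("GENE_B", 150, "frameshift"),
        Mutation("GENE_B", 30, "missense"),
        Mutation("GENE_B", 60, "missense"),
        Mutation("GENE_C", 1, "missense"),
        Mutation("GENE_C", 2, "missense"),
        Mutation("GENE_C", 3, "missense"),
        Mutation("GENE_C", 4, "missense"),
        Mutation("GENE_C", 5, "missense"),
    ]
    print(classify_20_20(toy))
    # → {'GENE_A': 'oncogene', 'GENE_B': 'tsg', 'GENE_C': None}
```

In this toy input, GENE_A has 3 of 5 mutations clustered at one missense position (a hotspot pattern), GENE_B has 3 of 5 inactivating mutations spread along the gene (the pattern the abstract describes for RASA1), and GENE_C has only scattered missense changes, so it receives no label.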
