Abstract

Identifying and distinguishing cancer driver genes among thousands of candidate mutations remains a major challenge. Accurate identification of driver genes and driver mutations is critical for advancing cancer research and for personalizing treatment through accurate stratification of patients. Due to inter-tumor genetic heterogeneity, many driver mutations within a gene occur at low frequencies, which makes it challenging to distinguish them from non-driver mutations. We have developed a novel method for identifying cancer driver genes. Our approach uses multiple complementary types of information as features: cellular phenotypes, cellular locations, functions, and whole-body physiological phenotypes. We demonstrate that our method can accurately identify known cancer driver genes and distinguish between their roles in different types of cancer. In addition to confirming known driver genes, we identify several novel candidate driver genes. We demonstrate the utility of our method by validating its predictions in nasopharyngeal cancer and colorectal cancer using whole-exome and whole-genome sequencing.

Highlights

  • The natural history of cancer is complex[1] and its genetics highly heterogeneous[2]

  • Human experts use the information in ontologies to interpret what an association with an ontology class implies, and a comprehensive interpretation of these associations relies on understanding and applying biological background knowledge

  • We use three types of information associated with genes: cellular phenotypes observed in large-scale microscopy studies and recorded using the Cellular Microscopy Phenotype Ontology (CMPO)[16]; gene functions and cellular locations recorded by UniProt[12] and encoded using the Gene Ontology (GO)[17]; and phenotypes of knockout mouse models provided by the Mouse Genome Informatics (MGI) database[10] and encoded using the Mammalian Phenotype Ontology (MP)[18]
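
As a rough illustration of how annotations from the three ontologies could be combined into per-gene features, the sketch below builds a binary indicator vector over all observed ontology classes. This is only one simple encoding and is not taken from the source, which does not state here how the features are constructed. The gene names, class identifiers, and function names are hypothetical placeholders.

```python
from typing import Dict, List, Set

# Hypothetical annotations: gene -> ontology source -> set of class IDs.
# The identifiers are placeholders, not real CMPO/GO/MP annotations.
Annotations = Dict[str, Dict[str, Set[str]]]

annotations: Annotations = {
    "GENE_A": {"CMPO": {"CMPO:x1"}, "GO": {"GO:x1", "GO:x2"}, "MP": {"MP:x1"}},
    "GENE_B": {"CMPO": set(), "GO": {"GO:x2"}, "MP": {"MP:x1", "MP:x2"}},
}


def build_vocabulary(annotations: Annotations) -> List[str]:
    """Collect every class ID seen in any source, in sorted order."""
    classes: Set[str] = set()
    for per_source in annotations.values():
        for ids in per_source.values():
            classes |= ids
    return sorted(classes)


def encode(gene: str, annotations: Annotations, vocabulary: List[str]) -> List[int]:
    """Return a 0/1 vector marking which classes annotate the gene."""
    present: Set[str] = set().union(*annotations[gene].values())
    return [1 if cls in present else 0 for cls in vocabulary]


vocabulary = build_vocabulary(annotations)
features = {gene: encode(gene, annotations, vocabulary) for gene in annotations}
```

A plain indicator vector ignores the ontology hierarchy (for example, a class implying its superclasses), so a real pipeline would likely need a richer representation.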


Summary

Introduction

The natural history of cancer is complex[1] and its genetics highly heterogeneous[2]. Many thousands of tumors have been sequenced in very large-scale studies of multiple cancer types, and several hundred genes and mutations have been identified as “drivers”, with varying support from experimental and genetic studies[6]. These methods do not work well for low- to intermediate-frequency and rare driver genes, which may bear up to 20% of driver mutations[7], and the identification of drivers in specific cancers and sub-types of tumor remains difficult, often because only small numbers of tumors are available. Information about gene functions is collected in databases such as UniProt[12] as well as several model organism databases.

