Abstract

Distinguishing mutated driver genes, which confer a selective growth advantage on tumor cells, from sporadic passenger mutations is a critical problem in cancer genomics research. Previous studies have reported that some driver genes are mutated at low frequency and do not reach statistical significance in frequency-based tests, which complicates their identification. To address this issue, some existing approaches incorporate prior knowledge from an interactome to detect driver genes that may be dysregulated through their interaction network context. However, although altered activity of many pathways is frequently observed in cancer progression, prior knowledge from pathways has not been exploited in the driver gene identification task. In this paper, we introduce a driver gene prioritization method called driver gene identification through pathway and interactome information (DGPathinter), which is based on a knowledge-based matrix factorization model that incorporates prior knowledge from both the interactome and pathways. When DGPathinter is applied to somatic mutation datasets of three cancer types and evaluated against known driver genes, its prioritization performance is better than that of existing interactome-driven methods. The top-ranked genes detected by DGPathinter are also significantly enriched for known driver genes. Moreover, most of the top-scoring pathways reported by DGPathinter are associated with cancer progression. These results suggest that DGPathinter is a useful tool for identifying potential driver genes.
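The abstract does not state the DGPathinter objective function, so the following is only a rough illustration of how network knowledge can enter a matrix factorization. It assumes a rank-1 factorization of a binary sample-by-gene mutation matrix with a graph Laplacian penalty on the gene scores. The function name, the parameter `lam`, and the toy data are hypothetical and are not taken from the paper, and pathway information is left out entirely.

```python
import numpy as np


def graph_regularized_rank1(X, A, lam=1.0, n_iter=50):
    """Illustrative sketch only, not the DGPathinter model.

    Minimizes ||X - u v^T||_F^2 + lam * v^T L v by alternating exact updates.
    X: (samples x genes) binary mutation matrix, assumed not all zero.
    A: symmetric (genes x genes) adjacency matrix of an interaction network.
    """
    L = np.diag(A.sum(axis=1)) - A           # graph Laplacian of the network
    n_genes = X.shape[1]
    v = X.mean(axis=0) + 1e-3                # start from mutation frequencies
    history = []
    for _ in range(n_iter):
        # Exact minimizer over sample weights u with gene scores v fixed.
        u = X @ v / (v @ v)
        # Exact minimizer over v with u fixed:
        # ((u.u) I + lam L) v = X^T u
        v = np.linalg.solve((u @ u) * np.eye(n_genes) + lam * L, X.T @ u)
        resid = X - np.outer(u, v)
        history.append(float((resid ** 2).sum() + lam * (v @ L @ v)))
    return u, v, history


# Toy data (hypothetical): 6 tumor samples, 5 genes.
X = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
], dtype=float)

# Toy network (hypothetical): genes 0, 1, 2 form a triangle; genes 3 and 4 interact.
A = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1.0

u, v, history = graph_regularized_rank1(X, A, lam=1.0)
ranking = np.argsort(-v)                     # gene indices, highest score first
print("gene scores:", np.round(v, 3))
print("ranking:", ranking)
```

The Laplacian term pulls the scores of interacting genes toward each other, which is one way a rarely mutated gene can inherit evidence from frequently mutated neighbors. Setting `lam=0` removes the network prior and reduces the procedure to a plain rank-1 least-squares fit.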

Highlights

  • In the last decade, studies based on advanced DNA sequencing technologies have shown that the development and progression of cancer hinge on somatic abnormalities of DNA (Hudson et al, 2010; Vogelstein et al, 2013; Raphael et al, 2014)

  • Somatic mutations of genes in tumor samples can be efficiently detected by next-generation sequencing technology (Schuster, 2007; Xiong et al, 2011; Zhao et al, 2013), and large accumulated datasets of cancer genomic alterations have been provided by projects such as The Cancer Genome Atlas (TCGA) (Weinstein et al, 2013) and the International Cancer Genome Consortium (ICGC) (Hudson et al, 2010)

  • Somatic mutation datasets: For the somatic mutation data, we focus on three cancer types from the TCGA datasets: 507 tumor samples from breast invasive carcinoma (BRCA) (Cancer Genome Atlas Network, 2012), 83 tumor samples from glioblastoma multiforme (GBM) (Cancer Genome Atlas Research Network, 2008), and 401 tumor samples from thyroid carcinoma (THCA) (Cancer Genome Atlas Research Network, 2014)

Summary

Introduction

Studies based on advanced DNA sequencing technologies have shown that the development and progression of cancer hinge on somatic abnormalities of DNA (Hudson et al, 2010; Vogelstein et al, 2013; Raphael et al, 2014). Somatic mutations of genes in tumor samples can be efficiently detected by next-generation sequencing technology (Schuster, 2007; Xiong et al, 2011; Zhao et al, 2013), and large accumulated datasets of cancer genomic alterations have been provided by projects such as The Cancer Genome Atlas (TCGA) (Weinstein et al, 2013) and the International Cancer Genome Consortium (ICGC) (Hudson et al, 2010). These large-scale cancer genomics datasets offer an unprecedented opportunity to discover driver genes from the somatic mutation profiles of tumor samples (Kandoth et al, 2013; Lawrence et al, 2013, 2014; Tamborero et al, 2013).

