Abstract

Identifying driver genes, whose mutations cause tumor development, is crucial for cancer research and precision medicine. Traditional frequency-based methods cannot detect driver genes that are mutated at low recurrence, so researchers have turned to the functional impact of gene mutations and proposed function-based methods. However, most function-based methods estimate the distribution of the null model non-parametrically, which makes them sensitive to sample size and prone to selecting too few or too many genes. In this study, we propose the functional impact prediction neural network (FI-net), a method for identifying driver genes. An artificial neural network, serving as a parametric model, estimates the functional impact scores of genes from multi-omics features used as multivariate inputs. The background distribution is then estimated, and driver genes are identified, within each cluster obtained by hierarchical clustering. We applied FI-net and 22 other state-of-the-art methods to 31 datasets from The Cancer Genome Atlas project. Under a comprehensive evaluation criterion, FI-net performed strongly across datasets and outperformed the other methods in terms of the overlap fraction with the Cancer Gene Census and Network of Cancer Genes databases and the consensus in predictions among methods. Furthermore, the results show that FI-net can identify both known and potential novel driver genes.

Highlights

  • Cancers have been known to be caused by the accumulation of mutations throughout the life of an individual

  • The functional impact prediction neural network (FI-net) workflow has three steps: (1) calculating the observed functional impact scores (FISs) for genes from Mutation Annotation Format (MAF) files and MutationAssessor (Reva et al, 2011), (2) building an artificial neural network that estimates the FISs for genes from multi-omics features, and estimating the background distribution of the FIS in each cluster obtained by hierarchical clustering, and (3) identifying driver genes by comparing the observed FIS with the background distribution in each cluster

  • Functional impact score of mutations: FI-net used the FISs from MutationAssessor (Reva et al, 2011), which assesses the functional impact of a mutation based on the evolutionary conservation of the affected amino acid in protein homologs
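
The comparison in step (3) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each gene already has an observed FIS, a model-estimated FIS (`predicted_fis`, standing in for the neural network output), and a cluster label (standing in for the hierarchical clustering result). It also assumes, purely for illustration, that the background in each cluster is a normal distribution fitted to the estimated scores. The gene names, scores, function names, and the `alpha` threshold are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def upper_tail_p(x, mu, sigma):
    """One-sided P(X >= x) for X ~ Normal(mu, sigma)."""
    if sigma == 0:
        # Degenerate background: no spread to compare against.
        return 0.5 if x == mu else (0.0 if x > mu else 1.0)
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))


def call_drivers(genes, alpha=0.05):
    """Flag genes whose observed FIS is high relative to their cluster background.

    genes: list of dicts with keys name, cluster, observed_fis, predicted_fis.
    Returns {name: (p_value, is_candidate_driver)}.
    """
    by_cluster = defaultdict(list)
    for gene in genes:
        by_cluster[gene["cluster"]].append(gene)

    results = {}
    for members in by_cluster.values():
        # Assumption: the background is Normal(mu, sigma) fitted to the
        # model-estimated scores of the genes in this cluster.
        background = [g["predicted_fis"] for g in members]
        mu, sigma = mean(background), pstdev(background)
        for g in members:
            p = upper_tail_p(g["observed_fis"], mu, sigma)
            results[g["name"]] = (p, p < alpha)
    return results


# Toy, made-up input: one cluster of four hypothetical genes.
toy_genes = [
    {"name": "GENE_A", "cluster": 0, "observed_fis": 1.1, "predicted_fis": 1.0},
    {"name": "GENE_B", "cluster": 0, "observed_fis": 1.0, "predicted_fis": 1.2},
    {"name": "GENE_C", "cluster": 0, "observed_fis": 0.9, "predicted_fis": 0.8},
    {"name": "GENE_D", "cluster": 0, "observed_fis": 3.0, "predicted_fis": 1.0},
]
results = call_drivers(toy_genes)
for name, (p, flagged) in sorted(results.items()):
    print(name, f"p={p:.3g}", "candidate driver" if flagged else "not flagged")
```

In this toy input only `GENE_D`, whose observed score lies far above the cluster background, is flagged. A real analysis would also need a multiple-testing correction across genes, which the sketch omits.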


Summary

Introduction

Cancers are known to be caused by the accumulation of mutations throughout the life of an individual. Next-generation sequencing technology (Goodwin et al, 2016) provides a new perspective on cancer research. Genomic sequencing data across all major cancer types are available from a variety of cancer sequencing projects, such as the International Cancer Genome Consortium (Hudson et al, 2010) and The Cancer Genome Atlas (TCGA) (Weinstein et al, 2013). A tremendous challenge is to distinguish the driver genes, whose mutations are involved in tumorigenesis. Reliable identification of driver genes promotes the understanding of tumor progression and supports the efficiency of gene-targeted therapy for cancers (Chin et al, 2011; Shin et al, 2017).

