Abstract

Background: Identification of novel cancer-causing genes is one of the main goals in cancer research. The rapid accumulation of genome-wide protein-protein interaction (PPI) data in humans has provided a new basis for studying the topological features of cancer genes in cellular networks. It is important to integrate multiple genomic data sources, including PPI networks, protein domains and Gene Ontology (GO) annotations, to facilitate the identification of cancer genes.

Methods: Topological features of the PPI network, as well as protein domain compositions, enrichment of GO categories, and sequence and evolutionary conservation features, were extracted and compared between cancer genes and other genes. The predictive power of various classifiers for identification of cancer genes was evaluated by cross validation. Experimental validation of a subset of the prediction results was conducted using siRNA knockdown and viability assays in the human colon cancer cell line DLD-1.

Results: Cross validation demonstrated advantageous performance of classifiers based on support vector machines (SVMs) with the inclusion of the topological features from the PPI network, protein domain compositions and GO annotations. We then applied the trained SVM classifier to human genes to prioritize putative cancer genes. siRNA knockdown of several SVM-predicted cancer genes greatly reduced cell viability in the human colon cancer cell line DLD-1.

Conclusion: Topological features of PPI networks, protein domain compositions and GO annotations are good predictors of cancer genes. The SVM classifier integrates multiple features and as such is useful for prioritizing candidate cancer genes for experimental validation.

Highlights

  • Identification of novel cancer-causing genes is one of the main goals in cancer research

  • To reduce the false positives in classifying genes not involved in cancer, we extended the comparison of various features in four non-overlapping gene groups, i.e. "cancer genes" from the Cancer Gene Census [1], "Catalogue Of Somatic Mutations In Cancer (COSMIC) genes" profiled for somatic mutations in cancer and deposited into the Catalogue Of Somatic Mutations In

  • Validated interactions were derived from the Biomolecular Interaction Network Database (BIND) [21], the Human Protein Reference Database (HPRD) [22], Reactome [23], and the Kyoto Encyclopedia of Genes and Genomes (KEGG) [24]


Summary

Introduction

Identification of novel cancer-causing genes is one of the main goals in cancer research. The rapid accumulation of genome-wide protein-protein interaction (PPI) data in humans has provided a new basis for studying the topological features of cancer genes in cellular networks. It was proposed that direct and indirect interactions often occur between protein pairs whose mutations are associated with similar disease phenotypes. This concept was utilized to predict phenotypic effects of gene mutations using protein complexes [5] and to identify previously unknown complexes likely to be associated with disease [6,7]. An interactome-transcriptome analysis reported increased interaction connectivity of differentially expressed genes in lung squamous cancer tissues [9]. These studies indicated a central role of cancer proteins within the interactome. A systematic analysis of all these features side-by-side is needed to evaluate their merits, both individually and in combination, in cancer gene prediction.
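To make "topological features" and "interaction connectivity" concrete, the following is a minimal sketch of how two common per-protein network measures, degree and clustering coefficient, could be computed from a list of pairwise interactions. It is an illustration only: the protein names and edges are made up, the helper functions are hypothetical, and the study itself may have used a different feature set or different definitions.

```python
from collections import defaultdict
from itertools import combinations


def build_network(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return dict(neighbors)


def degree(neighbors, node):
    """Number of distinct interaction partners of a protein."""
    return len(neighbors.get(node, ()))


def clustering_coefficient(neighbors, node):
    """Fraction of a protein's partner pairs that also interact with each other."""
    partners = neighbors.get(node, set())
    k = len(partners)
    if k < 2:
        return 0.0  # undefined for fewer than two partners; reported as 0 here
    links = sum(1 for u, v in combinations(sorted(partners), 2) if v in neighbors[u])
    return 2.0 * links / (k * (k - 1))


# Toy interaction list (hypothetical proteins, not data from the study).
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
network = build_network(toy_edges)

# One (degree, clustering coefficient) feature pair per protein.
features = {
    protein: (degree(network, protein), clustering_coefficient(network, protein))
    for protein in sorted(network)
}

for protein, (deg, cc) in features.items():
    print(f"{protein}\tdegree={deg}\tclustering={cc:.2f}")
```

Feature pairs like these could then be combined with domain and GO annotations as inputs to a classifier; how the features are encoded and weighted is a design choice not specified in this summary.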

