Abstract

One of the greatest hurdles for cancer biologists is identifying the cancer-causal genes among the thousands of candidate genes suggested by large-scale genomics or deep sequencing studies. In our previous study, we discovered that cancer genes possess a complicated yet distinct “gene concept signature.” Concept signatures include cancer-related signaling pathways, molecular interactions, transcriptional motifs, protein domains, and gene ontologies. We developed a Concept Signature (ConSig) analysis that prioritizes the biological importance of candidate genes underlying cancer by computing their strength of association with those cancer-related signature concepts. The ConSig analysis has facilitated the discovery of a recurrent ESR1-CCDC170 gene fusion in more aggressive Luminal B breast cancers (Nat. Commun. 2014) as well as TLK2, MAP3K3, and MYST3 amplifications in aggressive luminal breast cancer (Nat. Commun. 2016, J. Pathol. 2014, Oncogene In press). Nevertheless, current candidate gene prioritization methods, including ConSig, are subject to bias from redundancy in the compiled knowledgebase (also known as the gene concept database). This redundancy causes the gene ranking to vary and jeopardizes the reliability of the prioritization methods. In light of these problems, we developed an innovative, universal algorithm called uniConSig. By penalizing overlapping concepts with a stable parameter, the “effective concept number,” we reduced the fluctuation in uniConSig scores and stabilized the gene ranking even when the gene concept database was randomly duplicated. We tested the uniConSig algorithm by identifying known cancer genes based on a cancer gene list, and found that it prioritized known cancer genes significantly better than the ConSig algorithm, with results that remained stable in the presence of randomly duplicated gene concept databases.
In addition, using calculations based on the dominant and recessive cancer gene lists, we were able to provide a quantitative measure of the potential dominant or recessive functions of human genes underlying cancer. As an example application of this algorithm, we show that the uniConSig scores can directly reveal the primary oncogene targets from genomic amplicons in breast cancer. To our knowledge, UniConSig is the first tool for genome-wide quantification of gene functions and disease associations. UniConSig has broad applications in gene prioritization for genomics-based studies that aim to discover new disease-causal genes or new gene functions.

Citation Format: Xu Chi, Meenakshi Anurag, Sartor A. Maureen, Xiaosong Wang. UniConSig: A new algorithm for genome wide quantification of gene functions and disease associations [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 4538. doi:10.1158/1538-7445.AM2017-4538
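
The abstract does not give the uniConSig formula, so the following is only an illustrative sketch of how penalizing overlapping concepts can make a score insensitive to database duplication. Everything in it is an assumption made for illustration, not the published method: the use of Jaccard similarity as the overlap measure, the concept weighting, the square-root normalization, the function names, and the toy gene and concept identifiers.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard similarity of two gene sets (assumed overlap measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def concept_weight(members, training_genes):
    """Fraction of a concept's genes found in the training list (assumed weighting)."""
    return len(members & training_genes) / len(members) if members else 0.0


def naive_score(gene, concepts, training_genes):
    """Unpenalized sum of concept weights; grows whenever concepts are duplicated."""
    return sum(
        concept_weight(members, training_genes)
        for members in concepts.values()
        if gene in members
    )


def penalized_score(gene, concepts, training_genes):
    """Return (score, effective concept number) for one gene.

    Each concept containing the gene is down-weighted by its redundancy,
    i.e. its summed similarity to all of the gene's concepts (itself
    included, so the redundancy is always at least 1).
    """
    own = [members for members in concepts.values() if gene in members]
    if not own:
        return 0.0, 0.0
    redundancy = [sum(jaccard(m, other) for other in own) for m in own]
    # Effective concept number: duplicates share one unit of "concept mass".
    ecn = sum(1.0 / r for r in redundancy)
    weighted = sum(
        concept_weight(m, training_genes) / r for m, r in zip(own, redundancy)
    )
    # Hypothetical normalization by the square root of the effective number.
    return weighted / sqrt(ecn), ecn


# Toy data with placeholder identifiers, not real annotations.
concepts = {
    "concept_A": {"g1", "g2", "g3"},
    "concept_B": {"g1", "g4", "g5", "g6"},
}
training = {"g1", "g3", "g4", "g5"}

# Duplicate every concept under a new name to mimic a redundant database.
duplicated = dict(concepts)
duplicated.update({name + "_copy": set(m) for name, m in concepts.items()})

original_score, original_ecn = penalized_score("g1", concepts, training)
duplicate_score, duplicate_ecn = penalized_score("g1", duplicated, training)
```

In this sketch, duplicating every concept doubles each redundancy term while doubling the number of terms, so the penalized score and the effective concept number are unchanged, whereas the naive sum doubles. Duplicating only a random subset of concepts would leave the penalized score close to, but not necessarily identical with, its original value.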