Abstract
The complex, evolving landscape of cancer mutations makes it difficult to identify cancer genes among the large lists of mutations typically generated in NGS experiments. The ability to prioritize these variants is therefore of paramount importance. To address this issue we developed OncoScore, a text-mining tool that ranks genes according to their association with cancer, based on the available biomedical literature. Receiver operating characteristic (ROC) curve analysis and the area under the curve (AUC) on manually curated datasets confirmed the excellent discriminating capability of OncoScore (OncoScore cut-off threshold = 21.09; AUC = 90.3%, 95% CI: 88.1–92.5%), indicating that OncoScore provides useful results when efficient prioritization of cancer-associated genes is needed.
Highlights
The complex, evolving landscape of cancer mutations makes it difficult to identify cancer genes among the large lists of mutations typically generated in NGS experiments.
We analyzed the performance of OncoScore on the Cancer Gene Census (CGC; Supplementary Table 1), a regularly updated, manually annotated collection of genes accepted as causally implicated in oncogenesis[1].
To assess the ability of OncoScore to discriminate between cancer and non-cancer genes, we computed the OncoScore for the whole CGC dataset and for a manually curated list of genes not associated with cancer.
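The evaluation described above (scoring a set of known cancer genes and a set of non-cancer genes, then measuring how well the score separates them) can be illustrated with a minimal Python sketch. It computes the AUC as the fraction of (cancer, non-cancer) pairs in which the cancer gene receives the higher score, and applies the reported cut-off of 21.09. The function names, the toy scores, and the choice to treat a score equal to the cut-off as positive are assumptions made for illustration; they are not taken from the OncoScore package or its datasets.

```python
CUTOFF = 21.09  # cut-off threshold reported in the abstract


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    labels: 1 for cancer genes, 0 for non-cancer genes. Ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def is_cancer_associated(score, cutoff=CUTOFF):
    """Assumption: a score at or above the cut-off is called cancer-associated."""
    return score >= cutoff


# Hypothetical scores, invented for this example only.
toy_scores = [75.2, 48.0, 18.5, 30.1, 5.3, 12.0]
toy_labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(toy_scores, toy_labels)
calls = [is_cancer_associated(s) for s in toy_scores]
```

On the toy data, one cancer gene falls below the cut-off and one non-cancer gene falls above it, which is the kind of misclassification that the ROC analysis quantifies across all possible thresholds.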
Summary
The complex, evolving landscape of cancer mutations makes it difficult to identify cancer genes among the large lists of mutations typically generated in NGS experiments. Despite the development of a significant number of tools devoted to cancer driver prediction, limited effort has been dedicated to tools able to generate a gene-centered Oncogenic Score based on the evidence already available in the scientific literature. To overcome these limitations, we propose here OncoScore, a bioinformatics text-mining tool capable of automatically scanning the biomedical literature by means of dynamically updatable web queries and measuring gene-specific cancer association in terms of gene citations. The output of this analysis is a score representing the strength of the association of any gene symbol to cancer, based on the literature available at the time of the analysis. OncoScore is distributed as an R Bioconductor package (https://bioconductor.org/packages/release/bioc/html/OncoScore.html), to allow full customization of the algorithm and easy integration into existing NGS pipelines, and as a web tool for easy access by researchers with limited or no experience in bioinformatics (http://www.galseq.com/oncoscore.html).
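To make the idea of "measuring gene-specific cancer association in terms of gene citations" concrete, here is a deliberately simplified Python sketch. It ranks genes by the share of their citations that also mention cancer. This ratio is an illustrative stand-in, not the published OncoScore formula, which is not given in this summary; the function names, the placeholder gene names, and the citation counts are all hypothetical.

```python
def cancer_citation_share(total_citations, cancer_citations):
    """Percentage of a gene's citations that also mention cancer.

    Simplified illustration only; not the scoring formula used by OncoScore.
    """
    if total_citations < 0 or cancer_citations < 0:
        raise ValueError("citation counts must be non-negative")
    if cancer_citations > total_citations:
        raise ValueError("cancer citations cannot exceed total citations")
    if total_citations == 0:
        return 0.0  # no literature at all: no evidence of association
    return 100.0 * cancer_citations / total_citations


def rank_genes(counts):
    """Sort genes from strongest to weakest citation-based association.

    counts maps a gene symbol to a (total_citations, cancer_citations) pair.
    """
    scored = {
        gene: cancer_citation_share(total, cancer)
        for gene, (total, cancer) in counts.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts for placeholder gene names, invented for this example.
toy_counts = {
    "GENE_A": (200, 150),
    "GENE_B": (1000, 50),
    "GENE_C": (0, 0),
}

ranking = rank_genes(toy_counts)
```

Because the counts would come from literature queries run at analysis time, re-running the ranking later can change the scores, which matches the summary's point that the output reflects the literature available at the time of the analysis.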