Abstract

A PubMed search with the keywords “cancer” and “gene” returns >582,000 citations. The large volume of cancer genomic publications necessitates the development of text-mining tools to help cancer researchers navigate and summarize articles efficiently. We developed a Cancer Publication Portal (CPP) to help researchers efficiently search and summarize cancer genomic publications, based on one or more genes of interest. CPP integrates data from several sources, including PubTator; the Medical Subject Headings (MeSH) database; the HUGO Gene Nomenclature Committee human gene name database; PubMed, a database of biomedical literature citations; and the National Cancer Institute (NCI) Thesaurus. Following each query, results are summarized and include the publication frequency for each cancer type, as well as publication frequencies for cancer terms, pharmacological agents, genomic mutations, and additional genes, stratified by cancer type. Cancer terms were identified by comparing titles and abstracts from cancer-related (N=851,868) and non-cancer-related (N=2,607,020) articles. CPP allows a user to quickly obtain publication statistics, such as the frequency of articles mentioning EGFR across cancer types, and to explore associations, such as the association between pharmacological agents and cancer types. Result summaries are interactive, so additional filters can be easily added as the literature is explored. After a search is completed, a PubTator collection can be quickly created in order to view article titles and abstracts in PubTator. CPP currently includes information for ~1.1 million cancer-related publications associated with >23,000 human genes. Database URL: https://gdancik.github.io/bioinformatics/CPP/.
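The stratified summaries described above can be illustrated with a minimal sketch. This is not CPP's implementation: the record layout, the function names `cancer_type_frequency` and `drug_frequency_by_cancer_type`, and all PMIDs and annotations are hypothetical, chosen only to show how publication frequencies might be counted per cancer type for a gene of interest.

```python
from collections import Counter, defaultdict

# Hypothetical annotated records; PMIDs and annotations are invented for illustration.
ARTICLES = [
    {"pmid": 1, "genes": {"EGFR"}, "cancer_types": {"lung cancer"}, "drugs": {"gefitinib"}},
    {"pmid": 2, "genes": {"EGFR", "KRAS"}, "cancer_types": {"lung cancer"}, "drugs": {"erlotinib"}},
    {"pmid": 3, "genes": {"EGFR"}, "cancer_types": {"glioblastoma"}, "drugs": set()},
    {"pmid": 4, "genes": {"TP53"}, "cancer_types": {"breast cancer"}, "drugs": set()},
    {"pmid": 5, "genes": {"EGFR"}, "cancer_types": {"lung cancer", "breast cancer"}, "drugs": {"gefitinib"}},
]


def cancer_type_frequency(articles, gene):
    """Count the articles mentioning `gene`, per cancer type."""
    counts = Counter()
    for article in articles:
        if gene in article["genes"]:
            counts.update(article["cancer_types"])
    return counts


def drug_frequency_by_cancer_type(articles, gene):
    """Count drug mentions among articles mentioning `gene`, stratified by cancer type."""
    counts = defaultdict(Counter)
    for article in articles:
        if gene in article["genes"]:
            for cancer_type in article["cancer_types"]:
                counts[cancer_type].update(article["drugs"])
    return counts


if __name__ == "__main__":
    # Print cancer types for EGFR, most frequently mentioned first.
    for cancer_type, n in cancer_type_frequency(ARTICLES, "EGFR").most_common():
        print(f"{cancer_type}: {n}")
```

An article annotated with several cancer types is counted once under each type, so the per-type counts can sum to more than the number of matching articles.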

Highlights

  • Cancer is a genetic disease[1], with relevant genes often identified through functional screening[2,3,4], gene expression profiling[5,6,7], or genomic sequencing experiments[8,9,10]

  • CPP[14,15] is designed to help users efficiently explore and summarize the cancer genomic literature, and should be useful for cancer researchers who are looking for relevant articles for a gene of interest, for meta-researchers who study the publication landscape, and for students learning about the relationship between one or more genes and cancer types

  • By quickly summarizing articles across cancer types, the Cancer Publication Portal (CPP) can be used to assess whether a gene might be novel for a particular cancer type, based on the frequency of gene mentions in titles/abstracts of cancer-related publications

Summary

Introduction

Cancer is a genetic disease[1], with relevant genes often identified through functional screening[2,3,4], gene expression profiling[5,6,7], or genomic sequencing experiments[8,9,10]. While PubTator allows users to search PubMed based on biological concepts such as genes and diseases, it does not provide summaries of the results. Other tools such as Anne O’Tate[12] and PubReminer[13] summarize PubMed searches, but are not cancer-specific and limit the number of results that can be returned. PubReminer[13], for example, allows PubMed queries and summarizes articles based on common words, MeSH terms, and other fields. These summaries are useful, but cancer type mentions can be difficult to find and may not appear in the search results.
