CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise

Mihaela Pertea, Anil K Madugundu, Geo Pertea, Yu-Chi Chang, Akhilesh Pandey, Alaina Shumate, Steven L Salzberg, Florian P Breitwieser, Ales Varabyou

doi:10.1186/s13059-018-1590-2

Abstract

We assembled the sequences from deep RNA sequencing experiments by the Genotype-Tissue Expression (GTEx) project, to create a new catalog of human genes and transcripts, called CHESS. The new database contains 42,611 genes, of which 20,352 are potentially protein-coding and 22,259 are noncoding, and a total of 323,258 transcripts. These include 224 novel protein-coding genes and 116,156 novel transcripts. We detected over 30 million additional transcripts at more than 650,000 genomic loci, nearly all of which are likely nonfunctional, revealing a heretofore unappreciated amount of transcriptional noise in human cells. The CHESS database is available at http://ccb.jhu.edu/chess.

Highlights

Scientists have been attempting to estimate the number of human genes for more than 50 years, dating back to 1964 [1].
In the decade preceding the initial publication of the human genome, multiple estimates were made based on sequencing of short messenger RNA fragments, and most of these estimates fell in the range of 50,000–100,000 genes [2,3,4,5].
To validate the coding potential of novel loci identified in this study, we searched the unmatched spectra from 30 human tissue/cell types against the novel predicted open reading frames (ORFs) described in this study.

Summary

Background

Scientists have been attempting to estimate the number of human genes for more than 50 years, dating back to 1964 [1].

Novel transcripts may in some cases represent novel combinations of exons (e.g., exon-skipping events), but in many cases they include novel splice sites that create new exons and introns. To assess how well the annotation catalogs agree, we compared all of the protein-coding and lncRNA transcripts in CHESS (version 2.1), RefSeq (release 108), and GENCODE (v28) to determine the number of (a) introns and (b) transcripts that were shared among all combinations of the three databases.

To validate the coding potential of novel loci identified in this study, we searched the unmatched spectra from 30 human tissue/cell types (see the “Methods” section) against the novel predicted ORFs described in this study. Peptides identified in this search that were either identical to annotated proteins or mapped with a single mismatch were discarded. We note that the abundance of these novel transcripts was very low and the ORFs are relatively short, both of which may explain the small number of identified peptides.
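The comparison of introns shared among all combinations of CHESS, RefSeq, and GENCODE amounts to counting the regions of a three-set Venn diagram. The sketch below is only an illustration of that idea, not the procedure used in the study: the function names `introns_from_exons` and `venn_counts`, the 1-based inclusive coordinate convention, and the toy exon coordinates are all assumptions made for this example.

```python
from itertools import combinations


def introns_from_exons(chrom, strand, exons):
    """Derive introns as the gaps between consecutive exons of one transcript.

    exons: list of (start, end) pairs, assumed 1-based and inclusive.
    Each intron is keyed by (chrom, strand, intron_start, intron_end).
    """
    ordered = sorted(exons)
    return {
        (chrom, strand, prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    }


def venn_counts(catalogs):
    """Count the items found in exactly each combination of catalogs.

    catalogs: dict mapping a catalog name to a set of hashable items.
    Returns a dict mapping frozenset(names) to the number of items present
    in all of those catalogs and absent from every other catalog.
    """
    names = list(catalogs)
    counts = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(catalogs[n] for n in combo))
            outside = set().union(
                *(catalogs[n] for n in names if n not in combo)
            )
            counts[frozenset(combo)] = len(inside - outside)
    return counts


# Toy data: made-up coordinates, not taken from any real annotation.
toy_catalogs = {
    "CHESS": introns_from_exons("chr1", "+", [(100, 200), (300, 400), (500, 600)])
    | introns_from_exons("chr1", "+", [(100, 200), (500, 600)]),
    "RefSeq": introns_from_exons("chr1", "+", [(100, 200), (300, 400), (500, 600)]),
    "GENCODE": introns_from_exons("chr1", "+", [(100, 200), (300, 400)])
    | introns_from_exons("chr2", "-", [(10, 20), (50, 60)]),
}

toy_counts = venn_counts(toy_catalogs)

if __name__ == "__main__":
    for combo, n in toy_counts.items():
        print(" & ".join(sorted(combo)), "only:", n)
```

The same `venn_counts` function would apply to whole transcripts if each transcript were keyed by a hashable value, for example its ordered tuple of introns. Whether two transcripts with identical intron chains but different ends should count as shared is a design choice this sketch does not settle.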



Journal: Genome Biology	Publication Date: Nov 28, 2018
Citations: 289	License type: open-access
