Abstract

Upstream open reading frames (uORFs) latent in mRNA transcripts are thought to modify translation of coding sequences by altering ribosome activity. Not all uORFs are thought to be active in such a process. To estimate the impact of uORFs on the regulation of translation in humans, we first circumscribed the universe of all possible uORFs based on coding gene sequence motifs and identified 1.3 million unique uORFs. To determine which of these are likely to be biologically relevant, we built a simple Bayesian classifier using 89 attributes of uORFs labeled as active in ribosome profiling experiments. This allowed us to extrapolate to a comprehensive catalog of likely functional uORFs. We validated our predictions using in vivo protein levels and ribosome occupancy from 46 individuals. This is a substantially larger catalog of functional uORFs than has previously been reported. Our ranked list of likely active uORFs allows researchers to test their hypotheses regarding the role of uORFs in health and disease. We demonstrate several examples of biological interest through the application of our catalog to somatic mutations in cancer and disease-associated germline variants in humans.

Highlights

  • Upstream open reading frames (uORFs) consist of a start codon in the 5′ untranslated region (UTR) of a gene and an associated stop codon appearing before the stop codon of the main coding DNA sequence (CDS)

  • We extracted the subset of uORFs identified as translated in the studies of Lee et al., Fritsch et al., and Gao et al. We further stratified this set of translated uORFs by how many of the three studies reported each uORF. uORFs identified in two or more of these studies were used as the reference standard for functional uORFs

  • This finding is consistent with prior work showing that longer 5′ UTRs, in general, have less functional impact on CDS translation than shorter 5′ UTRs after controlling for number of uORFs [62]
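
The reference-standard rule described in the highlights (keep uORFs reported as translated by two or more of the three studies) can be sketched as a simple counting step. This is only an illustration: the function name `consensus_uorfs`, the study keys, and the uORF identifiers are hypothetical placeholders, not the authors' data or code.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set


def consensus_uorfs(
    studies: Dict[str, Iterable[Hashable]], min_studies: int = 2
) -> Set[Hashable]:
    """Return uORF identifiers reported by at least `min_studies` studies."""
    counts: Counter = Counter()
    for uorf_ids in studies.values():
        # De-duplicate within a study so each study contributes at most one vote.
        counts.update(set(uorf_ids))
    return {uorf for uorf, n in counts.items() if n >= min_studies}


# Hypothetical toy identifiers, for illustration only.
toy_studies = {
    "lee": ["u1", "u2", "u3"],
    "fritsch": ["u2", "u3", "u4"],
    "gao": ["u3", "u5"],
}
reference_set = consensus_uorfs(toy_studies, min_studies=2)
print(sorted(reference_set))
```

Raising `min_studies` to 3 would restrict the set to uORFs found in all three studies, which corresponds to the stricter end of the stratification.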

Introduction

Upstream open reading frames (uORFs) consist of a start codon in the 5′ untranslated region (UTR) of a gene and an associated stop codon appearing before the stop codon of the main coding DNA sequence (CDS). A uORF may begin and end before the main coding sequence. Alternatively, if the uORF is out of frame with the CDS, it may overlap the CDS (Figure 1A). uORFs are latent in mRNA transcripts and may undergo translation. An initial survey of the human genome identified uORFs in ∼10% of mRNA transcripts [1]. More recent analyses identify uORFs in nearly half of all mRNA transcripts [2]. The discovery that many translated uORFs use start codons that are near-cognate to the canonical ATG start codon has broadened estimates of uORF prevalence further [3,4,5,6,7].
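
To make the definition concrete, the following is a minimal sketch of how candidate uORFs could be enumerated in a single transcript. It is not the authors' pipeline. It assumes the transcript is given as a DNA string with known CDS coordinates, and the names `find_uorfs`, `UORF`, `cds_start`, and `cds_end` are hypothetical. A candidate is kept when its start codon lies entirely within the 5′ UTR and its first in-frame stop codon ends before the CDS stop codon.

```python
from typing import List, NamedTuple, Sequence

STOP_CODONS = {"TAA", "TAG", "TGA"}


class UORF(NamedTuple):
    start: int          # 0-based index of the first base of the start codon
    end: int            # 0-based index one past the last base of the stop codon
    overlaps_cds: bool  # True if the uORF extends past the CDS start


def find_uorfs(
    transcript: str,
    cds_start: int,  # 0-based index of the first base of the CDS start codon
    cds_end: int,    # 0-based index one past the CDS stop codon
    start_codons: Sequence[str] = ("ATG",),
) -> List[UORF]:
    """Enumerate candidate uORFs under the assumptions stated above."""
    seq = transcript.upper()
    uorfs = []
    # The start codon must lie fully inside the 5' UTR.
    for i in range(0, cds_start - 2):
        if seq[i:i + 3] not in start_codons:
            continue
        # Walk in-frame codons; the stop codon must end before the CDS stop codon.
        for j in range(i + 3, cds_end - 5, 3):
            if seq[j:j + 3] in STOP_CODONS:
                uorfs.append(UORF(i, j + 3, j + 3 > cds_start))
                break
    return uorfs


# Toy sequences (invented for illustration, not real transcripts).
print(find_uorfs("CCATGAAATAGCCATGGCCTGA", cds_start=13, cds_end=22))
print(find_uorfs("ATGCATGCCTAAGTGA", cds_start=4, cds_end=16))
```

The first toy call yields a uORF that ends within the 5′ UTR, and the second yields an out-of-frame uORF that overlaps the CDS. Passing additional codons through `start_codons` (for example `("ATG", "CTG")`) would include near-cognate starts.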
