Abstract
Despite their importance in determining protein abundance, a comprehensive catalogue of sequence features controlling protein‐to‐mRNA (PTR) ratios and a quantification of their effects are still lacking. Here, we quantified PTR ratios for 11,575 proteins across 29 human tissues using matched transcriptomes and proteomes. We estimated by regression the contribution of known sequence determinants of protein synthesis and degradation in addition to 45 mRNA and 3 protein sequence motifs that we found by association testing. While PTR ratios span more than 2 orders of magnitude, our integrative model predicts PTR ratios at a median precision of 3.2‐fold. A reporter assay provided functional support for two novel UTR motifs, and an immobilized mRNA affinity competition‐binding assay identified motif‐specific bound proteins for one motif. Moreover, our integrative model led to a new metric of codon optimality that captures the effects of codon frequency on protein synthesis and degradation. Altogether, this study shows that a large fraction of PTR ratio variation in human tissues can be predicted from sequence, and it identifies many new candidate post‐transcriptional regulatory elements.
Highlights
Unraveling how gene regulation is encoded in genomes is central to delineating gene regulatory programs and to understanding predispositions to diseases
We modeled every gene with a single transcript isoform because there was little evidence for widespread expression of multiple isoforms and to avoid practical difficulties of calling and quantifying isoform abundance consistently at mRNA and protein levels
The model includes sequence features that we identified de novo through systematic association testing between either median PTR ratios across tissues or tissue-specific PTR ratio fold-changes relative to the median, and the presence of k-mers, i.e., subsequences of a predefined length k, in the 5′ UTR, the coding sequence, the 3′ UTR, and the protein sequence (Materials and Methods)
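As an illustration of the k-mer association testing mentioned above, the following minimal Python sketch scores each k-mer by comparing log PTR ratios of genes that contain it against genes that do not. It is not the authors' procedure: the statistic (difference in means), the permutation test, the function names `kmers` and `kmer_association`, and the toy sequences and values are all assumptions made for illustration. The actual test is described in the paper's Materials and Methods, and a real analysis would also need covariates and multiple-testing correction, both omitted here.

```python
import random
from statistics import mean


def kmers(seq, k):
    """Return the set of distinct subsequences of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mean_difference(genes, group, log_ptr):
    """Mean log PTR of genes in `group` minus mean log PTR of the rest."""
    inside = [log_ptr[g] for g in genes if g in group]
    outside = [log_ptr[g] for g in genes if g not in group]
    return mean(inside) - mean(outside)


def kmer_association(seqs, log_ptr, k, n_perm=1000, seed=0):
    """Test each k-mer for association with log PTR ratio.

    Returns {kmer: (effect, p_value)}. The effect is the difference in
    mean log PTR (with k-mer minus without); the p-value comes from a
    two-sided permutation test (an assumed, illustrative choice).
    """
    rng = random.Random(seed)
    genes = list(seqs)

    # Map every k-mer to the set of genes whose sequence contains it.
    carriers = {}
    for gene, seq in seqs.items():
        for km in kmers(seq, k):
            carriers.setdefault(km, set()).add(gene)

    results = {}
    for km, group in carriers.items():
        if len(group) == len(genes):
            continue  # present in every gene: no contrast group
        observed = mean_difference(genes, group, log_ptr)
        hits = 0
        for _ in range(n_perm):
            # Random gene set of the same size as the carrier set.
            shuffled = set(rng.sample(genes, len(group)))
            if abs(mean_difference(genes, shuffled, log_ptr)) >= abs(observed):
                hits += 1
        results[km] = (observed, (hits + 1) / (n_perm + 1))
    return results


# Toy data (hypothetical): three 3' UTR-like sequences carry "AUAA".
seqs = {
    "g1": "GGAUAAGC", "g2": "CAUAAGGC", "g3": "AUAACCGG",
    "g4": "GGCCGGCC", "g5": "CCGGCCGG", "g6": "GCGCGCGC",
}
log_ptr = {"g1": 5.0, "g2": 5.2, "g3": 4.8, "g4": 3.0, "g5": 3.1, "g6": 2.9}

results = kmer_association(seqs, log_ptr, k=4, n_perm=200)
effect, p_value = results["AUAA"]
print(f"AUAA: effect={effect:.2f} log10 units, permutation p={p_value:.3f}")
```

In this toy setup, one such test would be run per sequence region (5′ UTR, coding sequence, 3′ UTR, protein sequence), with either the median PTR ratio or a tissue-specific fold-change as the response.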
Summary
Unraveling how gene regulation is encoded in genomes is central to delineating gene regulatory programs and to understanding predispositions to diseases. Although transcript abundance is a major determinant of protein abundance, substantial deviations between mRNA and protein levels of gene expression exist (Liu et al, 2016). These deviations include a much larger dynamic range of protein abundances (García-Martínez et al, 2007; Lackner et al, 2007; Schwanhäusser et al, 2011; Wilhelm et al, 2014; Csárdi et al, 2015) and poor mRNA–protein correlations for important gene classes across cell types and tissues (Fortelny et al, 2017; Franks et al, 2017). Decades of single-gene studies have revealed numerous sequence elements affecting initiation, elongation, and termination of translation as well as protein degradation. The sequence context of the start codon plays a major role in start codon recognition (Kozak, 1986).