Abstract
The accepted model of eukaryotic translation initiation begins with the pre-initiation complex scanning the transcript from the 5′ end until it recognizes an ATG codon with a specific surrounding nucleotide (nt) context (the Kozak rule). According to this model, ATG codons upstream of the beginning of the ORF should affect translation. We perform, for the first time, a genome-wide statistical analysis that uncovers a new, more comprehensive and quantitative set of initiation rules for reducing the cost of translation and improving its efficiency. Analyzing dozens of eukaryotic genomes, we find a universal trend, in all frames, of selection for low numbers of ATG codons: specifically, 16–27 codons upstream, but also 5–11 codons downstream of the START ATG, contain fewer ATG codons than expected. We further suggest that there is selection for anti-optimal ATG contexts in the vicinity of the START ATG. Thus, the efficiency and fidelity of translation initiation are encoded in the 5′UTR, as required by the scanning model, but also at the beginning of the ORF. The observed nt patterns suggest that in all the analyzed organisms the pre-initiation complex often misses the START ATG of the ORF and may start translation from an alternative initiation site. Thus, to prevent the translation of undesired proteins, there is selection for nucleotide sequences with low affinity to the pre-initiation complex near the beginning of the ORF. With the new rules we obtain a correlation with ribosomal density and protein levels that is more than twice as high as with the Kozak rule alone (e.g. for protein levels r = 0.7 vs. r = 0.31; p < 10⁻¹²).
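The depletion claim above (fewer ATG codons than expected in windows around the START ATG, in every frame) comes down to comparing observed ATG counts per codon offset and frame with counts from randomized sequences. The sketch below shows that comparison under stated assumptions: the function names (`atg_profile`, `shuffle_null`, `expected_profile`, `depletion`) are hypothetical, and the null model (shuffling 5′UTR nucleotides and in-frame ORF codons while keeping the START codon fixed) is an illustrative choice, not necessarily the randomization the authors used.

```python
import random
from collections import Counter


def atg_profile(seqs, starts, up_codons, down_codons):
    """Count ATG occurrences at each (codon offset, frame) around the START ATG.

    seqs   -- DNA strings (5'UTR + ORF)
    starts -- index of the 'A' of the START ATG in each sequence
    Offset 0 (the START codon itself) is skipped; negative offsets are upstream
    (5'UTR), positive offsets downstream (ORF). Frame f means the ATG begins
    f nt after the in-frame codon boundary.
    """
    counts = Counter()
    for seq, s in zip(seqs, starts):
        for codon in range(-up_codons, down_codons + 1):
            if codon == 0:
                continue
            for frame in range(3):
                pos = s + 3 * codon + frame
                # pos >= 0 guards against Python's negative-index wraparound
                if pos >= 0 and seq[pos:pos + 3] == "ATG":
                    counts[(codon, frame)] += 1
    return counts


def shuffle_null(seq, s, rng):
    """Hypothetical null model: shuffle 5'UTR nucleotides and in-frame ORF
    codons independently, keeping the START codon (and sequence length) fixed."""
    utr = list(seq[:s])
    rng.shuffle(utr)
    orf = seq[s + 3:]
    n = len(orf) - len(orf) % 3  # trailing partial codon is left untouched
    codons = [orf[i:i + 3] for i in range(0, n, 3)]
    rng.shuffle(codons)
    return "".join(utr) + seq[s:s + 3] + "".join(codons) + orf[n:]


def expected_profile(seqs, starts, up_codons, down_codons, n_shuffles=100, seed=0):
    """Average ATG profile over n_shuffles randomized copies of the input."""
    rng = random.Random(seed)
    total = Counter()
    for _ in range(n_shuffles):
        shuffled = [shuffle_null(x, s, rng) for x, s in zip(seqs, starts)]
        total.update(atg_profile(shuffled, starts, up_codons, down_codons))
    return {key: count / n_shuffles for key, count in total.items()}


def depletion(observed, expected):
    """Observed/expected ratio per (codon, frame); values < 1 suggest depletion."""
    return {key: observed.get(key, 0) / exp
            for key, exp in expected.items() if exp > 0}


if __name__ == "__main__":
    seqs = ["ATGAAAATGCCCATGTTT"]  # toy input: START ATG at index 6
    starts = [6]
    obs = atg_profile(seqs, starts, up_codons=2, down_codons=3)
    exp = expected_profile(seqs, starts, 2, 3, n_shuffles=200)
    print(depletion(obs, exp))
```

On a real genome, `seqs` would be many 5′UTR+ORF sequences and the windows would span the ranges reported above (for example, up to 27 codons upstream); ratios consistently below 1 at given offsets would be the depletion signal. A codon-level shuffle in the ORF is used here so that codon usage is preserved, which keeps frame-0 ATG counts (methionine codons) fixed and only moves them around.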
Highlights
Gene translation is the central cellular process of sequence decoding to produce a protein
According to the accepted scanning model [10,11,12,13,14], the pre-initiation complex, accompanied by additional initiation factors, scans the mRNA sequence from its 5′ end toward its 3′ end until a start codon is recognized, which marks the beginning of the open reading frame (ORF)
We show for the first time that there is selection for fewer ATG codons downstream of the beginning of the ORF; we estimate the length of the regions under such selection upstream (5′UTR) and downstream of the beginning of the ORF; we report additional sequence signals related to initiation fidelity that are under selection, such as anti-Kozak contexts surrounding ATG codons near the beginning of the ORF and the appearance of stop codons close to them; and we are the first to quantify the fraction of the variance in protein levels and ribosomal density that can be explained by the different signals near the beginning of the ORF
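To put the "fraction of variance" language in numbers: assuming the r values reported in the abstract (r = 0.7 with the new rules vs. r = 0.31 with the Kozak rule alone, for protein levels) are Pearson correlations, and that the variance explained by a single predictor is r², the gap is larger in variance terms than in correlation terms:

```latex
R^2 = r^2:\qquad 0.7^2 = 0.49,\qquad 0.31^2 \approx 0.096,\qquad
\frac{0.7}{0.31} \approx 2.26,\qquad \frac{0.49}{0.096} \approx 5.1
```

Under these assumptions, a correlation that is roughly 2.3 times higher corresponds to about five times as much explained variance (49% vs. roughly 10%).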
Summary
Gene translation is the central cellular process of sequence decoding to produce a protein. This process occurs in every organism and consumes most of the cellular energy [1,2,3]; it has important ramifications for every biomedical field [3,4,5,6,7,8,9]. According to the accepted scanning model [10,11,12,13,14], the pre-initiation complex, accompanied by additional initiation factors, scans the mRNA sequence from its 5′ end toward its 3′ end until a start codon is recognized (usually an AUG that is identified by the initiator tRNA), which marks the beginning of the open reading frame (ORF). Since ATG codons are expected to be present in all possible reading frames both upstream and downstream of the START of the ORF, how does the scanning pre-initiation complex recognize the start ATG?
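The question above can be made concrete with a toy version of the scanning model. The sketch below is a deliberately simplified, deterministic stand-in: it classifies each ATG's context using only the two best-characterized Kozak positions (a purine at −3 and a G at +4, counting the A of ATG as +1) and initiates at the first ATG in a strong context, recording weaker-context ATGs as bypassed. Real leaky scanning is probabilistic, and the function names `kozak_context` and `scan_first_strong_atg` are hypothetical.

```python
def kozak_context(seq, i):
    """Classify the context of the ATG whose A is at index i (Kozak position +1).

    Purine (A/G) at -3 and G at +4: both -> "strong", one -> "adequate",
    neither -> "weak".
    """
    purine_minus3 = i >= 3 and seq[i - 3] in "AG"
    g_plus4 = i + 3 < len(seq) and seq[i + 3] == "G"
    return ("weak", "adequate", "strong")[purine_minus3 + g_plus4]


def scan_first_strong_atg(mrna):
    """Toy 5'->3' scanner: return (index of the first strong-context ATG,
    indices of ATGs bypassed before it). Returns (None, bypassed) if no
    strong-context ATG exists."""
    bypassed = []
    i = mrna.find("ATG")
    while i != -1:
        if kozak_context(mrna, i) == "strong":
            return i, bypassed
        bypassed.append(i)  # crude stand-in for leaky scanning past a weak ATG
        i = mrna.find("ATG", i + 1)
    return None, bypassed


if __name__ == "__main__":
    # Upstream ATG at index 3 is in a weak context; ATG at index 12 is strong.
    print(scan_first_strong_atg("CCTATGTCCACCATGGCC"))
```

In this toy, an ATG with a strong context placed near the annotated START would capture the scanning complex before (or instead of) the START ATG, which illustrates why selection for fewer ATGs and for anti-optimal contexts around the START, as proposed in the abstract, would be consistent with the scanning model.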