Abstract
Eukaryotic protein synthesis generally initiates at a start codon defined by an AUG and its surrounding Kozak sequence context, but the quantitative importance of this context in different species is unclear. We tested this concept in two pathogenic Cryptococcus yeast species by genome-wide mapping of translation and of mRNA 5′ and 3′ ends. We observed thousands of AUG-initiated upstream open reading frames (uORFs) that are a major contributor to translation repression. Use of a uORF depends on the Kozak sequence context of its start codon, and uORFs with strong contexts promote nonsense-mediated mRNA decay. Transcript leaders in Cryptococcus and other fungi are substantially longer and more AUG-dense than in Saccharomyces. Numerous Cryptococcus mRNAs encode predicted dual-localized proteins, including many aminoacyl-tRNA synthetases, in which a leaky AUG start codon is followed by an in-frame AUG in a strong Kozak context, with a mitochondrial-targeting sequence between the two. Analysis of other fungal species shows that such dual localization is also predicted to be common in the ascomycete mould Neurospora crassa. Kozak-controlled regulation is correlated with insertions in translational initiation factors in fidelity-determining regions that contact the initiator tRNA. Thus, start codon context is a signal that quantitatively programs both the expression and the structures of proteins in diverse fungi.
Highlights
Fungi are important in the fields of ecology, medicine, and biotechnology
Delineation of transcript ends in C. neoformans and C. deneoformans
We examined sequences associated with translation start codons in other fungi for which both RNA-Seq and ribosome profiling data were available and for which the annotation was sufficiently complete (i.e., S. cerevisiae, N. crassa, C. albicans, and S. pombe)
Summary
Fungi are important in the fields of ecology, medicine, and biotechnology. With roughly 3 million predicted fungal species, this kingdom is the most diverse of the domain Eukarya [1]. Comparative analysis of coding sequences enables the generation of hypotheses on genome biology and evolution [4,5,6,7]. These analyses intrinsically depend on the quality of coding gene identification and annotation, both of which have limitations. Annotation pipelines only predict plausible open reading frames (ORFs); for yeast, an ORF was initially defined as a contiguous stretch of at least 100 codons starting with an AUG codon and ending with a stop codon [10].
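The ORF criterion described above can be illustrated with a minimal Python sketch. It is not the annotation pipeline used in the cited work: the function name `find_orfs`, the choice to scan only the given strand, the standard stop codons (UAA, UAG, UGA), and the convention that the 100-codon threshold counts the AUG but not the stop codon are all assumptions made here for illustration.

```python
# Standard stop codons in the RNA alphabet (assumes the standard genetic code).
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end, n_codons) for each AUG-to-stop ORF on the given strand.

    start and end are 0-based; end is exclusive and includes the stop codon.
    n_codons counts the AUG and all following sense codons, not the stop
    codon (an assumed convention; the source does not specify it).
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                # First AUG in this frame since the last stop: longest ORF.
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return sorted(orfs)
```

Because such a filter keeps only the first AUG of each long reading frame and discards anything below the length threshold, short upstream ORFs and alternative in-frame start codons of the kind discussed in the abstract would not be reported by it.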