Abstract
Protein-coding genes in eukaryotes are interrupted by introns, but intron densities differ widely between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. The MCMC and ML inferences agree closely, whereas Dollo parsimony introduces a noticeable bias in the estimates, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches, including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing.
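As a rough illustration of the simplest of the three methods named above, the following sketch applies the standard Dollo parsimony rule to one intron position on a toy tree: the intron is assumed to be gained exactly once, at the last common ancestor of all species that carry it, and may then be lost independently any number of times. The tree, the node names (`root`, `n1`, `n2`, `A` to `D`) and the function `dollo_ancestral_states` are hypothetical and are not taken from the study; this is not the authors' pipeline.

```python
def dollo_ancestral_states(tree, root, present_leaves):
    """Infer presence/absence of one intron at every node under Dollo parsimony.

    tree: dict mapping each internal node to a list of its children
          (leaves are simply absent from the dict). Hypothetical toy format.
    present_leaves: set of leaf names in which the intron is observed.
    Assumes every name in present_leaves is a leaf of the tree.
    """
    total = len(present_leaves)
    counts = {}

    def count(node):
        # Number of intron-carrying leaves in the subtree rooted at node.
        kids = tree.get(node, [])
        if not kids:
            c = 1 if node in present_leaves else 0
        else:
            c = sum(count(k) for k in kids)
        counts[node] = c
        return c

    count(root)

    states = {}
    for node, c in counts.items():
        if c == 0:
            # No carrier below this node: intron absent (never gained or lost).
            states[node] = False
        elif c < total:
            # Some carriers below, some elsewhere: node lies on the path
            # between the single gain point and a carrier, so intron present.
            states[node] = True
        else:
            # All carriers are below this node. The gain is placed at the
            # deepest such node (the last common ancestor of the carriers);
            # nodes above it predate the gain.
            kids = tree.get(node, [])
            states[node] = not any(counts[k] == total for k in kids)
    return states


# Toy example (hypothetical): tree ((A, B), C), D with the intron seen in A and C.
toy_tree = {"root": ["n1", "D"], "n1": ["n2", "C"], "n2": ["A", "B"]}
states = dollo_ancestral_states(toy_tree, "root", {"A", "C"})
for name in ("root", "n1", "n2"):
    print(name, states[name])
```

In this toy case the intron is placed in `n1` and `n2` but not in `root`, because no species outside the `n1` clade carries it. An intron that was present in the root and later lost in `D` would therefore be missed, which is one way a parsimony rule of this kind can return lower ancestral counts than a probabilistic model.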
Highlights
Spliceosomal introns that interrupt most of the protein-coding genes and the concurrent splicing machinery that mediates intron excision and exon splicing are defining features of gene architecture and expression in eukaryotes [1,2]
The results clearly show that ancestral eukaryotic forms were intron-rich, with the Last Eukaryotic Common Ancestor (LECA) having a high intron density, on the order of two-thirds of the intron density in human genes
The results of this work, thanks to the extensive data set of analyzed genomes and the robust reconstruction method that yields inferences of ancestral states with minimal uncertainty, seem to close the debate on the gene architecture of ancestors of extant eukaryotes including LECA
Summary
Spliceosomal introns that interrupt most of the protein-coding genes and the concurrent splicing machinery that mediates intron excision and exon splicing are defining features of gene architecture and expression in eukaryotes [1,2]. There are many reports on the contribution of introns to the regulation of gene expression [9,10], and in vertebrates introns encode a variety of non-coding RNAs with established or predicted regulatory functions [11]. It remains unclear how general such functional roles of introns are. In addition to these specific functions, numerous introns are essential for alternative splicing, which involves the great majority of genes in multicellular eukaryotes and is one of the principal mechanisms of proteome diversification [12,13,14].