Abstract
Pseudogenes (Ψs), both processed and non-processed, are ubiquitous genetic elements derived from originally functional genes and are found in all studied genomes within the three kingdoms of life. However, systematic surveys of non-processed Ψs that use genomic information from multiple samples within a species are still rare. Here, a systematic comparative analysis of Ψs was conducted across 80 fully re-sequenced Arabidopsis thaliana accessions; 7546 genes, representing ∼28% of the annotated open reading frames (ORFs) in the genome, were found to carry disruptive mutations in at least one accession. The chromosomal distribution of these Ψs showed a significant negative correlation between the Ψ-to-ORF ratio and local gene density, suggesting a higher proportion of Ψs in gene desert regions, e.g. near centromeres. Moreover, compared with non-Ψ loci, even the intact coding sequences (CDSs) at Ψ loci were shorter, had fewer exons and had lower GC content. In addition, the Ψs showed a significant functional bias: they were mainly involved in responses to environmental stimuli and biotic stress, as previously reported, suggesting that pseudogenization, by allowing successive mutations to accumulate, is likely important for adaptive evolution to rapidly changing environments.
Highlights
Pseudogenes (Ψs) are found in all studied genomes within the three kingdoms of life
We addressed the following questions: (i) What are the dynamics of pseudogenization of functional genes within or between populations of a species? (ii) What is the distribution of Ψs over the whole genome, and are there regional effects on these Ψs? (iii) Is there a functional preference for the pseudogenization of genes on the whole-genome scale? (iv) Does natural selection play an important role in generating these Ψs? To address these questions, we utilized the high-quality, fully re-sequenced data from 80 A. thaliana accessions reported by Cao et al. [17]
Once an interval larger than 300 bp between the first and last indel was found in any set, the mutation was counted as a frameshift mutation, as it would substantially alter the translation of the coding sequence (CDS)
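The 300 bp rule in the last highlight can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `is_disruptive_frameshift`, the `(position, length change)` representation of an indel, and the idea that a "set" is the group of frame-shifting indels within one CDS are all assumptions made for illustration. Only the 300 bp threshold comes from the text.

```python
from typing import List, Tuple

# Hypothetical representation: (position in the CDS in bp, signed length change),
# where a positive change is an insertion and a negative change is a deletion.
Indel = Tuple[int, int]


def is_disruptive_frameshift(indels: List[Indel], max_span_bp: int = 300) -> bool:
    """Return True if a set of indels in one CDS is counted as a frameshift.

    Assumed logic (illustrative only):
      1. Indels whose length is a multiple of 3 never shift the frame and are ignored.
      2. If the remaining indels do not restore the frame, it is a frameshift.
      3. If they do restore the frame, the set is still counted as a frameshift
         when the first and last indel are more than `max_span_bp` apart, because
         the out-of-frame stretch between them is long.
    """
    shifting = [(pos, delta) for pos, delta in indels if delta % 3 != 0]
    if not shifting:
        return False

    net_change = sum(delta for _, delta in shifting)
    if net_change % 3 != 0:
        return True  # reading frame is never restored

    positions = [pos for pos, _ in shifting]
    span = max(positions) - min(positions)
    return span > max_span_bp  # strictly larger than 300 bp, as in the text
```

Under these assumptions, a 1 bp insertion compensated by a 1 bp deletion 50 bp downstream is not counted, whereas the same pair separated by 350 bp is.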
Summary
Pseudogenes (Ψs) are found in all studied genomes within the three kingdoms of life. They are ubiquitous genetic elements derived from originally functional genes after mutational inactivation, such as premature stops or frameshift mutations [1]. The processed genes are randomly distributed in the Arabidopsis genome and tend to have originated from genes with high copy numbers but not from highly expressed genes. Evolutionary and expression analyses suggest that a large number of Ψs in Arabidopsis and rice genomes had been subjected to purifying selection for substantial periods of time before pseudogenization, and that gene families involved in environmental stress responses have a significant excess of Ψs [10].