Abstract

Background: Whole-genome duplications in the ancestors of many diverse species provided the genetic material for evolutionary novelty. Several models explain the retention of paralogous genes. However, how these models are reflected in the evolution of coding and non-coding sequences of paralogous genes is unknown.

Results: Here, we analyzed the coding and non-coding sequences of paralogous genes in Arabidopsis thaliana and compared these sequences with those of orthologous genes in Arabidopsis lyrata. Paralogs with lower expression than their duplicate had more nonsynonymous substitutions, were more likely to fractionate, and exhibited less similar expression patterns with their orthologs in the other species. Also, lower-expressed genes had greater tissue specificity. Orthologous conserved non-coding sequences in the promoters, introns, and 3′ untranslated regions were less abundant at lower-expressed genes compared to their higher-expressed paralogs. A gene ontology (GO) term enrichment analysis showed that paralogs with similar expression levels were enriched in GO terms related to ribosomes, whereas paralogs with different expression levels were enriched in terms associated with stress responses.

Conclusions: Loss of conserved non-coding sequences in one gene of a paralogous gene pair correlates with reduced expression levels that are more tissue specific. Together with increased mutation rates in the coding sequences, this suggests that similar forces of purifying selection act on coding and non-coding sequences. We propose that coding and non-coding sequences evolve concurrently following gene duplication.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2803-2) contains supplementary material, which is available to authorized users.

Highlights

  • Whole-genome duplications in the ancestors of many diverse species provided the genetic material for evolutionary novelty

  • We found that the genes of differentially expressed paralog pairs with reduced expression are under less purifying selection, exhibit more tissue-specific expression, and have lost orthologous conserved non-coding sequences (CNSs) compared with paralogs that have equal or increased expression

  • We identified a set of 1312 highly similar paralogs resulting from the recent α-whole genome duplication (WGD) event in A. thaliana (Additional file 1: Table S2)


Summary

Introduction

Whole-genome duplications in the ancestors of many diverse species provided the genetic material for evolutionary novelty. Several models explain the retention of paralogous genes. How these models are reflected in the evolution of coding and non-coding sequences of paralogous genes is unknown. Although the genome reorganizes and many duplicated sequences are deleted, a considerable proportion of duplicated genes remains as paralogs in the genome [1]. A. thaliana contains more than 2500 paralogous gene pairs, accounting for about one-sixth of all protein-coding genes in this species [1, 6]. Due to the wealth of [...] Several models of evolution following a WGD event have been proposed, the most prominent of which are balanced gene drive [11], subfunctionalization of gene pairs [12], and neofunctionalization [9, 13] (reviewed in [14]).

