Abstract

Background: Following whole genome duplication (WGD), there is a compact distribution of gene similarities within the genome reflecting duplicate pairs of all the genes in the genome. With time, the distribution broadens and loses volume due to variable decay of duplicate gene similarity and to the process of duplicate gene loss. If there are two WGD, the older one becomes so reduced and broad that it merges with the tail of the distributions resulting from more recent events, and it becomes difficult to distinguish them. The goal of this paper is to advance statistical methods of identifying, or at least counting, the WGD events in the lineage of a given genome.

Methods: For a set of 15 angiosperm genomes, we analyze all 15 × 14 = 210 ordered pairs of target genome versus reference genome, using SynMap to find syntenic blocks. We consider all sets of B ≥ 2 syntenic blocks in the target genome that overlap in the reference genome as evidence of WGD activity in the target, whether it be one event or several. We hypothesize that in fitting an exponential function to the tail of the empirical distribution f(B) of block multiplicities, the size of the exponent will reflect the amount of WGD in the history of the target genome.

Results: By amalgamating the results from all reference genomes, a range of values of SynMap parameters, and alternative cutoff points for the tail, we find a clear pattern whereby multiple-WGD core eudicots have the smallest (negative) exponents, followed by core eudicots with only the single "γ" triplication in their history, followed by a non-core eudicot with a single WGD, followed by the monocots, with a basal angiosperm, the WGD-free Amborella, having the largest exponent.

Conclusion: The hypothesis that the exponent of the fit to the tail of the multiplicity distribution is a signature of the amount of WGD is verified, but there is also a clear complicating factor in the monocot clade, where a history of multiple WGD is not reflected in a small exponent.
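The tail fit described in the Methods can be illustrated with a minimal sketch. It assumes the fitted form is f(B) ≈ a·exp(c·B) and estimates c by ordinary least squares on log counts. Both the functional form and the estimator are assumptions made for illustration, not necessarily the procedure used in the paper. The function name `fit_tail_exponent`, the `tail_start` cutoff, and the example counts are hypothetical.

```python
import math

def fit_tail_exponent(counts, tail_start=2):
    """Fit log f(B) = log a + c * B by least squares over B >= tail_start.

    counts maps a multiplicity B to the number of block sets observed
    with that multiplicity. Empty bins are skipped because log(0) is
    undefined. Returns the pair (a, c).
    """
    pts = [(b, math.log(n)) for b, n in sorted(counts.items())
           if b >= tail_start and n > 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-empty tail bins")
    k = len(pts)
    mean_b = sum(b for b, _ in pts) / k
    mean_y = sum(y for _, y in pts) / k
    sxx = sum((b - mean_b) ** 2 for b, _ in pts)
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in pts)
    c = sxy / sxx                        # slope of the log-linear fit
    a = math.exp(mean_y - c * mean_b)    # intercept, back on the count scale
    return a, c

# Synthetic counts for illustration only (not data from the paper).
example_counts = {2: 410, 3: 160, 4: 62, 5: 25, 6: 9}
a_hat, c_hat = fit_tail_exponent(example_counts, tail_start=2)
print(f"a = {a_hat:.1f}, c = {c_hat:.3f}")
```

Varying `tail_start` corresponds to the alternative cutoff points for the tail mentioned in the Results. A log-linear fit weights sparse high-multiplicity bins as heavily as well-populated ones, so a weighted or maximum-likelihood fit may be preferable on real data.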

Highlights

  • Immediately after a whole genome duplication (WGD), and for a time that is short on the evolutionary timescale, the distribution of gene similarities within the genome shows a sharp peak near 100 %, containing duplicate pairs of all the genes in the genome

  • It is clear that the parameter c reflects the degree of WGD activity in the history of a genome

Summary

Introduction

After a whole genome duplication (WGD), and for a time that is short on the evolutionary timescale, the distribution of gene similarities within the genome shows a sharp peak near 100%, containing duplicate pairs of all the genes in the genome. With the passage of time, the peak broadens and loses volume. If there are two or more WGD (or higher-order polyploidizations), the older peaks become so reduced and broad that they merge with the tails of the distributions resulting from more recent events, and it becomes difficult to distinguish them. The number of blocks in W making up a superblock, its multiplicity, is not strictly determined by the WGD history of the genome, because of random attrition of blocks due to fractionation, disruptions due to chromosomal rearrangement, and other processes.
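The notion of multiplicity can be made concrete with a small sketch. It assumes each syntenic block is reduced to a (start, end) interval on a single reference chromosome and that blocks are grouped whenever their intervals overlap, with chains of overlapping blocks merged into one group. This grouping rule and the name `block_multiplicities` are hypothetical. The paper's exact overlap criterion is not given in this summary.

```python
from collections import Counter

def block_multiplicities(blocks):
    """Return the size of each group of overlapping reference intervals.

    blocks is a list of (start, end) pairs with start <= end, all on the
    same reference chromosome. Intervals that share at least one position
    are treated as overlapping, and overlap is applied transitively.
    """
    sizes = []
    current_end = None
    for start, end in sorted(blocks):
        if current_end is None or start > current_end:
            # No overlap with the running group: open a new group.
            sizes.append(1)
            current_end = end
        else:
            # Overlaps the running group: extend it.
            sizes[-1] += 1
            current_end = max(current_end, end)
    return sizes

# Made-up intervals for illustration only.
blocks = [(0, 10), (5, 15), (12, 20), (30, 40), (50, 60), (55, 58)]
sizes = block_multiplicities(blocks)
# Empirical distribution f(B), restricted to B >= 2 as in the abstract.
f = Counter(b for b in sizes if b >= 2)
print(sizes, dict(f))
```

Transitive grouping is the simplest choice but can chain together blocks that do not all overlap one another. A stricter rule, such as requiring a common region shared by every block in the group, would give smaller multiplicities.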
