Abstract

Recurrent whole-genome duplication and the ensuing loss of redundant genes (fractionation) complicate efforts to reconstruct the gene orders and chromosomes of the ancestors associated with the nodes of a phylogeny. Gene loss disrupts the gene adjacencies on which current techniques depend. With our RACCROCHE pipeline, instead of starting with the inference of short ancestral segments, we suggest delaying the choice of gene adjacencies while we accumulate many more syntenically validated generalized (gapped) adjacencies. We obtain longer ancestral contigs using maximum weight matching (MWM). Similarly, we do not construct chromosomes by successively piecing together contigs into larger segments, but rather compile counts of pairwise contig co-occurrences on the set of extant genomes and use these to cluster the contigs. Chromosome-level contig assemblies for a monoploid genome emerge naturally at each node of the phylogeny, and the contigs can then be ordered along each chromosome. Sampling alternative MWM solutions, visualizing heat maps, and applying gap statistics allow us to estimate the number of chromosomes in the reconstruction. We introduce several measures of quality: length of contigs, continuity of contig structure on successive ancestors, coverage of the extant genome by the reconstruction, and rearrangement relations among the inferred chromosomes. The reconstructed ancestors are visualized by painting the ancestral projections on the descendant genomes. We submit genomes drawn from a broad range of monocot orders to our pipeline, confirming the tetraploidization event "tau" in the stem lineage between the alismatids and the lilioids. We show additional applications to the Solanaceae and to four Brassica genomes, producing evidence about the monoploid ancestor in each case.
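
The co-occurrence step described above can be illustrated with a minimal sketch. It is not the RACCROCHE implementation: the function names, the toy input, and the `min_count` threshold are all hypothetical, and the grouping rule used here (linking any two contigs whose co-occurrence count reaches the threshold, then taking connected components) is a simple stand-in for whatever clustering method the pipeline actually applies. The input is assumed to record, for each extant genome, which ancestral contigs project onto each of its chromosomes.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(genomes):
    """Count, over all extant genomes, how often each pair of contigs
    projects onto the same chromosome (hypothetical input layout)."""
    counts = Counter()
    for chromosomes in genomes.values():
        for contigs in chromosomes:
            # Each unordered pair is counted once per chromosome.
            for a, b in combinations(sorted(set(contigs)), 2):
                counts[(a, b)] += 1
    return counts


def cluster_contigs(counts, contigs, min_count):
    """Group contigs into candidate chromosomes: link pairs whose count is
    at least min_count and return the connected components (an assumed,
    simplified clustering rule)."""
    parent = {c: c for c in contigs}

    def find(c):
        # Union-find lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), n in counts.items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), set()).add(c)
    return sorted(sorted(g) for g in groups.values())


# Toy data: three extant genomes, each a list of chromosomes, each
# chromosome a list of the ancestral contigs that map onto it.
genomes = {
    "genome_A": [["c1", "c2", "c3"], ["c4", "c5"]],
    "genome_B": [["c1", "c2"], ["c3", "c4", "c5"]],
    "genome_C": [["c1", "c2", "c3"], ["c4", "c5"]],
}
counts = cooccurrence_counts(genomes)
clusters = cluster_contigs(counts, ["c1", "c2", "c3", "c4", "c5"], min_count=2)
print(clusters)
```

In this toy input, contig `c3` travels with `c1` and `c2` in two of the three genomes and with `c4` and `c5` in only one, so a threshold of 2 places it in the first group. Choosing the threshold, or more generally the number of clusters, is the part the abstract addresses with heat maps and gap statistics.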
