Abstract

BackgroundPrevious studies exploring sequence variation in the model legume, Medicago truncatula, relied on mapping short reads to a single reference. However, read-mapping approaches are inadequate to examine large, diverse gene families or to probe variation in repeat-rich or highly divergent genome regions. De novo sequencing and assembly of M. truncatula genomes enables near-comprehensive discovery of structural variants (SVs), analysis of rapidly evolving gene families, and ultimately, construction of a pan-genome.ResultsGenome-wide synteny based on 15 de novo M. truncatula assemblies effectively detected different types of SVs indicating that as much as 22% of the genome is involved in large structural changes, altogether affecting 28% of gene models. A total of 63 million base pairs (Mbp) of novel sequence was discovered, expanding the reference genome space for Medicago by 16%. Pan-genome analysis revealed that 42% (180 Mbp) of genomic sequences is missing in one or more accession, while examination of de novo annotated genes identified 67% (50,700) of all ortholog groups as dispensable – estimates comparable to recent studies in rice, maize and soybean. Rapidly evolving gene families typically associated with biotic interactions and stress response were found to be enriched in the accession-specific gene pool. The nucleotide-binding site leucine-rich repeat (NBS-LRR) family, in particular, harbors the highest level of nucleotide diversity, large effect single nucleotide change, protein diversity, and presence/absence variation. However, the leucine-rich repeat (LRR) and heat shock gene families are disproportionately affected by large effect single nucleotide changes and even higher levels of copy number variation.ConclusionsAnalysis of multiple M. truncatula genomes illustrates the value of de novo assemblies to discover and describe structural variation, something that is often under-estimated when using read-mapping approaches. Comparisons among the de novo assemblies also indicate that different large gene families differ in the architecture of their structural variation.

Highlights

  • Previous studies exploring sequence variation in the model legume, Medicago truncatula, relied on mapping short reads to a single reference

  • De novo assemblies have scaffold N50s > 250 kb, capturing > 90% of the M. truncatula gene space Fifteen M. truncatula accessions were sequenced with Illumina HiSeq2000 using a combination of short and long insert paired-end libraries to an average of 120-fold coverage, assembled using ALLPATHS-LG [31] (Additional files 1 and 2: Figure S1 and Table S1)

  • Between 80 and 94% of each genome could be assembled into scaffolds >100 kbp, with scaffold N50s ranging from 268 kbp to 1,653 kbp and contig N50 sizes averaging around 20 kbp (Additional file 2: Table S2)

Read more

Summary

Introduction

Previous studies exploring sequence variation in the model legume, Medicago truncatula, relied on mapping short reads to a single reference. Read-mapping approaches are inadequate to examine large, diverse gene families or to probe variation in repeat-rich or highly divergent genome regions. Large gene families play a crucial role in both biotic interactions and abiotic response. Some of these families are encoded by hundreds of members [10,11,12] organized in clusters of varying size and thought to evolve through gene duplication and birth-and-death processes [13,14,15,16,17]. NCR genes are abundant, diverse, and frequently clustered [23, 24]

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.