Abstract

Key message

We propose to use the natural variation between individuals of a population for genome assembly scaffolding. In today’s genome projects, multiple accessions get sequenced, leading to variant catalogs. Using such information to improve genome assemblies is attractive both cost-wise and scientifically, because the value of an assembly increases with its contiguity. We conclude that haplotype information is a valuable resource to group and order contigs toward the generation of pseudomolecules.

Quinoa (Chenopodium quinoa) has been under cultivation in Latin America for more than 7500 years. Recently, quinoa has gained increasing attention due to its stress resistance and its nutritional value. We generated a novel quinoa genome assembly for the Bolivian accession CHEN125 using PacBio long-read sequencing data (assembly size 1.32 Gbp, initial N50 size 608 kbp). Next, we re-sequenced 50 quinoa accessions from Peru and Bolivia. This set of accessions differed at 4.4 million single-nucleotide variant (SNV) positions compared to CHEN125 (1.4 million SNV positions on average per accession). We show how to exploit variation in distantly related accessions to establish a genome-wide ordered set of contigs for guided scaffolding of a reference assembly. The method is based on detecting shared haplotypes and their expected continuity throughout the genome (i.e., the effect of linkage disequilibrium), as an extension of what is expected in mapping populations where only a few haplotypes are present. We test the approach using Arabidopsis thaliana data from different populations. After applying the method to our CHEN125 quinoa assembly, we validated the results with mate-pairs, genetic markers, and another quinoa assembly originating from a Chilean cultivar. We show consistency between these information sources and the haplotype-based relations as determined by us and obtain an improved assembly with an N50 size of 1079 kbp and ordered contig groups of up to 39.7 Mbp. We conclude that haplotype information in distantly related individuals of the same species is a valuable resource to group and order contigs according to their adjacency in the genome toward the generation of pseudomolecules.
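
The N50 values quoted in the abstract (608 kbp before and 1079 kbp after scaffolding) follow the standard definition: the length L such that sequences of length L or longer together cover at least half of the total assembly size. A minimal Python sketch of that definition is shown here for orientation; the function name `n50` is illustrative and is not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together account for at least half of the summed length.
    """
    lengths = sorted(lengths, reverse=True)  # longest sequences first
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length
```

Joining contigs into longer scaffolds moves sequence into fewer, longer entries, which is why N50 rises when contiguity improves.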

Highlights

  • Quinoa (Chenopodium quinoa Willd.) is a crop plant that has been under cultivation in Latin America for more than 7500 years (Jellen et al 2011; Lack and Fuentes 2011)

  • Genome assembly scaffolding guided by haplotype information

  • The change from one variation pattern to another is expected to occur over longer genomic distances, so that sequences of a fragmented genome assembly used as reference may show variation patterns that continue across assembly gaps
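
The idea in the last highlight can be illustrated with a toy Python sketch. It is not the authors' implementation. It assumes, purely for illustration, that each contig end has already been summarized as one haplotype state per accession (0 or 1), and it proposes a join between ends of different contigs when the two patterns agree in a large fraction of accessions. The names `agreement`, `candidate_links`, and the threshold `min_agreement` are hypothetical.

```python
from itertools import combinations


def agreement(pattern_a, pattern_b):
    """Fraction of accessions showing the same state in both patterns."""
    if len(pattern_a) != len(pattern_b) or not pattern_a:
        raise ValueError("patterns must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(pattern_a, pattern_b))
    return matches / len(pattern_a)


def candidate_links(end_patterns, min_agreement=0.9):
    """Rank possible joins between contig ends by pattern agreement.

    end_patterns maps (contig_name, end_label) to a tuple holding one
    haplotype state per accession. Ends of the same contig are never
    paired. Returns (score, end_1, end_2) tuples, best score first.
    """
    links = []
    for (end_1, pat_1), (end_2, pat_2) in combinations(end_patterns.items(), 2):
        if end_1[0] == end_2[0]:
            continue  # skip the two ends of one contig
        score = agreement(pat_1, pat_2)
        if score >= min_agreement:
            links.append((score, end_1, end_2))
    links.sort(key=lambda link: link[0], reverse=True)
    return links


# Toy data: three contigs, six accessions (hypothetical values).
toy_patterns = {
    ("c1", "L"): (0, 0, 1, 1, 0, 1),
    ("c1", "R"): (0, 1, 1, 1, 0, 1),
    ("c2", "L"): (0, 1, 1, 1, 0, 1),
    ("c2", "R"): (1, 1, 1, 0, 0, 1),
    ("c3", "L"): (1, 1, 1, 0, 0, 1),
    ("c3", "R"): (1, 1, 0, 0, 0, 0),
}
toy_links = candidate_links(toy_patterns)
```

In this toy data the right end of c1 matches the left end of c2, and the right end of c2 matches the left end of c3, which suggests the order c1, c2, c3. Real data would require haplotype calling from SNVs, tolerance for missing genotypes, and conflict resolution among competing links, none of which is modeled here.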


Summary

Introduction

Quinoa (Chenopodium quinoa Willd.) is a crop plant that has been under cultivation in Latin America for more than 7500 years (Jellen et al 2011; Lack and Fuentes 2011). Quinoa is grown in an area ranging from Colombia to Chile, as well as in parts of North America, France and other countries. The nutritious seeds of quinoa are free of gluten, making them an interesting alternative to cereals, especially in the context of celiac disease. The plants are hardy and can be grown on poor soil of high salinity; they are resistant to drought and temperature fluctuation. The remarkable properties of quinoa led the Food and Agriculture Organization (FAO) to declare 2013 the “International Year of Quinoa.” Quinoa has been

