Chromosome-scale Scaffolds Research Articles

Atriplex hortensis (2n = 2x = 18, 1C genome size ∼1.1 gigabases), also known as garden orach and mountain-spinach, is a highly nutritious, broadleaf annual of the Amaranthaceae-Chenopodiaceae alliance (Chenopodiaceae sensu stricto, subfam. Chenopodioideae) that has spread in cultivation from its native primary domestication area in Eurasia to other temperate and subtropical regions worldwide. Atriplex L. is a highly complex but, as currently understood, monophyletic group of mainly halophytic and/or xerophytic plants, of which A. hortensis has been a vegetable of minor importance in some areas of Eurasia (from Central Asia to the Mediterranean) at least since antiquity. Nonetheless, it is a crop with tremendous nutritional potential due primarily to its exceptional leaf and seed protein quantities (approaching 30%) and quality (high levels of lysine). Although there is some literature describing the taxonomy and production of A. hortensis, there is a general lack of genetic and genomic data that would otherwise help elucidate the genetic variation, phylogenetic positioning, and future potential of the species. Here, we report the assembly of the first high-quality, chromosome-scale reference genome for A. hortensis cv. “Golden.” Long-read data from Oxford Nanopore’s MinION DNA sequencer were assembled with the program Canu and polished with Illumina short reads. Contigs were scaffolded to chromosome scale using chromatin-proximity maps (Hi-C), yielding a final assembly containing 1,325 scaffolds with an N50 of 98.9 Mb, with 94.7% of the assembly represented in the nine largest, chromosome-scale scaffolds. Sixty-six percent of the genome was classified as highly repetitive DNA, with the most common repetitive elements being Gypsy- (32%) and Copia-like (11%) long-terminal repeats. The annotation was completed using MAKER, which identified 37,083 gene models and 2,555 tRNA genes. Assessment of genome completeness using the Benchmarking Universal Single Copy Orthologs (BUSCO) metric identified 97.5% of the conserved orthologs as complete, with only 2.2% being duplicated, reflecting the diploid nature of A. hortensis. A resequencing panel of 21 wild, unimproved and cultivated A. hortensis accessions revealed three distinct populations with little variation within subpopulations. These resources provide vital information to better understand A. hortensis and facilitate future study.
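The N50 reported in the abstract above is a contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name `n50` and the example lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold to the shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy assembly, not the A. hortensis scaffolds.
toy_lengths = [10, 8, 6, 4, 2]
toy_n50 = n50(toy_lengths)
```

For the toy lengths, the two longest scaffolds (10 + 8 = 18) are the first to reach half of the 30-base total, so the N50 is 8.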

Background: The long-range sequencing information captured by linked reads, such as those available from 10× Genomics (10xG), helps resolve genome sequence repeats and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it precludes the need for input sequence formatting and draft sequence assembly indexing. The reliance on kmers instead of read alignments for pairing sequences relaxes the workflow requirements and drastically reduces the run time.

Results: Here, we show how linked reads, when used in conjunction with Hi-C data for scaffolding, improve a draft human genome assembly of PacBio long-read data five-fold (baseline vs. ARKS NG50 = 4.6 vs. 23.1 Mbp, respectively). We also demonstrate how the method provides further improvements of a megabase-scale Supernova human genome assembly (NG50 = 14.74 Mbp vs. 25.94 Mbp before and after ARKS), which itself exclusively uses linked read data for assembly, with an execution speed six to nine times faster than competitive linked read scaffolders (~10.5 h compared to 75.7 h, on average). Following ARKS scaffolding of a human genome 10xG Supernova assembly (of cell line NA12878), fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n = 13).

Conclusions: ARKS uses a kmer mapping strategy instead of linked read alignments to record and associate the barcode information needed to order and orient draft assembly sequences. The simplified workflow, when compared to that of our initial implementation, ARCS, markedly improves run-time performance on experimental human genome datasets. Furthermore, the novel distance estimator in ARKS utilizes barcoding information from linked reads to estimate gap sizes. It accomplishes this by modeling the relationship between known distances of a region within contigs and calculating associated Jaccard indices. ARKS has the potential to provide correct, chromosome-scale genome assemblies, promptly. We expect ARKS to have broad utility in helping refine draft genomes.
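The Jaccard index mentioned in the abstract above measures how much two sets overlap. As a rough illustration only, the sketch below computes it for the sets of linked-read barcodes observed at two sequence regions; the function name, the barcode strings, and the idea of comparing two contig ends are assumptions made for the example and are not taken from the ARKS source code.

```python
def jaccard_index(barcodes_a, barcodes_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two barcode collections.

    Returns 0.0 when both collections are empty (a convention chosen
    for this sketch, since the ratio is undefined in that case).
    """
    a, b = set(barcodes_a), set(barcodes_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical barcodes seen at the ends of two contigs.
end_of_contig_1 = {"BX01", "BX02", "BX03"}
start_of_contig_2 = {"BX02", "BX03", "BX04"}
shared = jaccard_index(end_of_contig_1, start_of_contig_2)
```

Here two of the four distinct barcodes are shared, giving an index of 0.5. The intuition, under the assumption that reads with the same barcode derive from the same long molecule, is that regions lying closer together tend to share more barcodes, so a higher index suggests a smaller gap.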

Articles published on Chromosome-scale Scaffolds

Symbiodinium microadriaticum (coral microalgal endosymbiont)

Genetic and spatial organization of the unusual chromosomes of the dinoflagellate Symbiodinium microadriaticum

Chromosome-level genome assemblies of the malaria vectors Anopheles coluzzii and Anopheles arabiensis.

A Reference Genome Sequence for Giant Sequoia.

Chromosome-scale scaffolds for the Chinese hamster reference genome assembly to facilitate the study of the CHO epigenome.

A Chromosome-Scale Assembly of the Garden Orach (Atriplex hortensis L.) Genome Using Oxford Nanopore Sequencing.

Gene clustering and copy number variation in alkaloid metabolic pathways of opium poppy

A chromosome-scale draft sequence of the Canada fleabane genome.

Discovery of a New TLR Gene and Gene Expansion Event through Improved Desert Tortoise Genome Assembly with Chromosome-Scale Scaffolds.

Chromosomal-level assembly of Takifugu obscurus (Abe, 1949) genome using third-generation DNA sequencing and Hi-C analysis.

Integrating Hi-C links with assembly graphs for chromosome-scale assembly.

A chromosome-scale assembly of the major African malaria vector Anopheles funestus.

ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers

From Short Reads to Chromosome-Scale Genome Assemblies.

Hybrid de novo genome assembly and centromere characterization of the gray mouse lemur (Microcebus murinus)

Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions.

