Abstract

Black raspberry (Rubus occidentalis L.) is a niche fruit crop valued for its flavor and potential health benefits. The improvement of fruit and cane characteristics via molecular breeding technologies has been hindered by the lack of a high-quality reference genome. The recently released draft genome for black raspberry (ORUS 4115-3) lacks assembly of scaffolds to chromosome scale. We used high-throughput chromatin conformation capture (Hi-C) and Proximity-Guided Assembly (PGA) to cluster and order 9,650 out of 11,936 contigs of this draft genome assembly into seven pseudo-chromosomes. The seven pseudo-chromosomes cover ~97.2% of the total contig length (~223.8 Mb). Locating existing genetic markers on the physical map resolved multiple discrepancies in marker order on the genetic map. Centromeric regions were inferred from recombination frequencies of genetic markers, alignment of a 303 bp centromeric sequence with the PGA, and a heat map showing the physical contact matrix over the entire genome. We demonstrate a high degree of synteny between each of the seven chromosomes of black raspberry and a high-quality reference genome for strawberry (Fragaria vesca L.) assembled using only PacBio long-read sequences. We conclude that PGA is a cost-effective and rapid method of generating chromosome-scale assemblies from Illumina short-read sequencing data.

Highlights

  • Despite recent advances in DNA sequencing technologies and computational approaches, the de novo assembly of high-quality reference genomes in plant species using solely short-read sequencing data remains difficult

  • The high-throughput chromatin conformation capture (Hi-C) library was sequenced on the Illumina NextSeq platform (Illumina, La Jolla, CA, USA), generating 54.4 million Hi-C read pairs, which were provided as input to the Proximo Hi-C scaffolding pipeline

  • 52.2 million Hi-C read pairs were aligned to the 11,936 R. occidentalis contigs spanning 230,199,469 bp (97.3% of total sequence length)
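The proportions implied by the figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and reading the ~223.8 Mb in the abstract as the length placed on pseudo-chromosomes (rather than the total contig length) is an assumption.

```python
# Figures quoted in the abstract and highlights; names are illustrative only.
total_contigs = 11_936             # contigs in the draft assembly
placed_contigs = 9_650             # contigs clustered and ordered by PGA
total_contig_bp = 230_199_469      # total length of the contigs, in bp
placed_mb = 223.8                  # assumed: length placed on pseudo-chromosomes, in Mb
hic_pairs_sequenced = 54.4e6       # Hi-C read pairs generated
hic_pairs_aligned = 52.2e6         # Hi-C read pairs aligned to the contigs


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


contig_fraction = pct(placed_contigs, total_contigs)
length_fraction = pct(placed_mb * 1e6, total_contig_bp)
aligned_fraction = pct(hic_pairs_aligned, hic_pairs_sequenced)

print(f"contigs placed:     {contig_fraction:.1f}%")
print(f"length placed:      {length_fraction:.1f}%")
print(f"read pairs aligned: {aligned_fraction:.1f}%")
```

Under that reading, roughly four in five contigs account for about 97% of the sequence, which suggests that the unplaced contigs are mostly short.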


Summary

Introduction

Despite recent advances in DNA sequencing technologies and computational approaches, the de novo assembly of high-quality reference genomes in plant species using solely short-read sequencing data remains difficult. One of the biggest challenges is scaffolding contigs into chromosome-scale sequences, as construction of shotgun libraries with large inserts (>15 kb) is extremely difficult and de novo assembly algorithms tend to perform poorly across repetitive sequences. Bacterial artificial chromosome (BAC)-end sequencing is one approach, but it is expensive, tedious and time consuming. High-density genetic maps can also be used for scaffolding; however, it is often difficult to obtain sufficient numbers of markers to anchor and orient small contigs, and the mapping algorithms used to build genetic maps can sometimes place markers at incorrect locations[4]. Moreover, there is no direct relationship between centiMorgans (cM) on linkage maps, which measure recombination frequency, and physical distance expressed as megabases (Mb) of sequence data.
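To illustrate why genetic distance does not translate directly into physical distance, the sketch below computes the local recombination rate (cM per Mb) between adjacent markers. The marker names and positions are hypothetical and were chosen only to show how the rate can vary along a chromosome; they are not data from this study.

```python
# Hypothetical markers on one pseudo-chromosome:
# (name, physical position in Mb, genetic position in cM).
markers = [
    ("m1", 1.0, 2.0),
    ("m2", 5.0, 22.0),
    ("m3", 15.0, 24.0),
    ("m4", 25.0, 25.0),
    ("m5", 30.0, 50.0),
]


def local_rates(markers):
    """Return (left, right, cM per Mb) for each pair of adjacent markers.

    Markers are sorted by physical position first; positions are assumed
    to be distinct so that the interval length is never zero.
    """
    ordered = sorted(markers, key=lambda m: m[1])
    rates = []
    for (n1, mb1, cm1), (n2, mb2, cm2) in zip(ordered, ordered[1:]):
        rates.append((n1, n2, (cm2 - cm1) / (mb2 - mb1)))
    return rates


rates = local_rates(markers)
for left, right, rate in rates:
    print(f"{left}-{right}: {rate:.2f} cM/Mb")
```

In this made-up example the two middle intervals span 20 Mb but only 3 cM, so a scaffolding approach that relied on genetic distance alone would badly underestimate their physical size.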

