Abstract

MotivationLong read sequencing and Bionano Genomics optical maps are two techniques that, when used together, make it possible to reconstruct entire chromosome or chromosome arms structure. However, the existing tools are often too conservative and organization of contigs into scaffolds is not always optimal.ResultsWe developed BiSCoT (Bionano SCaffolding COrrection Tool), a tool that post-processes files generated during a Bionano scaffolding in order to produce an assembly of greater contiguity and quality. BiSCoT was tested on a human genome and four publicly available plant genomes sequenced with Nanopore long reads and improved significantly the contiguity and quality of the assemblies. BiSCoT generates a fasta file of the assembly as well as an AGP file which describes the new organization of the input assembly.AvailabilityBiSCoT and improved assemblies are freely available on GitHub at http://www.genoscope.cns.fr/biscot and Pypi at https://pypi.org/project/biscot/.

Highlights

  • Assembling large and repetitive genomes, such as plant genomes, is a challenging field in bioinformatics.The appearance of short reads technologies several years ago improved considerably the number of genomes publicly available

  • In 2017, Bionano Genomics launched its Saphyr system which was able to generate optical maps of a genome, by using the distribution of enzymatic labelling sites. These maps were used to orient and order contigs into scaffolds but the real improvement came in 2018, when Bionano Genomics introduced their Direct Label and Stain (DLS) technology that was able to produce genome maps at the chromosome-level with a N50 several times higher than previously (Belser et al (2018); Formenti et al (2018); Hu et al (2019))

  • In order to estimate the accuracy of gap sizes, we compared the gap sizes we introduced in the input assembly to the ones that were estimated using optical maps (Supplementary Figure 1)

Read more

Summary

Introduction

Assembling large and repetitive genomes, such as plant genomes, is a challenging field in bioinformatics.The appearance of short reads technologies several years ago improved considerably the number of genomes publicly available. Even if the assemblies were greatly improved, the chromosome-level organization of the sequenced genome cannot be deciphered in a majority of cases. In 2017, Bionano Genomics launched its Saphyr system which was able to generate optical maps of a genome, by using the distribution of enzymatic labelling sites. These maps were used to orient and order contigs into scaffolds but the real improvement came in 2018, when Bionano Genomics introduced their Direct Label and Stain (DLS) technology that was able to produce genome maps at the chromosome-level with a N50 several times higher than previously (Belser et al (2018); Formenti et al (2018); Hu et al (2019))

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call