Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach

Qingxi Meng, Shubham Chandak, Yifan Zhu, Tsachy Weissman

doi:10.1038/s41598-023-29267-8

Abstract

The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer, and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption because they are portable, operate in real time, and provide long reads. However, there has been limited progress on compressing nanopore sequencing reads stored in FASTQ files, since most existing tools are either general-purpose or specialized for short-read data. We present NanoSpring, a reference-free compressor for nanopore sequencing reads that relies on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets, including bacterial, metagenomic, plant, animal, and human whole-genome data. For recently basecalled, high-quality nanopore datasets, NanoSpring, which focuses only on the base sequences in the FASTQ file, uses just 0.35–0.65 bits per base, which is 3–6× lower than general-purpose compressors like gzip. NanoSpring is competitive in compression ratio and compression resource usage with the state-of-the-art tool CoLoRd, while being significantly faster at decompression when using multiple threads (> 4× faster decompression with 20 threads). NanoSpring is available on GitHub at https://github.com/qm2/NanoSpring.
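The "bits per base" figure quoted in the abstract is the compressed size in bits divided by the number of bases. The sketch below illustrates that arithmetic with gzip as the general-purpose baseline. It is not part of NanoSpring: the helper `bits_per_base` is a hypothetical name, and the input is a synthetic, uniformly random sequence, so the printed value will not match results on real nanopore reads.

```python
import gzip
import random


def bits_per_base(compressed_size_bytes: int, num_bases: int) -> float:
    """Return the compressed size expressed in bits per base."""
    if num_bases <= 0:
        raise ValueError("num_bases must be positive")
    return 8 * compressed_size_bytes / num_bases


# Synthetic base sequence (assumption: illustrative only, not real read data).
rng = random.Random(0)
bases = "".join(rng.choice("ACGT") for _ in range(100_000))

# Compress the base sequence alone, mirroring the abstract's focus on bases
# rather than quality values or read identifiers.
gz_size = len(gzip.compress(bases.encode("ascii")))
gzip_bpb = bits_per_base(gz_size, len(bases))

print(f"gzip: {gzip_bpb:.2f} bits per base")
```

For comparison, a fixed-length code over the four-letter alphabet A, C, G, T costs exactly 2 bits per base, so values well below 2 indicate that a compressor is exploiting redundancy between and within reads.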

Journal: Scientific Reports
Publication Date: Feb 6, 2023
Citations: 4
License type: open-access
