Abstract

Currently, one of the fastest-growing DNA sequencing technologies is nanopore sequencing. One of the key stages involved in processing sequencer data is the basecalling process, where the input sequence of currents measured on the nanopores of the sequencer reproduces the DNA sequences, called DNA reads. Many of the applications dedicated to basecalling, together with the DNA sequence, provide the estimated quality of the reconstruction of a given nucleotide (quality symbols are contained on every fourth line of the FASTQ file; each nucleotide in the FASTQ file corresponds to exactly one estimated nucleotide reconstruction quality symbol). Herein, we compare the estimated nucleotide reconstruction quality symbols (signs from every fourth line of the FASTQ file) reported by other basecallers. The conducted experiments consisted of basecalling the same raw datasets from the nanopore device by other basecallers and comparing the provided quality symbols, denoting the estimated quality of the nucleotide reconstruction. The results show that the estimated quality reported by different basecallers may vary, depending on the tool used, particularly in terms of range and distribution. Moreover, we mapped basecalled DNA reads to reference genomes and calculated matched and mismatched rates for groups of nucleotides with the same quality symbol. Finally, the presented paper shows that the estimated nucleotide reconstruction quality reported in the basecalling process is not used in any investigated tool for processing nanopore DNA reads.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call