Abstract

The compact MinION nanopore sequencer from Oxford Nanopore Technologies (ONT) can generate ultra-long reads through a user-friendly DNA library preparation process, which could move genomics studies into a new stage. However, nanopore sequencing reads currently have a relatively high error rate, which significantly limits the applications of nanopore sequencing. In this study, we used information from both the raw electrical signals and the basecalled sequences to develop a deep learning method that corrects the basecalling errors introduced by the default basecallers currently provided. Specifically, we first re-segmented the raw electrical signals according to the basecalled sequences to extract the input representations. The preprocessed input then passes through several convolutional neural network (CNN) layers and bidirectional long short-term memory (Bi-LSTM) layers to generate rich multi-dimensional features. Moreover, to deal with the extremely imbalanced data, we combined the center loss function with the traditional softmax cross-entropy loss in our model. Our results showed that this post-basecalling correction method significantly improved basecalling quality and could correct errors in the homopolymer regions of the human genome.
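The abstract does not say exactly how the raw signal is re-segmented against the basecalled sequence. The following is a minimal sketch under stated assumptions, not the authors' procedure: it assumes the basecaller exposes a per-block "move table" (`moves[k] == 1` means a new base was emitted at signal block `k`, each block spanning `stride` samples), splits the signal into one segment per called base, and summarizes each segment by its mean, standard deviation, and length. The helper names `resegment_signal` and `per_base_features` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def resegment_signal(signal: Sequence[float], moves: Sequence[int],
                     stride: int) -> List[List[float]]:
    """Split a raw signal into one segment per basecalled base.

    Hypothetical helper: assumes moves[k] == 1 marks a new base emitted at
    signal block k, and that every block spans `stride` samples.
    Samples before the first move are discarded.
    """
    covered = len(moves) * stride
    if covered > len(signal):
        raise ValueError("move table covers more samples than the signal")
    # Sample index at which each base starts.
    starts = [k * stride for k, m in enumerate(moves) if m == 1]
    # A base's segment runs until the next base starts; the last one runs
    # to the end of the region covered by the move table.
    ends = starts[1:] + [covered]
    return [list(signal[s:e]) for s, e in zip(starts, ends)]


def per_base_features(bases: str, signal: Sequence[float],
                      moves: Sequence[int],
                      stride: int) -> List[Tuple[str, float, float, int]]:
    """Pair each called base with (mean, std, length) of its signal segment."""
    segments = resegment_signal(signal, moves, stride)
    if len(segments) != len(bases):
        raise ValueError("number of moves does not match number of bases")
    return [(b, mean(seg), pstdev(seg), len(seg))
            for b, seg in zip(bases, segments)]
```

For example, `per_base_features("ACG", [float(x) for x in range(12)], [1, 0, 1, 1, 0, 0], 2)` assigns 4, 2, and 6 samples to `A`, `C`, and `G`. Long segments, such as the one for `G` here, are the kind of signal evidence that can help resolve homopolymer length, although which per-base features the published model actually uses is not stated in the abstract.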
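Center loss, originally proposed for face recognition, adds to the softmax cross-entropy a term that pulls each sample's feature vector toward a learned center for its class, making features within a class more compact: L = L_softmax + λ · L_center, with L_center = ½ Σ_i ‖x_i − c_{y_i}‖². The sketch below computes this combined objective in plain Python, together with the usual per-batch center update. It is illustrative only: the weight `lam`, the update rate `alpha`, batch averaging, and what the class labels represent in the correction model are all assumptions, since the abstract does not specify them.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """-log softmax(logits)[label], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(features: Sequence[Vector], labels: Sequence[int],
                centers: Sequence[Vector]) -> float:
    """Batch mean of 1/2 * ||x_i - c_{y_i}||^2."""
    total = 0.0
    for x, y in zip(features, labels):
        total += 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
    return total / len(features)


def combined_loss(logits: Sequence[Vector], features: Sequence[Vector],
                  labels: Sequence[int], centers: Sequence[Vector],
                  lam: float = 0.01) -> float:
    """Mean softmax cross-entropy + lam * center loss (lam is illustrative)."""
    ce = sum(softmax_cross_entropy(z, y) for z, y in zip(logits, labels))
    return ce / len(labels) + lam * center_loss(features, labels, centers)


def update_centers(features: Sequence[Vector], labels: Sequence[int],
                   centers: Sequence[Vector],
                   alpha: float = 0.5) -> List[Vector]:
    """Move each class center toward its samples in the current batch.

    c_j <- c_j - alpha * sum_i [y_i == j] (c_j - x_i) / (1 + n_j)
    """
    new_centers = [list(c) for c in centers]
    for j, c in enumerate(centers):
        members = [x for x, y in zip(features, labels) if y == j]
        if not members:
            continue  # classes absent from this batch keep their center
        denom = 1 + len(members)
        new_centers[j] = [cd - alpha * sum(cd - x[d] for x in members) / denom
                          for d, cd in enumerate(c)]
    return new_centers
```

Because each class keeps its own center, a rare class still contributes a compactness signal to the loss even when it is outnumbered in the softmax term. This is one plausible reason the combination is used for imbalanced data, but the abstract does not spell out the authors' rationale.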
