Abstract

Existing sequencing error correction techniques demand large amounts of expensive memory. In this work, we introduce a new disk-based sequencing error correction method to address this problem. The key idea is to use a special on-disk index structure, called the BoND-tree, to store and access a large set of k-mers and their associated metadata on disk. With the BoND-tree, a set of special box queries that retrieve the relevant k-mers and their counts can be processed efficiently. A comprehensive voting mechanism is then used to identify and correct an erroneous base in a genome sequence. Experiments demonstrate that the proposed method is promising in both accuracy and scalability for verifying and correcting sequencing errors.
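
The pipeline described above (k-mer counts, box queries, voting) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: an in-memory `Counter` stands in for the on-disk BoND-tree, a "box query" is read as "fix every position of a k-mer except one, and allow any of the four bases there", and the vote simply sums the counts of matching k-mers. The function names `count_kmers`, `box_query`, and `vote_base` are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def count_kmers(reads, k):
    """Count every k-mer in the reads (in-memory stand-in for the on-disk index)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def box_query(counts, kmer, pos):
    """Return {k-mer: count} for stored k-mers equal to `kmer` except possibly at `pos`.

    Assumed reading of a box query: one position is left open to all four
    bases, every other position is fixed.
    """
    result = {}
    for base in BASES:
        candidate = kmer[:pos] + base + kmer[pos + 1:]
        count = counts.get(candidate, 0)
        if count > 0:
            result[candidate] = count
    return result

def vote_base(counts, read, i, k):
    """Vote on the base at read position i using every k-mer window covering i."""
    votes = Counter()
    first = max(0, i - k + 1)
    last = min(i, len(read) - k)
    for start in range(first, last + 1):
        offset = i - start
        window = read[start:start + k]
        for candidate, count in box_query(counts, window, offset).items():
            votes[candidate[offset]] += count
    if not votes:
        return read[i]  # no evidence, keep the original base
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Toy data: five clean reads and one read with a substitution at index 4.
    reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
    counts = count_kmers(reads, 4)
    print(vote_base(counts, "ACGTTCGT", 4, 4))
```

In this toy setting the vote at index 4 of the erroneous read is dominated by k-mers from the clean reads. A disk-based method would answer the same box queries against the index instead of a dictionary, and the actual voting rule in the paper may differ from this simple count sum.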


