Abstract

Next-generation sequencing technologies produce large quantities of multiplexed DNA sequences that require preprocessing before any further analysis. This preprocessing includes demultiplexing and trimming sequences. Although many existing tools can handle these steps, they cannot be easily extended to new sequence schematics when new pipelines are developed. We present Fuzzysplit, a tool that relies on a simple declarative language to describe the schematics of sequences, which makes it readily adaptable to different use cases. In this paper, we explain the matching algorithms behind Fuzzysplit and provide a preliminary comparison of its performance with other well-established tools. Overall, we find that its matching accuracy is comparable to that of previous tools.

Highlights

  • Advances in next-generation DNA sequencing technology allow large quantities of multiplexed DNA to be sequenced

  • Many methods, including the Genotyping by Sequencing (GBS) strategy (Elshire et al., 2011), require sequenced DNA to undergo preprocessing before further processing and analysis

  • When demultiplexing, reads of DNA are split into different files according to the barcode matched in the DNA sequence
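
The demultiplexing step described in the last highlight can be illustrated with a minimal sketch. It is not Fuzzysplit's implementation: the function names, the use of Hamming distance as the fuzzy-matching metric, the `max_mismatches` parameter, and the assumption that the barcode sits at the start of the read are all illustrative choices. Reads are grouped in memory instead of being written to one file per barcode, which keeps the example self-contained.

```python
from collections import defaultdict


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group reads by the barcode that best matches their prefix.

    Hypothetical helper: returns a dict mapping each matched barcode to
    the list of reads with that barcode trimmed off. Reads with no
    barcode within `max_mismatches` are collected under the key None.
    """
    groups = defaultdict(list)
    for read in reads:
        best, best_dist = None, max_mismatches + 1
        for bc in barcodes:
            prefix = read[:len(bc)]
            if len(prefix) < len(bc):
                continue  # read is too short to contain this barcode
            dist = hamming(prefix, bc)
            if dist < best_dist:
                best, best_dist = bc, dist
        if best is None:
            groups[None].append(read)
        else:
            groups[best].append(read[len(best):])  # trim the barcode
    return dict(groups)


reads = ["ACGTTTGA", "ACGATTGA", "TTAGCCCC", "GGGGGGGG"]
result = demultiplex(reads, ["ACGT", "TTAG"])
```

In a real pipeline each group would be streamed to its own output file; the in-memory dictionary only stands in for that step.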


Summary

INTRODUCTION

Advances in next-generation DNA sequencing technology allow large quantities of multiplexed DNA to be sequenced. Fuzzysplit uses a greedy overarching algorithm that matches a list of arbitrary patterns $P_1 \ldots P_{|P|}$ from one line of the template file against its corresponding line of input text $T_1 \ldots T_{|T|}$. It partitions the list of patterns into contiguous chunks of either fixed-length patterns (both fuzzy patterns and fixed-length wildcard patterns) or interval-length patterns. If valid matches are found for both the interval-length chunk and the [...]

Algorithm 1: Matching one line of input text with its corresponding patterns from the template file.

The main thread may block to wait for the worker threads by using a semaphore, in order to constrain the number of reads stored in memory at one time.
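
The partitioning of a pattern list into chunks of fixed-length and interval-length patterns, as described above, can be sketched as follows. This is only an illustration under stated assumptions: the `Pattern` class, its `min_len` and `max_len` fields, and the `partition_chunks` helper are hypothetical and do not reflect Fuzzysplit's actual data structures. A pattern is treated as fixed-length when its minimum and maximum lengths coincide.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Pattern:
    """Hypothetical pattern record; field names are assumptions."""
    name: str
    min_len: int
    max_len: int

    @property
    def is_fixed(self):
        # Fixed-length patterns have a single allowed length.
        return self.min_len == self.max_len


def partition_chunks(patterns):
    """Split patterns into maximal runs of the same kind, in order.

    Returns a list of (kind, patterns) tuples where kind is either
    "fixed" or "interval".
    """
    return [
        ("fixed" if fixed else "interval", list(group))
        for fixed, group in groupby(patterns, key=lambda p: p.is_fixed)
    ]


patterns = [
    Pattern("barcode", 4, 4),
    Pattern("enzyme", 5, 5),
    Pattern("data", 10, 100),
    Pattern("adapter", 8, 8),
]
chunks = partition_chunks(patterns)
```

Here `itertools.groupby` merges only adjacent patterns of the same kind, so the original order of the template line is preserved and each chunk can be matched as a unit.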

