Abstract

Background: De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing (NGS) data. Because NGS datasets are very large, it is essential to represent de Bruijn graphs compactly, and several approaches to this problem have been proposed recently.

Results: In this work, we show how to reduce the memory required by the data structure of Chikhi and Rizk (WABI’12), which represents de Bruijn graphs using Bloom filters. Our method requires 30% to 40% less memory than their method, with insignificant impact on construction time. At the same time, our experiments showed a better query time compared to the method of Chikhi and Rizk.

Conclusion: The proposed data structure constitutes, to our knowledge, currently the most efficient practical representation of de Bruijn graphs.

Highlights

  • De Bruijn graphs are widely used in bioinformatics for processing next-generation sequencing data

  • Many computational tools dealing with next-generation sequencing (NGS) data, especially those devoted to genome assembly, are based on the concept of a de Bruijn graph, see e.g. [1]

  • Nodes of a de Bruijn graph correspond to all distinct k-mers occurring in the given set of reads, and two k-mers are linked by an arc if they have a suffix-prefix overlap of size k − 1
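
The definition in the last highlight can be illustrated with a minimal Python sketch. It is an uncompressed, in-memory construction intended only to make the node and arc definitions concrete; it is not the compact representation discussed in the paper. The function names `kmers` and `build_de_bruijn`, and the restriction to the alphabet `ACGT`, are assumptions made for this example.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every length-k substring (k-mer) of a read, left to right."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph as (nodes, arcs).

    nodes: set of all distinct k-mers occurring in the reads.
    arcs:  dict mapping a k-mer u to the set of k-mers v such that the
           last k-1 characters of u equal the first k-1 characters of v.
    """
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))

    arcs = defaultdict(set)
    for u in nodes:
        # Any successor of u must be u's (k-1)-suffix plus one more letter.
        for c in "ACGT":  # assumption: DNA alphabet without ambiguity codes
            v = u[1:] + c
            if v in nodes:
                arcs[u].add(v)
    return nodes, dict(arcs)


if __name__ == "__main__":
    nodes, arcs = build_de_bruijn(["ACGTAC"], k=3)
    print(sorted(nodes))
    print({u: sorted(vs) for u, vs in sorted(arcs.items())})
```

Storing every k-mer in a hash set, as done here, is exactly what becomes too expensive for large NGS datasets, which is the motivation for compact representations.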


Summary

Results

We show how to reduce the memory required by the data structure of Chikhi and Rizk (WABI’12), which represents de Bruijn graphs using Bloom filters. Our method requires 30% to 40% less memory than their method, with insignificant impact on construction time. Our experiments also showed a better query time compared to the method of Chikhi and Rizk.
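
To give a rough idea of what representing a de Bruijn graph with a Bloom filter can look like, here is a simplified Python sketch. It stores k-mers in a single plain Bloom filter and answers neighbor queries by testing the four possible extensions of a k-mer. It is not the data structure of Chikhi and Rizk, nor the improved method summarized here: a plain Bloom filter can report k-mers that were never inserted (false positives), and this sketch does nothing to correct them. The class `BloomFilter`, the function `successors`, and the choice of salted BLAKE2b as the hash family are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """A basic Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(),
                digest_size=8,
                salt=seed.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True for every inserted item; may also be True for others.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def successors(bf, kmer):
    """Return candidate successors of kmer according to the filter.

    The result always includes the true successors that were inserted,
    but may contain false positives.
    """
    return [kmer[1:] + c for c in "ACGT" if kmer[1:] + c in bf]


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1 << 16, num_hashes=3)
    for kmer in ["ACG", "CGT", "GTA", "TAC"]:
        bf.add(kmer)
    print(successors(bf, "ACG"))
```

The graph is never stored explicitly: arcs are recovered on demand from membership queries, so memory use is governed by the size of the bit array rather than by the number of k-mers times their length.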

