Abstract

Background

K-mer-based methods of genome analysis have attracted great interest because they do not require genome assembly and can be performed directly on sequencing reads. Many analysis tasks require one to compare k-mer lists from different sequences to find words that are either unique to a specific sequence or common to many sequences. However, no stand-alone k-mer analysis tool currently allows one to perform these algebraic set operations.

Findings

We have developed the GenomeTester4 toolkit, which contains a novel tool GListCompare for performing union, intersection and complement (difference) set operations on k-mer lists. We provide examples of how these general operations can be combined to solve a variety of biological analysis tasks.

Conclusions

GenomeTester4 can be used to simplify k-mer list manipulation for many biological analysis tasks.

Electronic supplementary material

The online version of this article (doi:10.1186/s13742-015-0097-y) contains supplementary material, which is available to authorized users.

Highlights

  • K-mer-based methods of genome analysis have attracted great interest because they do not require genome assembly and can be performed directly on sequencing reads

  • One of the fastest and most widely used k-mer counting tools is Jellyfish [5], which runs on several parallel CPU threads and operates on a lock-free hash table that eliminates waiting for concurrent data access from different threads

  • KMC2 [7] and DSK [8] can run on computers with limited memory by writing k-mers into several small temporary tables that are combined onto disk storage

Summary

Introduction

K-mer-based methods of genome analysis have attracted great interest because they do not require genome assembly and can be performed directly on sequencing reads. GenomeTester4 covers two steps of such an analysis: its GListMaker routine generates lists of k-mer counts from nucleotide sequences, and its GListCompare tool performs basic algebraic set operations - union, intersection and difference (complement) - on these lists.
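
The two steps described above can be illustrated with a minimal Python sketch. It is not GenomeTester4 code and does not reproduce the GListMaker or GListCompare file formats or options. The function names are hypothetical, only one strand is counted (no reverse-complement handling), and the way counts are combined (summed for union, minimum for intersection) is an assumption made for this illustration.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer on one strand of `sequence`."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def union(a: Counter, b: Counter) -> Counter:
    """K-mers present in either list; counts are summed (assumption)."""
    return a + b


def intersection(a: Counter, b: Counter) -> Counter:
    """K-mers present in both lists; the smaller count is kept (assumption)."""
    return Counter({kmer: min(a[kmer], b[kmer]) for kmer in a.keys() & b.keys()})


def difference(a: Counter, b: Counter) -> Counter:
    """K-mers present in `a` but absent from `b`, with their counts from `a`."""
    return Counter({kmer: n for kmer, n in a.items() if kmer not in b})


if __name__ == "__main__":
    list_a = count_kmers("ACGACG", 3)
    list_b = count_kmers("ACGTAC", 3)
    print("common to both:", sorted(intersection(list_a, list_b)))
    print("unique to A:   ", sorted(difference(list_a, list_b)))
    print("in either:     ", sorted(union(list_a, list_b)))
```

Here the difference operation yields the words unique to one sequence and the intersection yields the words common to both, which are the two comparisons the abstract describes as the basis of many analysis tasks.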
