CDKAM: a taxonomic classification tool using discriminative k-mers and approximate matching strategies

Van-Kien Bui, Chaochun Wei

doi:10.1186/s12859-020-03777-y

Open Access

https://doi.org/10.1186/s12859-020-03777-y

Journal: BMC bioinformatics	Publication Date: Oct 20, 2020
Citations: 12	License type: open-access

Affiliation: Shanghai Jiao Tong University

Abstract

Background: Current taxonomic classification tools use exact string matching algorithms, which are effective for data from next generation sequencing technology. However, the unique error patterns of third generation sequencing (TGS) technologies could reduce the accuracy of these programs. Results: We developed a Classification tool using Discriminative K-mers and Approximate Matching algorithm (CDKAM). Its approximate matching method for searching k-mers has two phases: a quick mapping phase and a dynamic programming phase. We tested simulated datasets as well as real TGS datasets to compare the performance of CDKAM with existing methods. CDKAM performed better in many aspects, especially when classifying TGS data with an average length of 1000–1500 bases. Conclusions: CDKAM is an effective program with higher accuracy and lower memory requirement for TGS metagenome sequence classification. It produces a high species-level accuracy.

Highlights

Current taxonomic classification tools use exact string matching algorithms, which are effective for data from next generation sequencing technology
We present CDKAM, a new taxonomic classification tool for third generation sequencing (TGS) data with a high error rate
The results show that CDKAM can classify TGS sequences to their source genomes accurately and efficiently

Summary

Introduction

Current taxonomic classification tools use exact string matching algorithms, which are effective for data from next generation sequencing technology. The unique error patterns of third generation sequencing (TGS) technologies could reduce the accuracy of these programs. Results: We developed a Classification tool using Discriminative K-mers and Approximate Matching algorithm (CDKAM). Its approximate matching method for searching k-mers has two phases: a quick mapping phase and a dynamic programming phase. Conclusions: CDKAM is an effective program with higher accuracy and lower memory requirement for TGS metagenome sequence classification. As the NCBI database keeps growing and becoming more complete, we have to consider the trade-off between the size of the reference database and the classification accuracy, as well as the computational cost.
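
The two-phase k-mer search described above can be illustrated with a small sketch. This is not the CDKAM implementation: the function names, the toy index, the `max_dist` threshold, the majority vote, and the brute-force scan in the second phase are all assumptions made for illustration. The sketch only shows the general idea of trying a fast exact lookup first and falling back to a dynamic programming comparison when a k-mer contains a sequencing error.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def lookup(kmer, index, max_dist=1):
    """Return the taxon for kmer, or None if nothing is close enough.

    Phase 1 (quick mapping, assumed here to be an exact hash lookup).
    Phase 2 (dynamic programming): scan the indexed k-mers and accept the
    closest one within max_dist edits. Ties keep the first k-mer seen.
    """
    if kmer in index:
        return index[kmer]
    best_taxon, best_dist = None, max_dist + 1
    for ref, taxon in index.items():
        d = edit_distance(kmer, ref)
        if d < best_dist:
            best_taxon, best_dist = taxon, d
    return best_taxon


def classify(read, index, k, max_dist=1):
    """Assign a read to the taxon that collects the most k-mer hits."""
    votes = Counter()
    for km in kmers(read, k):
        taxon = lookup(km, index, max_dist)
        if taxon is not None:
            votes[taxon] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical index: each discriminative k-mer maps to exactly one taxon.
toy_index = {"ACGTA": "taxon_A", "TTGCA": "taxon_B"}
result = classify("ACGAA", toy_index, k=5)  # one substitution away from "ACGTA"
```

A real tool would replace the linear scan in the second phase with a far more selective candidate search, since comparing every query k-mer against a full reference database would be prohibitively slow.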

