CLAST: CUDA implemented large-scale alignment search tool.

Masahiro Yano, Hiroshi Mori, Takuji Yamada, Ken Kurokawa, Yutaka Akiyama

doi:10.1186/s12859-014-0406-y

Open Access

https://doi.org/10.1186/s12859-014-0406-y

Abstract

Background: Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets.

Results: We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads and thousands of reference genome sequences, and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST was capable of identifying sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment as the default (local alignment is also an option), enabling CLAST to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, CLAST does not need a preprocessed sequence database like Burrows–Wheeler Transform-based tools, and this enables CLAST to incorporate large, frequently updated sequence databases. Fourth, CLAST requires <2 GB of main memory, making it possible to run CLAST on a standard desktop computer or server node.

Conclusions: CLAST achieved very high speed (similar to the Burrows–Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-014-0406-y) contains supplementary material, which is available to authorized users.
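The abstract distinguishes global alignment (CLAST's default) from local alignment (an option). As a rough illustration of that difference only, the sketch below scores two short sequences both ways using the textbook Needleman-Wunsch and Smith-Waterman recurrences. It is not CLAST's GPU implementation; the function names and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score: all of `a` must align against all of `b`.

    Scoring values are illustrative assumptions, not CLAST defaults.
    """
    # Row 0: aligning an empty prefix of `a` costs one gap per base of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against nothing
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in `b`
                            curr[j - 1] + gap))  # gap in `a`
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman score: best-scoring pair of substrings of `a` and `b`."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The floor at 0 lets an alignment restart anywhere.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# A shared "ACGT" core flanked by unrelated bases on opposite ends.
read, reference = "TTTTACGT", "ACGTCCCC"
print("global:", global_score(read, reference))
print("local: ", local_score(read, reference))
```

With these toy inputs the local score rewards only the shared core, while the global score is pulled down by the unmatched flanks, which is why the choice of mode matters when whole reads are compared against reference regions.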

Highlights

Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases.
Most fundamental metagenomic analyses are highly dependent on sequence alignment tools, such as the Basic Local Alignment Search Tool (BLAST) [4], BLAST-like Alignment Tool (BLAT) [5], and Fragment Recruitment at High Identity with Tolerance (FR-HIT) algorithm [6], to search for nucleotide sequence similarity against sequence databases.
These query sets were searched against the reference genome sequences using SSEARCH, BLAST, BLAT, and CLAST.

Summary

Introduction

Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. A single run of the latest version of the Illumina sequencing system (HiSeq 2500) can produce ~540–600 Gb of sequence with 100-bp read lengths, and can take >11 days [1]. These technologies have made it easier to perform massive sequencing projects such as metagenomic analyses. Sensitivity and search speed often impose conflicting requirements, and most alignment tools used for metagenomic studies sacrifice one of the two.
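To put the quoted HiSeq 2500 throughput in terms of read counts, the figures above can be converted with simple arithmetic. The sketch assumes that "Gb" means 10^9 bases and that every read is exactly 100 bp; the helper name `reads_per_run` is hypothetical.

```python
def reads_per_run(total_bases, read_length_bp):
    """Number of fixed-length reads needed to account for `total_bases`."""
    return total_bases // read_length_bp


GIGABASE = 10**9  # assumption: 1 Gb = 10^9 bases

low = reads_per_run(540 * GIGABASE, 100)
high = reads_per_run(600 * GIGABASE, 100)
print(f"{low:,} to {high:,} reads per run")
```

Under these assumptions a single run yields on the order of 5 to 6 billion reads, each of which has to be searched against the reference database.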

Methods

Results

Discussion

Conclusion

Journal: BMC Bioinformatics	Publication Date: Dec 1, 2014
Citations: 30	License type: CC BY 4.0


