A Massively Parallel Sequence Similarity Search for Metagenomic Sequencing Data.

Masanori Kakuta, Kazuki Izawa, Takashi Ishida, Yutaka Akiyama, Shuji Suzuki

doi:10.3390/ijms18102124

Open Access

https://doi.org/10.3390/ijms18102124

Abstract

Sequence similarity searches have been widely used in the analyses of metagenomic sequencing data. Finding homologous sequences in a reference database enables the estimation of taxonomic and functional characteristics of each query sequence. Because current metagenomic sequencing data consist of a large number of nucleotide sequences, the time required for sequence similarity searches accounts for a large proportion of the total analysis time. This time-consuming step makes it difficult to perform large-scale analyses. To analyze large-scale metagenomic data, such as those found in the human oral microbiome, we developed GHOST-MP (Genome-wide HOmology Search Tool on Massively Parallel system), a parallel sequence similarity search tool for massively parallel computing systems. This tool uses a fast search algorithm based on suffix arrays of query and database sequences and a hierarchical parallel search to accelerate the large-scale sequence similarity search of metagenomic sequencing data. The parallel computing efficiency and the search speed of this tool were evaluated. GHOST-MP was shown to scale to over 10,000 CPU (Central Processing Unit) cores, and achieved more than 80-fold acceleration compared with mpiBLAST using the same computational resources. We applied this tool to human oral metagenomic data, and the results indicate that the oral cavity, the oral vestibule, and plaque have different characteristics based on the functional gene category.
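The abstract attributes part of GHOST-MP's speed to suffix arrays built over the query and database sequences. As a rough illustration of why a suffix array helps, the Python sketch below indexes a single database string and finds exact k-mer seed matches by binary search. This is a simplified assumption for illustration only: GHOST-MP also indexes the queries, and its actual seeding and alignment-extension steps are not described in this summary. The names `build_suffix_array` and `seed_hits` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    Naive construction by sorting suffix strings; fine for a toy example,
    far too slow for real sequence databases.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def seed_hits(query: str, db: str, sa: list[int], k: int) -> list[tuple[int, int]]:
    """Return (query_pos, db_pos) pairs for every exact k-mer match (hypothetical helper)."""
    # Truncating sorted suffixes to k characters keeps them in sorted order,
    # so all suffixes that start with a given k-mer form one contiguous run.
    prefixes = [db[i:i + k] for i in sa]
    hits = []
    for q in range(len(query) - k + 1):
        kmer = query[q:q + k]
        lo = bisect_left(prefixes, kmer)
        hi = bisect_right(prefixes, kmer)
        # sa[lo:hi] holds every database position where this k-mer occurs.
        hits.extend((q, d) for d in sorted(sa[lo:hi]))
    return hits


db = "MKVLAAGMKVL"
sa = build_suffix_array(db)
print(seed_hits("AMKV", db, sa, k=3))  # [(1, 0), (1, 7)]
```

In a full search tool, each seed hit would then be extended into a scored alignment, and the suffix array would be built with a linear-time algorithm (for example SA-IS) rather than by sorting suffix strings.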

Highlights

Most microbes are difficult to isolate and cultivate [1]
We developed a new massively parallel sequence similarity search tool for large-scale metagenomic sequencing data, such as the human oral microbiome
GHOST-MP achieved faster sequence similarity searches than mpiBLAST, enabling large-scale functional analyses to be performed within a short period of time

Summary

Introduction

Most microbes are difficult to isolate and cultivate [1]. The metagenomic approach, which directly sequences microbial genomes from environmental samples, is a culture-independent way to identify uncultured microbes. The mpiBLAST software searches in parallel using multiple processes on a distributed memory system with thousands of CPU cores to reduce the search time. Although both approaches accelerate the similarity search process, acceleration from only one approach is insufficient for large-scale analyses. We developed a new massively parallel sequence similarity search tool for large-scale metagenomic sequencing data, such as the human oral microbiome. The system, named GHOST-MP, performs parallel sequence similarity searches on a massively parallel distributed memory system. This enables the analysis of large-scale metagenomic data consisting of hundreds of sets of environmental sequencing data. GHOST-MP achieved faster sequence similarity searches than mpiBLAST, enabling large-scale functional analyses to be performed within a short period of time. The GHOST-MP program is implemented in C++ and is available under the BSD (Berkeley Software Distribution) License from http://www.bi.cs.titech.ac.jp/ghostmp/
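The summary describes spreading the search over many processes on a distributed memory system. One common way to make such a search parallel is to split the queries into blocks and the database into chunks, treat every (query block, database chunk) pair as an independent task, and then merge the best hits for each query. The Python sketch below illustrates that generic decomposition under stated assumptions: a thread pool stands in for MPI processes, `shared_kmers` is a toy score rather than a real alignment score, and all names are hypothetical. It is not GHOST-MP's actual hierarchical scheduler, whose details are not given here.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import nlargest
from itertools import product


def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def shared_kmers(a, b, k=2):
    """Toy similarity score (hypothetical): number of distinct k-mers shared by a and b."""
    def kmers(s):
        return {s[i:i + k] for i in range(len(s) - k + 1)}
    return len(kmers(a) & kmers(b))


def search_task(query_block, db_chunk, score):
    """One independent task: score every query in the block against every chunk entry."""
    return [(q, d, score(q, d)) for q in query_block for d in db_chunk]


def parallel_search(queries, database, score, q_size, db_size, top_n=1, workers=4):
    """Run all (query block, database chunk) tasks and keep each query's top hits.

    Assumes query strings are unique, since they are used as dictionary keys.
    """
    tasks = product(chunk(queries, q_size), chunk(database, db_size))
    hits = {q: [] for q in queries}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(search_task, qb, dc, score) for qb, dc in tasks]
        for fut in futures:
            for q, d, s in fut.result():
                hits[q].append((s, d))
    # Merge step: partial results from different database chunks are combined per query.
    return {q: nlargest(top_n, h) for q, h in hits.items()}


result = parallel_search(["MKV", "GGA"], ["MKVL", "AGGA", "TTTT"],
                         shared_kmers, q_size=1, db_size=2)
print(result)  # {'MKV': [(2, 'MKVL')], 'GGA': [(2, 'AGGA')]}
```

In a decomposition like this, each task needs only one database chunk in memory, so the chunk size can be chosen to fit a node's memory, and the per-query merge makes the final result independent of how the work was partitioned.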

Evaluation of Scalability and Search Speed

Sequence Data

Functional Gene Analysis Pipeline

Computing Environments

Sequence Similarity Search with Indexes Based on Suffix Arrays

Journal: International Journal of Molecular Sciences	Publication Date: Oct 11, 2017
Citations: 5	License type: CC BY 4.0

Similar Papers

Mining, analyzing, and integrating viral signals from metagenomic data
Tingting Zheng ... Kang Kang
Microbiome | VOL. 7
19 Mar 2019

A Model-Based Approach For Species Abundance Quantification Based On Shotgun Metagenomic Data.
Eric Z Chen ... Hongzhe Li
Statistics in Biosciences | VOL. 9
01 Jun 2017

MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis
Gherman V Uritskiy ... James Taylor
Microbiome | VOL. 6
15 Sep 2018

TreeSeq, a Fast and Intuitive Tool for Analysis of Whole Genome and Metagenomic Sequence Data
Bastiaan B Wintermans ... Andries E Budding
PLOS ONE | VOL. 10
01 May 2015
