Parallel-META: efficient metagenomic data analysis based on high-performance computation

Xiaoquan Su, Jian Xu, Kang Ning

doi:10.1186/1752-0509-6-s1-s16

Abstract

Background: Metagenomics directly sequences and analyses genome information from microbial communities. A single community usually contains hundreds or more genomes from different microbial species, and the main computational tasks in metagenomic data analysis are the examination of the taxonomical and functional components of all genomes in the community. Metagenomic data analysis is both data- and computation-intensive and requires extensive computational power. Most current metagenomic data analysis software was designed to run on a single computer or a single computer cluster, which cannot keep up with the computational requirements of the rapidly growing number of large metagenomic projects. Advanced computational methods and pipelines therefore have to be developed to meet this need for efficient analysis.

Results: In this paper, we propose Parallel-META, a GPU- and multi-core-CPU-based open-source pipeline for metagenomic data analysis that enables efficient, parallel analysis of multiple metagenomic datasets and visualization of the results for multiple samples. In Parallel-META, the similarity-based database search is parallelized using GPU computing and multi-core CPU optimization. Experiments show that Parallel-META achieves a speed-up of at least 15 times over a traditional metagenomic data analysis method, with the same accuracy of results (http://www.computationalbioenergy.org/parallel-meta.html).

Conclusion: Parallel processing of metagenomic data is very promising: with the current speed-up of 15 times and above, binning is no longer a very time-consuming process. Deeper analyses of metagenomic data, such as the comparison of different samples, therefore become feasible within the pipeline, and some of these functionalities have already been included in the Parallel-META pipeline.

Highlights

Metagenomics directly sequences and analyses genome information from microbial communities
Traditional metagenomic data analyses were conducted on a single PC or CPU cluster, which makes handling multiple large metagenomic datasets increasingly difficult
We use GPU computing and multi-core CPU computing to speed up metagenomic data analysis, and propose a novel pipeline that enables parallel processing of large metagenomic datasets

Summary

Introduction

Metagenomics directly sequences and analyses genome information from microbial communities. More than 99% of microbial species are unknown and uncultivable [2], which makes the traditional isolation and cultivation process inapplicable. Analysis of their metagenomic data is a direct and efficient way to analyse all microbes in a community [3]. 16S rRNA-based metagenomic surveys of microbial communities focus on 16S ribosomal RNA sequences, which are relatively short, often conserved within a species, and different between species. Such surveys have already produced data for the analysis of microbial communities in the Sargasso Sea [4], acid mine drainage biofilm [5] and the human gut microbiome [6]. We focus on data analysis for shotgun whole-genome metagenomic sequencing, in which computational methods play a very important role, especially the similarity-based database search.
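The idea of parallelizing a similarity-based database search across CPU cores can be illustrated with a minimal Python sketch. It is not the Parallel-META implementation: the k-mer Jaccard score is a simple stand-in for a real alignment score, the GPU part is not covered, and the function names (`kmers`, `best_hit`, `search_reads`), the parameters and the toy sequences are all hypothetical. The only point it makes is that each read can be searched independently of the others, so reads can be spread over several workers.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_hit(read, references, k=4):
    """Return (reference name, Jaccard similarity) of the closest reference.

    Jaccard similarity on k-mer sets is an assumed stand-in for a real
    alignment score; it is NOT the scoring used by Parallel-META.
    """
    read_kmers = kmers(read, k)
    best_name, best_score = None, 0.0
    for name, ref_seq in references.items():
        ref_kmers = kmers(ref_seq, k)
        union = read_kmers | ref_kmers
        score = len(read_kmers & ref_kmers) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def search_reads(reads, references, workers=1, k=4):
    """Search every read against the references.

    Reads are independent of each other, so with workers > 1 they are
    distributed across CPU cores by a process pool; with workers <= 1
    the same search runs serially.
    """
    task = partial(best_hit, references=references, k=k)
    if workers <= 1:
        return [task(read) for read in reads]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, reads, chunksize=64))


if __name__ == "__main__":
    # Hypothetical toy data, made up for illustration only.
    references = {
        "taxon_A": "ACGTACGTACGTTTGACC",
        "taxon_B": "GGGCCCAAATTTGGGCCC",
    }
    reads = ["ACGTACGTAC", "CCCAAATTTG", "TTTTTTTTTT"]
    hits = search_reads(reads, references, workers=2)
    for read, (name, score) in zip(reads, hits):
        print(read, name, round(score, 3))
```

Because the serial and parallel paths call the same `best_hit` function, they return identical results; only the wall-clock time differs. Any actual speed-up depends on the data size and hardware, and this sketch makes no claim about reproducing the figures reported for Parallel-META.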

Journal: BMC Systems Biology	Publication Date: Jul 1, 2012
Citations: 60	License type: cc-by
