Tackling soil diversity with the assembly of large, complex metagenomes

Adina Chuang Howe, C Titus Brown, James M Tiedje, Janet K Jansson, Stephanie A Malfatti, Susannah G Tringe

doi:10.1073/pnas.1402564111

Abstract

The large volumes of sequencing data required to sample deeply the microbial communities of complex environments pose new challenges to sequence analysis. De novo metagenomic assembly effectively reduces the total amount of data to be analyzed but requires substantial computational resources. We combine two preassembly filtering approaches, digital normalization and partitioning, to generate previously intractable large metagenome assemblies. Using a human-gut mock community dataset, we demonstrate that these methods result in assemblies nearly identical to assemblies from unprocessed data. We then assemble two large soil metagenomes totaling 398 billion bp (equivalent to 88,000 Escherichia coli genomes) from matched Iowa corn and native prairie soils. The resulting assembled contigs could be used to identify molecular interactions and reaction networks of known metabolic pathways using the Kyoto Encyclopedia of Genes and Genomes Orthology database. Nonetheless, more than 60% of predicted proteins in assemblies could not be annotated against known databases. Many of these unknown proteins were abundant in both corn and prairie soils, highlighting the benefits of assembly for the discovery and characterization of novelty in soil biodiversity. Moreover, 80% of the sequencing data could not be assembled because of low coverage, suggesting that considerably more sequencing data are needed to characterize the functional content of soil.
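The abstract names digital normalization without describing it. As a rough illustration only, the general idea can be sketched as follows: reads are streamed once, and a read is kept only if the median count of its k-mers seen so far is below a coverage cutoff. The function names, the tiny k, the cutoff, and the exact in-memory counting are all assumptions made for readability; this is not the authors' implementation, which would need memory-efficient approximate counting to handle data at this scale.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Return all overlapping k-mers of a read (empty list if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def digital_normalize(reads, k=5, cutoff=3):
    """Keep a read only while its estimated coverage is below `cutoff`.

    Coverage is estimated as the median count of the read's k-mers among
    the reads kept so far. `k` and `cutoff` are illustrative values, not
    parameters taken from the paper.
    """
    counts = Counter()  # exact counts; an assumption made for simplicity
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read too short to yield any k-mer
        if median(counts[x] for x in ks) < cutoff:
            kept.append(read)
            counts.update(ks)  # only kept reads contribute to the counts
    return kept


# Toy input: one highly redundant read and one rare read.
high = "ACGTTGCAAG"
rare = "TTTTGGGGCC"
kept = digital_normalize([high] * 10 + [rare], k=5, cutoff=3)
print(len(kept))
```

In this toy input the redundant read is retained only until its estimated coverage reaches the cutoff, while the rare read is kept, which is the kind of data reduction that can make a subsequent assembly more tractable.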

Journal: Proceedings of the National Academy of Sciences
Publication Date: Mar 14, 2014
Citations: 295

Similar Papers

Answering biological questions by querying k‐mer databases
Paul Greenfield ... Uwe Roehm
Concurrency and Computation: Practice and Experience | VOL. 25
11 Oct 2012

WheatExp: an RNA-seq expression database for polyploid wheat
Stephen Pearce ... Jorge Dubcovsky
BMC Plant Biology | VOL. 15
01 Dec 2015

MirLibSpark
Chao-Jung Wu ... Abdoulaye Baniré Diallo
04 Sep 2019

Genome Sequence Databases: Annotation
A Bhattacharyya
01 Jan 2009
