Exploring Pandora's Box: Potential and Pitfalls of Low Coverage Genome Surveys for Evolutionary Biology

Florian Leese, Ralph Tollrian, Philipp Brand, Kathrin P Lampert, Christoph D Schubart, Sebastian Striewski, Lars Dietz, Jana S Doemel, Nicole T Rivera, Andrey Rozenberg, Chester J Sands, Jennifer Nolzen, Christoph Held, Christoph Mayer, Johannes Dambach, Katrin Linse, Jennifer A Jackson, Shobhit Agrawal, William P Goodall-Copestake, Michael J Raupach, Jan Niklas Macher

doi:10.1371/journal.pone.0049202

Abstract

High throughput sequencing technologies are revolutionizing genetic research. With this “rise of the machines”, genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02–25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. 
In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also call for more sophisticated filtering of problematic sequences and non-target genome sequences before markers are developed.

Highlights

Recent advances in high throughput sequencing technologies have caused a paradigm shift in molecular evolutionary biology [1]
The basic principle common to both is that the genomic regions identified for marker development and analysis should be informative enough to answer the biological question under study
Average read lengths after quality clipping ranged from 211.5 bp to 376.6 bp for the genomic library of the coral Favia fragum

Summary

Introduction

Recent advances in high-throughput sequencing technologies have caused a paradigm shift in molecular evolutionary biology [1]. Whereas the analysis of many markers was traditionally a costly and tedious task restricted mainly to genetic model organisms, it is now possible to screen large proportions of previously unexplored genomes with high-throughput sequencing methods almost as easily as known genomes. This hugely facilitates ecological and evolutionary studies [2] and promises to overcome the statistical pitfalls associated with the still widely used single-marker studies (see [3] for discussion). With high-throughput sequencing, the straightforward sequencing of enriched and non-enriched libraries on fractions of 454 plates can provide a good solution when searching for microsatellite markers [13,14,15,16] (for a review see [4,17,18]).
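
As a rough illustration of what screening sequencing reads for microsatellite candidates can involve, the following Python sketch reports perfect tandem repeats with motifs of 1 to 6 bp using a regular expression. The function name `find_microsatellites` and the minimum repeat thresholds are assumptions chosen for illustration; they are not the criteria or the software used in the study.

```python
import re

# Hypothetical minimum repeat counts per motif length (bp).
# These thresholds are illustrative assumptions, not values from the paper.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_microsatellites(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, repeat_count) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for motif_len, n in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least n - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ACAC"),
            # so each repeat is reported only under its shortest motif.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            count = (m.end() - m.start()) // motif_len
            hits.append((m.start(), m.end(), motif, count))
    return hits
```

Calling `find_microsatellites(read)` on each read of a library yields candidate loci. A real workflow would also need to handle imperfect repeats, check that enough flanking sequence is present for primer design, and screen candidates against contaminant and transposable element sequences, which is the concern raised in the abstract.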

Journal: PLoS ONE	Publication Date: Nov 21, 2012
Citations: 96	License type: CC BY 4.0