Abstract

Continuing advances in nucleotide sequencing technology are inspiring a suite of genomic approaches in studies of natural populations. Researchers are faced with data management and analytical scales that are increasing by orders of magnitude. With such dramatic advances comes a need to understand biases and error rates, which can be propagated and magnified in large-scale data acquisition and processing. Here we assess genomic sampling biases and the effects of various population-level data filtering strategies in a genotyping-by-sequencing (GBS) protocol. We focus on data from two species of Populus, because this genus has a relatively small genome and is emerging as a target for population genomic studies. We estimate the proportions and patterns of genomic sampling by examining the Populus trichocarpa genome (Nisqually-1), and demonstrate a pronounced bias towards coding regions when using the methylation-sensitive ApeKI restriction enzyme in this species. Using population-level data from a closely related species (P. tremuloides), we also investigate various approaches for filtering GBS data to retain high-depth, informative SNPs that can be used for population genetic analyses. We found that a data filter including the designation of ambiguous alleles resulted in metrics of population structure and Hardy-Weinberg equilibrium that were most consistent with previous studies of the same populations based on other genetic markers. Analyses of the filtered data (27,910 SNPs) also resulted in patterns of heterozygosity and population structure similar to those of a previous study using microsatellites. Our application demonstrates that technically and analytically simple approaches can readily be developed for population genomics of natural populations.
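As a rough illustration of what estimating genomic sampling from a reference sequence can involve, the sketch below performs a complete in silico digest at ApeKI recognition sites (GCWGC, where W is A or T) and reports the fraction of bases falling in fragments within a chosen size window. It is a minimal sketch and not the authors' pipeline: the function names, the size window, and the toy sequence are hypothetical, the cut position is an assumption, and methylation sensitivity cannot be captured from sequence alone.

```python
import re

# ApeKI recognition site GCWGC (W = A or T). A lookahead is used so that
# overlapping sites (e.g. GCAGCAGC) are all reported.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")
CUT_OFFSET = 1  # assumption: cleavage after the first G of the site


def digest(sequence):
    """Return fragment lengths from a complete in silico digest."""
    seq = sequence.upper()
    cuts = [m.start() + CUT_OFFSET for m in APEKI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def sampled_fraction(sequence, min_len, max_len):
    """Fraction of bases in fragments whose length lies in [min_len, max_len].

    The size window is a hypothetical stand-in for whatever fragment sizes
    a given library preparation actually recovers.
    """
    if not sequence:
        return 0.0
    kept = sum(n for n in digest(sequence) if min_len <= n <= max_len)
    return kept / len(sequence)


if __name__ == "__main__":
    toy = "AAGCAGCTTTTGCTGCAA"  # toy sequence, not real Populus data
    print(digest(toy))
    print(sampled_fraction(toy, 5, 10))
```

Comparing such a fraction between annotated coding and non-coding intervals would be one simple way to look for a sampling bias of the kind described above.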

Highlights

  • Genetic studies of natural populations have traditionally relied on low numbers of loci to make inferences about evolutionary and demographic processes at various temporal and spatial scales

  • Several versions of next generation sequencing (NGS) applications are becoming available for the study of population genomics, along with a variety of approaches to analyzing the resulting large-scale data

  • We apply a technically simple NGS approach (GBS) to samples from natural populations, present an assessment of how the genome of a related species is sampled with this technique, and explore an analytical framework that provides simple genotype data that are ready for traditional population genetic analyses
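To make the idea of "traditional population genetic analyses" on simple genotype data concrete, the sketch below computes observed and expected heterozygosity and a chi-square statistic for departure from Hardy-Weinberg equilibrium at a single biallelic SNP. It is a minimal sketch under stated assumptions, not the filtering procedure used in the study: the function name and the genotype counts are hypothetical, and the test is the standard one-degree-of-freedom chi-square comparison of observed and expected genotype counts.

```python
def hwe_summary(n_aa, n_ab, n_bb):
    """Return (observed het, expected het, chi-square) for one biallelic SNP.

    n_aa, n_ab, n_bb are counts of the two homozygotes and the heterozygote.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes called at this SNP")

    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B

    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)

    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    return n_ab / n, 2 * p * q, chi2


if __name__ == "__main__":
    # Hypothetical genotype counts for illustration only.
    h_obs, h_exp, chi2 = hwe_summary(25, 50, 25)
    print(h_obs, h_exp, chi2)
```

With one degree of freedom, a chi-square value above roughly 3.84 corresponds to p < 0.05; in practice a threshold would need correction for the number of SNPs tested.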

Introduction

Genetic studies of natural populations have traditionally relied on low numbers of loci to make inferences about evolutionary and demographic processes at various temporal and spatial scales. The ability to sample genomes densely at the population level is leading to a rapid radiation of analytical and bioinformatic approaches for population genomics. These approaches will greatly increase our understanding of evolutionary, demographic, and adaptive mechanisms operating in populations, and of how these processes vary across the entire genome. The scales of these datasets and analyses present new challenges, including various types of chemical and technical biases, sequencing errors, and genotyping errors, all of which can be inadvertently propagated and magnified through data handling pipelines. Characterizing and appropriately treating these biases and error sources is a critical aspect of these emerging approaches to population genomics.
