Evaluation of whole exome sequencing as an alternative to BeadChip and whole genome sequencing in human population genetic analysis

Zoltán Maróti,Michael Snyder,Tibor Kalmár,Zsolt Boldogkői,Dóra Tombácz

doi:10.1186/s12864-018-5168-x


Open Access

https://doi.org/10.1186/s12864-018-5168-x


Journal: BMC Genomics	Publication Date: Oct 29, 2018
Citations: 10	License type: open-access

Affiliation: University of Szeged, Stanford University

Abstract

Background: Understanding the underlying genetic structure of human populations is of fundamental interest to both biological and social sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation. The most widely used methods for collecting variant information at the DNA level are whole genome sequencing, which remains costly, and the more economical array-based techniques, which simultaneously genotype a pre-selected set of variable DNA sites in the human genome. The largest publicly accessible set of human genomic sequence data available today originates from exome sequencing, which covers around 1.2% of the whole genome (approximately 30 million base pairs).

Results: To compare the effect of SNP selection strategies in population genetic analysis without bias, we subsampled the variants of the same highly curated 1 K Genome dataset to mimic genome sequencing, exome sequencing and array data. Drawing all three datasets from one source eliminates the effect of the different chemistries and error profiles of these approaches. Next, we compared the exome dataset with the array-based dataset and with the gold standard whole genome dataset using the same population genetic analysis methods.

Conclusions: Our results draw attention to some of the inherent problems that arise from using pre-selected SNP sets for population genetic analysis. Additionally, we demonstrate that exome sequencing provides a better alternative to array-based methods for population genetic analysis. In this study, we propose a strategy for unbiased variant collection from exome data and offer a bioinformatics protocol for proper data processing.
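The subsampling idea described in the Results can be illustrated with a minimal sketch. It is not the authors' protocol: the `Variant` record, the function names, the toy capture intervals and the toy array marker list are all hypothetical, and the interval lookup assumes 1-based, inclusive, non-overlapping intervals. The point is only that the three datasets are filtered views of one and the same variant list, so they share a single error profile.

```python
from bisect import bisect_right
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position
    rsid: str


def build_interval_index(intervals):
    """Group (chrom, start, end) intervals by chromosome, sorted by start.

    Assumes 1-based, inclusive, non-overlapping intervals (hypothetical input).
    """
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        index[chrom] = ([s for s, _ in ivs], [e for _, e in ivs])
    return index


def in_intervals(index, chrom, pos):
    """Return True if pos falls inside one of the intervals on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos <= ends[i]


def subsample(variants, exome_intervals, array_rsids):
    """Derive genome-, exome- and array-like variant sets from one source."""
    index = build_interval_index(exome_intervals)
    genome = list(variants)
    exome = [v for v in genome if in_intervals(index, v.chrom, v.pos)]
    array = [v for v in genome if v.rsid in array_rsids]
    return {"genome": genome, "exome": exome, "array": array}


# Toy data, invented for illustration only.
variants = [
    Variant("1", 100, "rs1"),
    Variant("1", 250, "rs2"),
    Variant("2", 50, "rs3"),
]
sets = subsample(
    variants,
    exome_intervals=[("1", 90, 120), ("2", 40, 60)],
    array_rsids={"rs2", "rs3"},
)
print({name: [v.rsid for v in vs] for name, vs in sets.items()})
```

Because every subset is drawn from the same calls, any difference seen in a downstream analysis can be attributed to which sites were selected rather than to how they were genotyped.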

Highlights

Understanding the underlying genetic structure of human populations is of fundamental interest to both biological and social sciences.
Since exome data by definition contain a high proportion of the functional variants that are under selection pressure, in this study we explored whether this could lead to any bias in population genetic analysis.
Because there is no publicly available whole genome sequencing (WGS) data from modern Hungarians, and since our results indicate that exome data are suitable for population genetic analysis, we carried out a population genetic analysis of modern Hungarians based on their exome data (HUN EXOME dataset).

Summary

Introduction

Understanding the underlying genetic structure of human populations is of fundamental interest to both biological and social sciences. While case-control studies can be an efficacious strategy for identifying candidate genes for complex diseases in a population, in diversely admixed populations (e.g. Latin Americans, with admixture of American Indians, Europeans and Africans) population stratification can affect association studies and thereby lead to false genetic associations [4]. This undesirable distortion can be minimized by genotyping ancestry informative markers (AIMs). The application of the Whole Exome Sequencing (WES) method has spread and gained popularity, as WES is cost-effective for routine genetic diagnosis of rare inherited diseases, and extensive databases have been generated containing thousands of publicly accessible exomes (Exome Aggregation Consortium, ~61,000 exomes [5]; Exome Variant Server, ~6,500 exomes [6]). Since exome data by definition contain a high proportion of the functional variants that are under selection pressure, in this study we explored whether this could lead to any bias in population genetic analysis.

Methods

Results

Conclusion