Abstract
Background
The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low coverage WGS in populations with a complex history and no reference panel remains to be determined.
Results
South Indian populations are known to have a complex population structure and are an example of a major population group that lacks adequate reference panels. To test the performance of extremely low-coverage WGS (EXL-WGS) in populations with a complex history and to provide a reference resource for South Indian populations, we performed EXL-WGS on 185 South Indian individuals from eight populations to ~1.6x coverage. Using two variant discovery pipelines, SNPTools and GATK, we generated a consensus call set that has ~90% sensitivity for identifying common variants (minor allele frequency ≥ 10%). Imputation further improves the sensitivity of our call set. In addition, we obtained high coverage of the whole mitochondrial genome to infer the maternal lineage evolutionary history of the Indian samples.
Conclusions
Overall, we demonstrate that EXL-WGS with imputation can be a valuable study design for variant discovery with a dramatically lower cost than standard WGS, even in populations with a complex history and without available reference data. In addition, the South Indian EXL-WGS data generated in this study will provide a valuable resource for future Indian genomic studies.
Highlights
The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies
As a proof of principle, we present the results from extremely low coverage whole genome sequencing (EXL-WGS) of eight South Asian populations from a wide spectrum of social and cultural strata living in the state of Andhra Pradesh
Using extremely low-coverage WGS (EXL-WGS) of 185 samples with coverages between 1x and 2x, we demonstrate that the EXL-WGS study design generates accurate genomic variant information and reliably recapitulates population substructure generated by previous methods
Summary
The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. We aim to continue these developments and hypothesize that a study design of sequencing at population scale (e.g., more than a few hundred subjects) with each subject at 1–2x coverage (i.e., extremely low coverage) would capture sufficient information to understand population genomic attributes such as diversity, population substructure, and admixture. Such a study design would decrease the cost for a population genomics study to tens of thousands of dollars. Population-level WGS surveys would provide additional information for these populations.
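To make the scale of the saving concrete, the following is a minimal sketch of the arithmetic behind the study design, using the cohort size (185) and mean coverage (~1.6x) reported in the abstract. The 30x figure used as the "standard WGS" benchmark is an assumption for illustration and does not come from the source, and the function names are hypothetical. The sketch compares raw sequencing volume only and ignores per-sample costs such as library preparation.

```python
def total_depth(n_samples: int, coverage: float) -> float:
    """Aggregate sequencing depth across a cohort, in genome-equivalents (x)."""
    return n_samples * coverage


def equivalent_genomes(n_samples: int, coverage: float,
                       reference_coverage: float = 30.0) -> float:
    """How many genomes at `reference_coverage` the same sequencing volume buys.

    reference_coverage=30.0 is an assumed benchmark, not a value from the paper.
    """
    return total_depth(n_samples, coverage) / reference_coverage


# Values taken from the abstract: 185 individuals at ~1.6x mean coverage.
cohort_depth = total_depth(185, 1.6)
equiv = equivalent_genomes(185, 1.6)

print(f"Aggregate depth: {cohort_depth:.0f}x")
print(f"Equivalent 30x genomes: {equiv:.1f}")
```

Under these assumptions, the whole 185-sample cohort consumes about as much sequencing as roughly ten genomes at the assumed 30x depth.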