Abstract

The International Genome Sample Resource (IGSR) repository was established to maximise the utility, within the research community, of human genetic data derived from openly consented samples. Here we describe variant detection in 505 samples from four populations in The Gambia using the GRCh38 reference genome. This adds to the range of populations for which variants have been called on GRCh38 and, importantly, makes allele frequencies available. A multi-caller site discovery process was applied, along with imputation and phasing, to produce a phased biallelic single nucleotide variant (SNV) and insertion/deletion (INDEL) call set. Variation had not previously been explored on the GRCh38 human genome assembly for 387 of the samples. Compared with our previous work on the 1000 Genomes Project data on GRCh38, we identified over nine million novel SNVs and over 870 thousand novel INDELs.
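The two outputs named in the abstract, a biallelic SNV/INDEL call set and allele frequencies, can be illustrated with a minimal Python sketch. It assumes VCF-style REF/ALT allele strings and phased diploid genotypes written as `0|1`. The function names `classify_biallelic` and `alt_allele_frequency` are hypothetical; this is not the IGSR pipeline, which relied on multiple variant callers followed by imputation and phasing.

```python
from typing import Iterable


def classify_biallelic(ref: str, alt: str) -> str:
    """Label a biallelic site as 'SNV', 'INDEL' or 'OTHER' from its REF/ALT alleles.

    Hypothetical helper: assumes VCF-style allele strings with a single ALT allele.
    """
    if "," in alt:
        raise ValueError("expected a single ALT allele (biallelic site)")
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "INDEL"
    # Equal-length multi-base alleles (MNPs) are neither SNVs nor INDELs.
    return "OTHER"


def alt_allele_frequency(genotypes: Iterable[str]) -> float:
    """ALT allele frequency from phased diploid genotypes such as '0|1'.

    Counts ALT ('1') alleles over all called haplotypes; missing alleles ('.')
    are skipped. Unphased ('0/1') or multiallelic genotypes are rejected.
    """
    alt_count = 0
    called = 0
    for gt in genotypes:
        for allele in gt.split("|"):
            if allele == ".":
                continue  # missing haplotype: not counted
            if allele not in ("0", "1"):
                raise ValueError(f"unexpected allele {allele!r} in genotype {gt!r}")
            called += 1
            if allele == "1":
                alt_count += 1
    if called == 0:
        raise ValueError("no called alleles")
    return alt_count / called


print(classify_biallelic("A", "G"))   # SNV
print(classify_biallelic("AT", "A"))  # INDEL
print(alt_allele_frequency(["0|1", "1|1", "0|0", "0|1"]))  # 0.5
```

Here four ALT alleles among eight called haplotypes give a frequency of 0.5. Splitting on `|` means unphased genotypes raise an error rather than being silently miscounted, which suits a phased call set.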

Highlights

  • We describe the use of a multi-caller variant discovery, imputation and phasing pipeline to produce a phased single nucleotide variant (SNV) and insertion/deletion (INDEL) call set for both the Gambian Genome Variation Project (GGVP) samples and the 118 GWD samples


Introduction

The 1000 Genomes Project collected samples from self-declared healthy individuals across numerous specific populations, organised into five continental super-populations, with the aim of cataloguing common human genetic variation (1000 Genomes Project Consortium et al, 2015). One hundred samples were collected from each of four populations in The Gambia. A subsequent set of over 100 additional samples was collected from one of the original four populations, the Mandinka. Cell lines were successfully created for these samples, which were then included in the 1000 Genomes Project as the “Gambian in Western Division, The Gambia - Mandinka” population and assigned the three-letter code GWD.
