Background: Knowledge of African genetic drivers of disease is limited because large-scale genomic research in Africa is prohibitively expensive.

Methods: We piloted a cost-effective, scalable virtual genotyped cohort in South Africa, recruiting participants under a tiered informed consent model and collecting DNA by buccal swab. Genotype data were generated using the H3Africa Illumina microarray, and phenotype data were derived from participants' routine health data. We demonstrated the feasibility of nested case-control genome-wide association studies using these data for two phenotypes: type 2 diabetes mellitus (T2DM) and severe COVID-19.

Results: We analysed 2,267,346 variants in 459 participant samples. Of these, 78.6% of SNPs and 74% of samples passed quality control (QC). Principal component analysis showed extensive ancestry admixture among study participants. For 1780 published COVID-19-associated variants, 3 SNPs in the pre-imputation data and 23 SNPs in the imputed data were significantly associated with severe COVID-19 in cases compared with controls. For 2755 published T2DM-associated variants, 69 SNPs in the pre-imputation data and 419 SNPs in the imputed data were significantly associated with T2DM in cases compared with controls.

Conclusions: These results illustrate what will be possible as the cohort expands. We demonstrate the feasibility of this approach, recognising that the findings are preliminary and require further validation once the sample size is sufficient to improve the statistical significance of the findings. We implemented a genotyped population cohort with virtual follow-up data in a resource-constrained African environment, demonstrating feasibility for scale-up and for novel health discoveries through nested case-control studies.