Abstract

Functional rare variants in drug-related genes are believed to be highly differentiated between ethnic or racial populations. However, knowledge of the population differentiation (PD) of rare single-nucleotide variants (SNVs) remains largely lacking, and the highest fixation index (Fst value) among the rare and common variants annotated to a specific gene has been used only marginally to understand PD at the gene level. In this study, we propose a new gene-based PD method for analyzing rare variants, PD of Rare and Common variants (PDRC), inspired by Generalized Cochran-Mantel-Haenszel (GCMH) statistics, to identify highly population-differentiated drug response-related genes (“pharmacogenes”). Through simulation studies, we show that PDRC adequately summarizes the PD of rare and common variants over a specific gene. We also applied the proposed method to real whole-exome sequencing data, consisting of 10,000 datasets from the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES) initiative and 3,000 datasets from the Genetics of Type 2 diabetes (Go-T2D) repository. Among the 48 genes annotated with Very Important Pharmacogenetic summaries (VIPgenes) in the PharmGKB database, our PD method successfully identified candidate genes with high PD, including ACE, CYP2B6, DPYD, F5, MTHFR, and SCN5A.
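
The gene-level summary that the abstract describes as only marginally useful, the highest per-variant Fst within a gene, can be illustrated with a minimal sketch. This is not the PDRC method. It assumes a simple Wright-style Fst computed from per-population allele frequencies, and the function names `wright_fst` and `gene_max_fst` are hypothetical. It is meant only to show why a population-private rare variant contributes almost nothing to a maximum-Fst summary.

```python
from typing import Sequence


def wright_fst(freqs: Sequence[float], sizes: Sequence[int]) -> float:
    """Wright-style Fst for one biallelic variant (illustrative assumption).

    freqs: alternate-allele frequency in each population.
    sizes: sample size of each population, used as weights.
    """
    total = sum(sizes)
    # Pooled allele frequency across populations.
    p_bar = sum(p * n for p, n in zip(freqs, sizes)) / total
    # Expected heterozygosity in the pooled population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # monomorphic site: no differentiation to measure
    # Size-weighted mean of within-population expected heterozygosity.
    h_s = sum(n * 2.0 * p * (1.0 - p) for p, n in zip(freqs, sizes)) / total
    return (h_t - h_s) / h_t


def gene_max_fst(variants: Sequence[Sequence[float]], sizes: Sequence[int]) -> float:
    """Summarize a gene by the largest per-variant Fst among its variants."""
    return max(wright_fst(freqs, sizes) for freqs in variants)


if __name__ == "__main__":
    sizes = [100, 100]
    fixed_difference = [0.0, 1.0]   # common variant, fully differentiated
    private_rare = [0.01, 0.0]      # rare variant seen in one population only
    print(wright_fst(fixed_difference, sizes))
    print(wright_fst(private_rare, sizes))
    print(gene_max_fst([fixed_difference, private_rare], sizes))
```

In this toy setting, the rare variant is present in only one population yet its Fst stays close to zero, so a maximum-Fst summary is driven almost entirely by common variants.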

Highlights

  • Rare variants with large effect sizes have been predicted to exist in the human genome[1, 2]; their large effect sizes have been observed in real data analyses, but without any analysis of population differentiation (PD)[3, 4]

  • We compare our PD of Rare and Common variants (PDRC) method with previously used PD determination algorithms, and validate its superior performance in determining the PD of numerous single-nucleotide variants (SNVs) using real whole-exome sequencing (WES) data: 10,000 datasets from the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES) initiative, and 3,000 WES datasets from the Genetics of Type 2 diabetes (Go-T2D) repository

  • We first procured two whole-exome sequencing (WES) collections: 10,000 datasets from the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES) initiative, and 3,000 datasets from the Genetics of Type 2 diabetes (Go-T2D) repository[33]


Summary

Introduction

Rare variants with large effect sizes have been predicted to exist in the human genome[1, 2]; their large effect sizes have been observed in real data analyses, but without any analysis of population differentiation (PD)[3, 4]. Because various established methods (e.g., XP-EHH and iHS) mainly focus on identifying haplotypes, adjacent common variants can strongly affect their results and severely limit the identification of loci whose alleles are at intermediate frequency[9, 10]. In addition, the many rare variants present in sequencing datasets significantly degrade the performance of these methods by increasing the number of switch errors in the phased haplotypes[11,12,13]. The method recently proposed by Berg and Coop[14] was designed mainly to detect correlations between genetic values and specific environmental variables. As a result, these methods are inappropriate for our primary objective, i.e., finding genes with a high level of PD resulting from natural selection in very recent evolutionary history (based on sequenced data with a large number of rare variants). We compare our PDRC method with previously used PD determination algorithms, and validate its superior performance in determining the PD of numerous SNVs using real whole-exome sequencing (WES) data: 10,000 datasets from the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples (T2D-GENES) initiative, and 3,000 WES datasets from the Genetics of Type 2 diabetes (Go-T2D) repository.
