Whole-genome sequencing (WGS) is rapidly evolving into a population genetics tool for estimating the effect of sequence variants on human health and fitness. However, predicting the role of variants that confer protection remains challenging. We therefore hypothesized that large population datasets and integrative omics data could be used to delineate the role of protective variants in specific genes involved in Ebola viral pathogenesis. As a pilot study, we performed trio-based WGS on 3060 ethnically diverse individuals drawn from a study in which infants (and both parents) undergo WGS, as well as other omic analyses, at birth and are then followed prospectively. Variants in genes (e.g., NPC1, CTSB, VPS11, VPS33A) known to play key roles in Ebola viral entry into human cells were evaluated for overall variant burden and loss of function via protein truncation. An additional gene (NPC2), which has an overlapping biochemical role but no clear role in the Ebola viral entry pathway, was used as a control. Additionally, RNA-Seq was performed on a subset of individuals. We screened the WGS data for variants with predicted large functional effects in the gene set. We identified 69 novel variants (not in dbSNP v.144), of which 7 were missense and 4 were protein truncating in NPC1 in the heterozygous state, compared to 3 rare missense variants in the control gene NPC2. Furthermore, we identified heterozygous protein-truncating mutations in genes within the Ebola viral entry pathway (e.g., CTSB = 7; VPS11 = 3; VPS41 = 3) compared to only 1 heterozygous variant in NPC1. Based on the NPC1 variation spectrum in our cohort, and in line with results from previous biological models, we predicted that 0.11%–0.37% of our diverse population would have resistance to Ebola virus. RNA-Seq analysis of heterozygous NPC1 truncation variants showed a skew in the allelic distribution. We conclude that whole-genome and transcriptome data can serve as a tool for predictive analysis of viral susceptibility and resistance.