GWAS SVatalog: a visualization tool to aid fine-mapping of GWAS loci with structural variations.

Abstract

Genome-wide association studies (GWAS) have been successful in identifying single nucleotide polymorphisms (SNPs) associated with phenotypic traits. However, SNPs form an incomplete set of variation across the genome, and because a large percentage of GWAS-significant SNPs lie in non-coding regions, their impact on a given trait is difficult to decipher. Recognizing whether these SNPs are tagging other polymorphisms, like structural variations (SVs), is an important step towards understanding the putative causal variation at GWAS loci. Here, we develop GWAS SVatalog (https://svatalog.research.sickkids.ca/), a novel open-source web tool that computes and visualizes linkage disequilibrium (LD) between SVs and GWAS-associated SNPs throughout the human genome. The tool combines GWAS Catalog's SNP-trait association data across 14,479 phenotypes with LD statistics calculated between 35,732 SVs and 116,870 SNPs identified in 101 whole-genome long-read sequences. We show that different SV types are more likely to overlap regulatory features, and that SVs less directly tagged by GWAS-associated SNPs more frequently overlap CpG islands and promoters. We use GWAS SVatalog to identify SVs that may explain GWAS loci for iron levels, refractive error, and Alzheimer's disease, where previously SNPs were unable to provide a causal explanation. GWAS SVatalog advances the fine-mapping of GWAS loci with structural variations, enabling researchers to associate 35,732 common SVs with 14,479 phenotypes, accelerating the understanding of disease etiology.
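
As a rough illustration of the LD statistic mentioned in the abstract, the sketch below computes r² between one SV and one SNP as the squared Pearson correlation of their genotype dosages (0, 1 or 2 copies of the alternate allele per sample). The abstract does not say which estimator GWAS SVatalog uses, so the dosage-based r², the function name genotype_r2 and the toy genotypes are all assumptions made for illustration only.

```python
import math


def genotype_r2(sv_dosages, snp_dosages):
    """Squared Pearson correlation between two genotype dosage vectors.

    Each vector holds one value per sample (0, 1 or 2 alternate alleles).
    This is a common approximation to haplotype-based r^2, not necessarily
    the estimator used by the tool described above.
    """
    if len(sv_dosages) != len(snp_dosages) or len(sv_dosages) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    n = len(sv_dosages)
    mean_sv = sum(sv_dosages) / n
    mean_snp = sum(snp_dosages) / n
    cov = sum((a - mean_sv) * (b - mean_snp)
              for a, b in zip(sv_dosages, snp_dosages))
    var_sv = sum((a - mean_sv) ** 2 for a in sv_dosages)
    var_snp = sum((b - mean_snp) ** 2 for b in snp_dosages)
    if var_sv == 0 or var_snp == 0:
        return float("nan")  # a monomorphic variant has no defined correlation
    return cov * cov / (var_sv * var_snp)


# Toy genotypes (hypothetical): the SV is perfectly tagged by the first SNP
# and uncorrelated with the second.
sv = [0, 0, 1, 1]
tagging_snp = [0, 0, 1, 1]
unlinked_snp = [0, 1, 0, 1]
r2_tagged = genotype_r2(sv, tagging_snp)
r2_unlinked = genotype_r2(sv, unlinked_snp)
```

A SNP with r² near 1 to an SV carries almost the same association signal as the SV itself, which is why a GWAS hit at such a SNP could be tagging a structural variant.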

Similar Papers
  • Research Article
  • Cite Count Icon 37
  • 10.1016/j.ajhg.2021.02.006
Integrative analysis of liver-specific non-coding regulatory SNPs associated with the risk of coronary artery disease
  • Feb 23, 2021
  • The American Journal of Human Genetics
  • Ilakya Selvarajan + 21 more

  • Peer Review Report
  • 10.7554/elife.69719.sa1
Decision letter: A proteome-wide genetic investigation identifies several SARS-CoV-2-exploited host targets of clinical relevance
  • Jun 28, 2021
  • John W Schoggins

Abstract

Background: The virus SARS-CoV-2 can exploit biological vulnerabilities (e.g. host proteins) in susceptible hosts that predispose to the development of severe COVID-19.

Methods: To identify host proteins that may contribute to the risk of severe COVID-19, we undertook proteome-wide genetic colocalisation tests, and polygenic (pan) and cis-Mendelian randomisation analyses leveraging publicly available protein and COVID-19 datasets.

Results: Our analytic approach identified several known targets (e.g. ABO, OAS1), but also nominated new proteins such as soluble Fas (colocalisation probability >0.9, p = 1 × 10⁻⁴), implicating Fas-mediated apoptosis as a potential target for COVID-19 risk. The polygenic (pan) and cis-Mendelian randomisation analyses showed consistent associations of genetically predicted ABO protein with several COVID-19 phenotypes. The ABO signal is highly pleiotropic, and a look-up of proteins associated with the ABO signal revealed that the strongest association was with soluble CD209. We demonstrated experimentally that CD209 directly interacts with the spike protein of SARS-CoV-2, suggesting a mechanism that could explain the ABO association with COVID-19.

Conclusions: Our work provides a prioritised list of host targets potentially exploited by SARS-CoV-2 and is a precursor for further research on CD209 and FAS as therapeutically tractable targets for COVID-19.

Funding: MAK, JSc, JH, AB, DO, MC, EMM, MG, ID were funded by Open Targets. J.Z. and T.R.G were funded by the UK Medical Research Council Integrative Epidemiology Unit (MC_UU_00011/4). JSh and GJW were funded by the Wellcome Trust Grant 206194. This research was funded in part by the Wellcome Trust [Grant 206194].
For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

eLife digest

Individuals who become infected with the virus that causes COVID-19 can experience a wide variety of symptoms. These can range from no symptoms or minor symptoms to severe illness and death. Key demographic factors, such as age, gender and race, are known to affect how susceptible an individual is to infection. However, molecular factors, such as unique gene mutations and gene expression levels can also have a major impact on patient responses by affecting the levels of proteins in the body. Proteins that are too abundant or too scarce may mean the difference between dying from or surviving COVID-19. Identifying the molecular factors in a host that affect how viruses can infect individuals, evade immune defences or trigger severe illness, could provide new ways to treat patients with COVID-19. Such factors are likely to remain constant, even when the virus mutates into new strains. Hence, insights would likely apply across all virus strains, including current strains, such as alpha and delta, and any new strains that may emerge in the future. Using such a 'natural experiment' approach, Karim et al. compared the genetic profiles of over 30,000 COVID-19 patients and a million healthy individuals. Nine proteins were found to have an impact on COVID-19 infection and disease severity. Four proteins were ranked as top priorities for potential treatment targets. One protein, called CD209 (also known as DC-SIGN), is involved in how the virus enters the host cells, and had one of the strongest associations with COVID-19. Two proteins, called IL-6R and FAS, were involved in the immune response and could be responsible for the immune over-activation often seen in severe COVID-19.
Finally, one protein, called OAS1, formed part of the body's innate antiviral defence system and appeared to reduce susceptibility to COVID-19. Knowing more about the proteins that influence the severity of COVID-19 opens up new ways to predict, protect and treat patients who may have severe or fatal reactions to infection. Indeed, one of the identified proteins (IL-6R) had already been targeted in recent clinical trials with some encouraging results. Considering CD209 as a potential receptor for the virus could provide another avenue for therapeutics, similar to previously successful approaches to block the virus' known interaction with a receptor protein. Ultimately, this research could supply an entirely new set of treatment options to help combat the COVID-19 pandemic.

Introduction

At the current time, the coronavirus disease 2019 (COVID-19) pandemic is implicated in the deaths of more than 4 million people worldwide (Dong et al., 2020). Although effective vaccines have been developed to substantially reduce mortality and morbidity due to severe COVID-19, the emergence of mutated strains of the SARS-CoV-2 virus has challenged the effectiveness of existing vaccines and raised the urgency of identifying alternate therapeutic pathways to target the virus (Tegally, 2020; Erik et al., 2020; Collier et al., 2021). Nevertheless, it is likely that the mutated strains of SARS-CoV-2 will continue to exploit the same vulnerable host biology to bind onto and infect cells and, in susceptible individuals, evade immune defences and promote the excessive host inflammatory response that is characteristic of severe COVID-19 (Gordon et al., 2020a). Therefore, the identification of host proteins that play roles in COVID-19 susceptibility and severity remains crucial to the development of therapeutics as host protein mechanisms are independent of genomic mutations in the virus.
An improved understanding of these therapeutically relevant virus-host pathways may also be important in combating viruses beyond SARS-CoV-2 (Perrin-Cocon et al., 2020). Several large-scale systematic experimental efforts have identified key host proteins that interact with viral proteins in the pathogenesis of severe COVID-19 (Gordon et al., 2020a; Gordon et al., 2020b; Bouhaddou et al., 2020). These notably include efforts to identify direct interactions with the spike protein of SARS-CoV-2, which mediates virus attachment onto receptors to infect host cells and is also the basis of most vaccines (Shang et al., 2020; Harvey et al., 2021). To complement in vitro host protein characterisation efforts, several groups have leveraged genetic datasets of human proteins and COVID-19 disease to identify therapeutically actionable candidate host proteins that are likely to play roles in enhancing COVID-19 susceptibility or to be involved in the pathogenesis of severe COVID-19 (Pairo-Castineira et al., 2021; Zhou et al., 2021). One of the approaches used was Mendelian randomisation (MR). MR simulates the design of randomised trials, with the underlying principle that randomisation of alleles at conception offers the opportunity to examine approximate differences in average risk of disease between comparable groups in a population that differ only in the distribution of the risk factor of interest (Davies et al., 2018), for example, protein abundance (Zheng et al., 2020). This allows the use of alleles as genetic instruments representing genetically predicted protein levels to proxy effects of pharmacological modulation of the protein. Some of the clinically actionable proteins identified by the MR approach are part of type I interferon signalling (encoded by genes: IFNAR2, TYK2, OAS1) and interleukin-6 (IL-6) signalling pathways (IL6R). 
Only one of these proteins (encoded by OAS1) had any evidence of genetic colocalisation, that is, evidence that genetic associations of the protein and COVID outcomes shared the same causal genetic signal (Zhou et al., 2021). An additional protein that was supported by both MR and genetic colocalisation tests was ABO (Zhou et al., 2021), reported in several published genome-wide association studies (GWAS) of COVID-19 (Pairo-Castineira et al., 2021; Ellinghaus et al., 2020). In response to the first published GWAS of COVID-19, we reported findings that link the ABO signal with a number of clinically actionable targets including coagulation factors (von Willebrand factor [vWF], and Factor VIII [F8]), IL-6, and CD209/DC-SIGN (Karim et al., 2020). However, in most of the previous MR studies (Pairo-Castineira et al., 2021; Zhou et al., 2021), investigators only used curated cis-acting variants (genetic variants near or in the gene encoding the relevant protein) as genetic instruments to represent effects of genetically predicted protein concentrations, rather than genome-wide instruments. While the use of cis-acting variants can minimise the risk of horizontal pleiotropic effects (i.e. associations driven by other proteins not on the causal pathway for the disease), it can suffer from lower power than a genome-wide analysis due to fewer available instruments (Zheng et al., 2020). Furthermore, in previous protein-COVID-19 MR studies, genetic colocalisation tests were carried out only for protein-phenotype associations that were significant in the MR analysis, potentially excluding many protein-phenotype associations that may share the same causal genetic signal but are underpowered in a proteome-wide MR approach. In the present study, we expanded on these previous reports by undertaking a proteome-wide two-sample pan- and cis-MR analysis using the Sun et al. 
GWAS (Sun et al., 2018) of plasma protein concentrations and several COVID-19 GWAS phenotypes from the ICDA COVID-19 Host Genetics Initiative (October 2020 release) (Huang et al., 2020). First, we showed that genetically predicted circulating ABO protein was associated with COVID-19 susceptibility and severity and the lead ABO signal was associated strongly with plasma concentrations of soluble CD209. Second, we collected evidence for a direct mechanism of interaction between the SARS-CoV-2 spike protein and human CD209 protein. Third, we performed proteome-wide genetic colocalisation tests, followed by single-instrument cis-MR analysis, and we report additional novel targets of therapeutic relevance. Finally, we examined associated phenotypes using the colocalising signals from the Open Targets Genetics portal (http://genetics.opentargets.org) to shed light on the biological basis of association of the proteins with the COVID-19 phenotypes.

Materials and methods

Key resources table

Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Cell line (Homo sapiens) | HEK293-E | Yves Durocher, PMID:11788735 | RRID:CVCL_6974 | -
Transfected construct (Homo sapiens) | pCMV6-CD209 | OriGene | Cat.# SC304915 | Plasmid for CD209 cDNA expression in cell-based binding assay
Transfected construct (Homo sapiens) | pTT3-ACE2-BLH | PMID:33432067 | - | Plasmid for recombinant ACE2 extracellular domain, for plate-based assays as the immobilised form
Transfected construct (Homo sapiens) | pTT3-CD209-BLH | This paper | - | Plasmid for recombinant CD209 extracellular domain for plate-based assays as the immobilised form
Transfected construct (Homo sapiens) | pTT3-Cd4d3+ d4 | Addgene | RRID:Addgene_32402 | Plasmid for recombinant tag control (Cd4 domains 3 and 4)
Transfected construct (Homo sapiens) | pTT3-SPIKE-COMP-BLac | This paper | - | Plasmid for recombinant SARS-CoV-2 spike extracellular domain for plate-based assays as the soluble form
Transfected construct (Homo sapiens) | pTT3-BirA-FLAG | Addgene | RRID:Addgene_64395 | Biotin ligase plasmid for recombinant protein biotinylation
Peptide, recombinant protein | Streptavidin R-phycoerythrin | BioLegend | Cat.# 405245 | For tetramer staining in cell-based binding assay
Chemical compound, drug | DAPI (4',6-diamidino-2-phenylindole) | BioLegend | Cat.# 422801 | 1 μM for flow cytometry live/dead staining
Chemical compound, drug | D-biotin | Sigma-Aldrich | Cat.# 2031 | 100 μM supplemented to cell culture media for biotinylation
Software, algorithm | R (version 4.0.3) | R Foundation (www.r-project.org) | RRID:SCR_001905 | Analysis and generating plots

Genetic associations of proteins

We primarily used Sun et al. protein GWAS data (Sun et al., 2018; Emilsson et al., 2018) for the pan-/cis-MR analyses and for performing genetic colocalisation tests (described below). The pan-/cis-MR effects were expressed per standard deviation (SD) higher genetically predicted plasma protein concentrations. Two additional proteomic datasets (Emilsson et al., 2018; Suhre et al., 2017) were used to identify proteins associated with the ABO locus. The genotyping protocols and QC of these proteomic studies have been described previously (Sun et al., 2018; Emilsson et al., 2018; Suhre et al., 2017). All three of the proteomic studies have used the SOMAscan assay platform (an aptamer-based protein detection platform) to detect and quantify protein abundance (Gold et al., 2012).

Genetic associations of COVID-19

We used seven meta-analysed COVID-19 datasets from the October 2020 release of the ICDA COVID-HGI group (https://www.covid19hg.org/results/r4/). These seven COVID-19 outcomes are A1 (very severe respiratory confirmed COVID vs. not hospitalised COVID), A2 (very severe respiratory confirmed COVID vs. population), B1 (hospitalised COVID vs. not hospitalised COVID), B2 (hospitalised COVID vs. population), C1 (COVID vs. lab/self-reported negative), C2 (COVID vs. population), and D1 (predicted COVID from self-reported symptoms vs.
predicted or self-reported non-COVID). Definitions of these outcomes are provided in Supplementary file 1.

Harmonisation of protein and COVID summary statistics

Prior to analyses, we performed a liftover of datasets that reported genomic coordinates using the GRCh37 assembly to GRCh38. We also checked and ensured that the effect allele in a GWAS locus is the alternative allele in the forward strand of the reference genome. To infer strand for palindromic variants (variants with A/T or G/C alleles, i.e. variants with the same pair of letters on the forward strand as on the reverse strand), we first checked the orientation of all non-palindromic variants with respect to the reference genome to assess whether there was a strand consensus of 99% or more. For example, for a given GWAS, if ≥99% of the non-palindromic variants were on the forward strand, we assumed that the palindromic variants would also be on the forward strand; otherwise, they were excluded from analyses. Details of the harmonisation workflow are provided in our GitHub pages (EBISPOT, 2020; Opentargets Inc, 2021).

Mendelian randomisation

To construct genetic instruments for MR analysis, we selected near-independent (r² = 0.05) genetic variants from across the genome ('pan'-instruments) or from within ±1 Mbp from the transcription start site (TSS) of the gene encoding the protein ('cis'-instruments) associated with the encoded protein abundance at p ≤ 5 × 10⁻⁸ for pan-MR analyses and at a less stringent p ≤ 1 × 10⁻⁵ for cis-MR analyses (this p-value corrects for the number of proteins in the druggable genome; Schmidt, 2020). We used the generalised summary data-based Mendelian randomisation (GSMR) approach with the heterogeneity-independent instrument (HEIDI)-outlier flag turned on to carry out the pan- and cis-MR analyses (Zhu et al., 2018).
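
The strand-consensus rule for palindromic variants described in the harmonisation step above can be sketched as follows. This is a minimal illustration only, not the workflow from the cited GitHub pages: the dictionary keys (a1, a2, ref, alt), the function names, and the simplification that a non-palindromic variant counts as "forward" when its allele pair equals the reference pair are all assumptions, and only single-base alleles are handled.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
CONSENSUS_THRESHOLD = 0.99  # ">=99%" strand consensus, as stated in the text


def is_palindromic(a1, a2):
    """True for A/T or G/C pairs, which read the same on both strands."""
    return COMPLEMENT.get(a1.upper()) == a2.upper()


def on_forward_strand(variant):
    """Assumed check: the reported allele pair equals the reference pair."""
    return {variant["a1"], variant["a2"]} == {variant["ref"], variant["alt"]}


def harmonise(variants):
    """Keep palindromic variants only if the non-palindromic ones show
    a forward-strand consensus of at least CONSENSUS_THRESHOLD."""
    non_pal = [v for v in variants if not is_palindromic(v["a1"], v["a2"])]
    n_forward = sum(1 for v in non_pal if on_forward_strand(v))
    consensus = n_forward / len(non_pal) if non_pal else 0.0
    keep_palindromic = consensus >= CONSENSUS_THRESHOLD
    return [v for v in variants
            if keep_palindromic or not is_palindromic(v["a1"], v["a2"])]


# Hypothetical toy data: one palindromic A/T variant plus 100 others.
forward = {"a1": "A", "a2": "G", "ref": "A", "alt": "G"}
reverse = {"a1": "T", "a2": "C", "ref": "A", "alt": "G"}  # complement of A/G
palindromic = {"a1": "A", "a2": "T", "ref": "A", "alt": "T"}

clean_gwas = [dict(forward) for _ in range(100)] + [palindromic]
mixed_gwas = ([dict(forward) for _ in range(98)]
              + [dict(reverse) for _ in range(2)] + [palindromic])
kept_clean = harmonise(clean_gwas)   # consensus 100%: palindromic kept
kept_mixed = harmonise(mixed_gwas)   # consensus 98%: palindromic dropped
```

The point of the rule is that a palindromic variant cannot reveal its own strand, so its orientation is borrowed from the rest of the dataset, and only when that dataset is almost unanimous.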
The GSMR software, using the HEIDI-outlier method, removes potentially pleiotropic instruments and accounts for the residual correlation between instruments (important as we are using near-independent genetic instruments). To select near-independent genetic instruments and account for linkage disequilibrium (LD) in the MR analyses, we used genotype data from 10,000 randomly sampled UK Biobank participants to create a reference LD matrix, which is ancestry-matched to the pQTL data we used. For each COVID-19 outcome, we used the Benjamini–Hochberg FDR (False Discovery Rate) threshold of 5% for significance, adjusting for 2042 tests in cis-MR analyses and 1286 tests in pan-MR analyses. For trans-acting instruments in pan-MR associations, variants were mapped to their respective cis-gene that had the highest overall V2G score in the Open Targets Genetics portal (Ghoussaini, 2021; Mountjoy, 2020; Open Targets Genetics, 2019a).

Colocalisation analysis and phenome-wide association study

To identify shared causal genetic signals between protein and COVID outcomes, we used the Bayesian method of genetic colocalisation implemented in the coloc R package (Giambartolomei, 2014) using the marginal association statistics for each trait (i.e. assuming one independent signal in each region). We used beta and standard errors of cis-pQTLs of phenotype pairs as inputs. The default priors in coloc were used, that is, the prior of an SNP (single nucleotide polymorphism)-trait association is 1 × 10⁻⁴, and the prior of an SNP associating with both traits is 1 × 10⁻⁵. For each COVID-19 outcome, a posterior probability for shared causal genetic signal (PP.H4) threshold of more than 0.8 was used to identify shared causal genetic variants. For colocalising signals, we carried out a phenome-wide association study (PheWAS) using GWAS summary statistics (n ≈ 3000 GWAS) from the Open Targets Genetics portal (Ghoussaini, 2021; Mountjoy, 2020).
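
The Benjamini–Hochberg step used above to declare MR associations significant at an FDR of 5% can be sketched as below. This is a generic implementation of the standard step-up procedure, not code from the study; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for eight protein-outcome tests.
example_pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
example_flags = benjamini_hochberg(example_pvals)
```

Because the procedure is step-up, a p-value that misses its own rank threshold is still rejected when a larger p-value further down the ranking meets its threshold. In the study, the number of tests m would be 2042 for cis-MR and 1286 for pan-MR per outcome.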
Evidence against aptamer binding artefacts

For variants associated with proteins due to aptamer or epitope binding artefacts (which tend to be missense variants) (Joshi and Mayr, 2018), we first assessed whether genetic instruments for MR or coloc-based single-SNP MR analysis were associated with corresponding gene expression (i.e. whether they were also cis-eQTLs). This used gene expression data from the Open Targets Genetics portal (Ghoussaini, 2021). SNPs that were not cis-eQTLs were investigated further by identifying whether they were (or were in LD at r² = 0.8 with) missense variants. To query if variants were missense or in LD with missense variants, we used the functional consequence data from Open Targets Genetics (Ghoussaini, 2021) (which used gnomAD v2 for variant effect prediction annotation, Lek, 2016). The reasoning was, if missense variants also had effects on corresponding gene expression, the causal inference using the missense variants as genetic instruments was unlikely to be biased even if the effect estimates were invalid. Where cis-pQTLs were not cis-eQTLs and were missense variants (or in LD with missense variants at r² = 0.8) affecting the respective genes, these proteins were flagged and excluded from any further downstream analyses on the basis that the missense variant(s) might influence aptamer binding and produce biased effect estimates. Where cis-pQTLs were also cis-eQTLs and were missense variants (or in LD with missense variants) for the respective genes, although the effect estimates would not be valid, the causal inference using the instruments is unlikely to be biased; hence, these variants were retained in supplementary files and estimates of probes represented by these variants were flagged (using an asterisk) in the main figures.
The rest, where cis-pQTLs had an effect on gene expression but were not missense variants or in LD with missense variants, were included in all analyses and presented without restrictions.

Recombinant protein production

Recombinant human receptors and SARS-CoV-2 spike protein extracellular domains were expressed and purified as previously described (Shilts et al., 2021). Briefly, the full extracellular domain sequences of each were expressed as soluble secreted proteins in HEK293 cells. All proteins were affinity-purified using their hexahistidine tags. For biotinylated proteins, co-transfection of secreted BirA ligase in the presence of 100 µM D-biotin resulted in the covalent addition of a biotin group to an acceptor peptide tag, also as described previously (Kerr and Wright, 2012). The extracellular domain of CD209 (Q9NNX6) was defined as beginning at Pro114, while the full cDNA sequence was acquired from OriGene (#SC304915).

Plate-based protein binding assay

The binding of biotinylated human receptor extracellular domains to pentameric SARS-CoV-2 spike protein was measured using the avidity-based extracellular interaction assay (AVEXIS) as previously described (Bushell et al., 2008). Briefly, the wells of a streptavidin-coated 96-well plate were saturated with biotinylated bait of either CD209, ACE2, or a previously described negative-control construct consisting only of the C-terminal protein tags shared by all other recombinant proteins (rat Cd4(d3 +4)-linker-Bio-6xHis) (Voulgaraki, 2005; Galaway and Wright, 2020).
these we applied a of the full SARS-CoV-2 spike protein extracellular domain by a peptide sequence from the protein with a beta binding was measured by of a was by light at receptor binding assay Request a detailed protocol HEK293 cells were as described previously with expression encoding cDNA of CD209 or a the expression recombinant biotinylated spike protein was to as previously described et al., 2018). were with of spike or a control construct of protein tags on a flow as previously described (Shilts et al., 2021). HEK293 cell were provided by Research were first by All cell were for by and found to be all These cell are not by as availability Request a detailed protocol used to summary statistics are provided in (EBISPOT, 2020). for pan- and cis-MR analyses are provided on the GSMR for genetic colocalisation analyses are provided on the coloc GitHub 2021). All used in the to are provided in at 2021). Results and cis-MR analyses the of circulating ABO protein concentrations and soluble IL-6R in COVID-19 risk Our MR analysis used both genetic variants from across the genome and genetic variants near or in the gene encoding the relevant protein to associations of genetically predicted plasma protein concentrations with the risk of COVID-19 The COVID-19 are provided in Supplementary file 1. Although the pan-MR analysis leveraged genetic data from both and trans-acting a of from across the genome by HEIDI-outlier for some protein-COVID-19 pairs that were associated at 5% the associations with COVID-19 outcomes were driven by trans-acting or cis-acting genetic instruments. 
For example, although proteins were represented by both and trans-acting genetic and were represented only by cis-acting variants and one was driven entirely by trans-acting instruments ABO file the pan-MR analysis revealed protein probes associated with COVID outcomes at an FDR of 5% The selected by GSMR to represent these probes were also cis-eQTLs curated for the Open Targets Genetics portal Open Targets Genetics, and, the ABO signal be to as a that a nucleotide in the of were not missense variants or in LD with missense variants file the that SNPs with associations with proteins were used as genetic instruments for the of significant pan-MR 1 Open the of randomisation and cis-MR and genetic pan- and cis-MR methods used (Sun et al., 2018) as the of genetic instruments and the UK Biobank individual genotype data as reference We selected near-independent genetic instruments and performed MR analysis using generalised summary data-based Mendelian randomisation that for residual correlation between instruments. Genetic colocalisation analysis was used to posterior of shared causal genetic signal between protein and posterior probability of shared causal genetic signal of more than (i.e. a or posterior probability for 4 was used as evidence of genetic The line analysis the from target the three proteins with pan-MR evidence of association with COVID also had cis-MR evidence at cis-MR While the pan-MR analysis used genetic data from across the the cis-MR analysis genetic instrument to near 1 of or in the gene encoding the protein. 
proteins with pan-MR associations were supported by corresponding cis-MR associations and Supplementary file ABO, and these only ABO and IL-6R proteins had some evidence of genetic colocalisation with posterior (PP.H4) more than and of a shared genetic signal between protein and COVID-19 phenotype Although the of IL-6R was it had a = a signal of the IL-6R protein with the COVID-19 is a more likely than the association driven by independent Open associations of genetically predicted plasma protein concentrations with selected COVID-19 phenotypes. The estimates represent of COVID-19 per standard deviation (SD) of genetically predicted protein abundance using genetic instruments from across the genome randomisation The estimates represent of COVID per of genetically predicted protein abundance using genetic instruments near or in the gene encoding the protein represent The of the are to the of the of the For each COVID pan-MR associations at FDR 5% were a COVID phenotype a pQTL and the number of in the COVID phenotype the number of SNPs used as genetic instruments for the protein the posterior probability that protein and COVID traits the posterior probability evidence for vs. against shared causal variants and the candidate colocalising signal proteins that have that are either missense variants or in linkage disequilibrium with missense variants, their effect estimates potentially predicted ABO was associated with risk in out of seven COVID-19 outcomes These outcomes represented both susceptibility (e.g. COVID-19 vs. cis-MR per genetically predicted ABO × and severity (e.g. hospitalised COVID-19 vs. cis-MR p=1 × of COVID-19. 
predicted soluble IL-6R was only associated with higher risk of hospitalised COVID-19 compared to per genetically predicted × the SNPs involved in the pan-MR associations of the all probes IL-6R and ABO had at one trans-acting and in all these at one of the trans-acting SNPs were to the ABO gene by the Open Targets Genetics V2G the of the ABO genetic Furthermore, when the of pan-MR associations of these probes across all seven COVID-19 outcomes, the protein probes that have trans-acting ABO SNPs a similar association as the ABO protein associated with only COVID-19 outcomes that have file 1 of proteins reported in our study and the of evidence their by by by and single-SNP of cis-acting of trans-acting

  • Research Article
  • 10.1158/1538-7445.am2015-2764
Abstract 2764: Cosegregating variants in chronic lymphocytic leukemia (CLL) families that are located in loci discovered by genome wide association studies (GWAS)
  • Aug 1, 2015
  • Cancer Research
  • Sara Beiggi + 10 more

Introduction

Chronic lymphocytic leukemia (CLL) is a B-cell malignancy that is known to have a familial component to disease risk. Although 31 loci have been found to be associated with CLL risk, the functional variant(s) driving these associations is mostly unknown. Here we set out to identify rare, highly-penetrant, cosegregating, susceptibility variants within the known GWAS discovered loci using whole exome sequencing (WES) data in CLL families from the Mayo Clinic family study of B-cell malignancies.

Methods

We performed WES on germ line DNA of 93 CLL families with two or more members with CLL, using Agilent capture kits and Illumina HiSeq2000. Bioinformatics analyses leveraged the following software packages: Novoalign, Picard, The Genome Analysis Toolkit (GATK), and the Biological Reference Repository (bioR). Quality control filters were implemented; subjects with mis-specified relationships were removed, as were variants with <75% call rate, <8X coverage, and those identified as sequencing artifacts. Each GWAS locus was defined by +/- 1 Mb of the top GWAS hit within the locus. Linkage disequilibrium (LD) was calculated among the single nucleotide variants (SNVs) located within each locus. Potentially functional SNVs were identified based on: a) uncommon in public databases (< 5%), b) cosegregating in at least two CLL families, c) being highly conserved and in coding regions, and d) functional prediction status of deleterious (SIFT Score), damaging (PolyPhen Score), and a moderate, or high variant impact (SNP Effect).

Results

In our 93 CLL families, we sequenced 443 individuals: 160 with CLL, 73 with monoclonal B cell lymphocytosis (MBL), and 210 relatives that were not diagnosed with CLL or MBL at the time of sequencing. Median age of CLL diagnosis was 59 years (range 34-87), and 56% were male. Among the MBL individuals and relatives, the median age at recruitment was 55 years (range 18-93), and 40% were male.
A total of 317,666 SNVs passed our sequencing quality control filters of which 10,731 were within +/- 1 Mb of known GWAS hits from 31 loci. Of these SNVs, 91% were in coding regions, 18% were reported to have high or moderate impact, 6% were estimated to be damaging and 6% were predicted to be deleterious. From these SNVs, we identified 76 putatively functional variants distributed across 25 GWAS loci that were cosegregating in the individuals with CLL or MBL in multiple CLL pedigrees. These SNVs were all located in coding regions with high or moderate impact and were predicted to be damaging and deleterious. Of these 76 variants, 56 had a frequency of <0.005 in 1000 Genomes’ European population while the remaining 20 had a frequency of 1%.

Conclusions

Through WES, we identified a number of rare, penetrant and potentially predisposing SNVs located within 25 of the 31 CLL GWAS-discovered loci. These segregating variants provide a list for future validation and functional studies.

Citation Format: Sara Beiggi, Daniel R. O'Brien, Sara J. Achenbach, Kari G. Chaffee, Timothy G. Call, Neil E. Kay, Tait D. Shanafelt, Julie Cunningham, James R. Cerhan, Celine M. Vachon, Susan L. Slager. Cosegregating variants in chronic lymphocytic leukemia (CLL) families that are located in loci discovered by genome wide association studies (GWAS). [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 2764. doi:10.1158/1538-7445.AM2015-2764
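
The quality-control thresholds and the locus definition given in the Methods of this abstract (call rate of at least 75%, coverage of at least 8X, and +/- 1 Mb around the top GWAS hit) can be sketched as a simple filter-and-assign step. The record layout, the names Snv, passes_qc and assign_to_loci, and the coordinates are hypothetical; the abstract does not describe the actual pipeline code, and the sequencing-artifact filter is not modelled here.

```python
from dataclasses import dataclass

WINDOW_BP = 1_000_000   # +/- 1 Mb around the top GWAS hit
MIN_CALL_RATE = 0.75    # variants with <75% call rate were removed
MIN_COVERAGE = 8        # variants with <8X coverage were removed


@dataclass(frozen=True)
class Snv:
    chrom: str
    pos: int
    call_rate: float
    coverage: float


def passes_qc(snv):
    """Apply the two numeric QC thresholds quoted in the abstract."""
    return snv.call_rate >= MIN_CALL_RATE and snv.coverage >= MIN_COVERAGE


def assign_to_loci(snvs, top_hits):
    """Map each locus name to the QC-passing SNVs inside its window.

    top_hits: dict of locus name -> (chromosome, position of top GWAS hit).
    """
    loci = {name: [] for name in top_hits}
    for snv in snvs:
        if not passes_qc(snv):
            continue
        for name, (chrom, pos) in top_hits.items():
            if snv.chrom == chrom and abs(snv.pos - pos) <= WINDOW_BP:
                loci[name].append(snv)
    return loci


# Hypothetical coordinates, for illustration only.
hits = {"locus_A": ("2", 5_000_000), "locus_B": ("11", 40_000_000)}
snvs = [
    Snv("2", 5_400_000, 0.99, 30),    # inside locus_A, passes QC
    Snv("2", 5_900_000, 0.60, 30),    # inside locus_A, fails call rate
    Snv("2", 7_500_000, 0.99, 30),    # outside every window
    Snv("11", 39_100_000, 0.90, 12),  # inside locus_B, passes QC
    Snv("11", 40_200_000, 0.90, 5),   # inside locus_B, fails coverage
]
by_locus = assign_to_loci(snvs, hits)
```

With a fixed window, neighbouring loci closer than 2 Mb would share variants; the sketch keeps such a variant in both lists rather than picking one.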

  • Research Article
  • Cite Count Icon 7
  • 10.1161/circulationaha.121.055857
Identification of DNA Damage Repair Enzyme Ascc2 as Causal for Heart Failure With Preserved Ejection Fraction.
  • Apr 5, 2022
  • Circulation
  • Yang Cao + 10 more

  • Front Matter
  • Cite Count Icon 2
  • 10.4065/mcp.2011.0337
Genome-Wide Association Studies Go Green: Novel and Cost-Effective Opportunities for Identifying Genetic Associations
  • Jul 1, 2011
  • Mayo Clinic Proceedings
  • Celine M Vachon

  • Supplementary Content
  • 10.1016/j.ajhg.2017.05.008
This Month in The Journal
  • Jun 1, 2017
  • The American Journal of Human Genetics
  • Sarah Ratzel + 1 more

  • Discussion
  • Cite Count Icon 19
  • 10.1016/s1474-4422(22)00395-7
Diabetes and Alzheimer's disease: shared genetic susceptibility?
  • Oct 18, 2022
  • The Lancet Neurology
  • John Hardy + 2 more

  • Front Matter
  • Cite Count Icon 11
  • 10.1016/j.jaci.2009.12.976
Genetics and biology of asthma 2010: La' ci darem la mano…
  • Feb 1, 2010
  • Journal of Allergy and Clinical Immunology
  • Donata Vercelli

  • Research Article
  • Cite Count Icon 5
  • 10.1111/pcmr.12203
Selection, p53, and pigmentation
  • Dec 23, 2013
  • Pigment Cell & Melanoma Research
  • Margret H Ogmundsdottir + 1 more

  • Book Chapter
  • 10.1007/978-3-319-32199-8_13
Network Analysis and Fine-Mapping GWAS Loci to Identify Genes and Functional Variants Involved in the Development of Dupuytren Disease
  • Sep 7, 2016
  • Kerstin Becker + 3 more

The first genome-wide association study (GWAS) in Dupuytren disease (DD) has successfully identified nine genomic regions that harbor genetic variants contributing to the genetics of this disease. In GWASs common single nucleotide variants (SNVs) are investigated for association with a given trait or disease. These common SNVs are rarely the direct causative variants, but instead by chance, they capture the real causative variants in linkage disequilibrium (LD) at a given association locus. One of the major challenges in complex genetic diseases is the identification of these causal genetic variants that underlie the association signals. Integrative approaches that target the functional relevance of these causal variants on multiple levels are needed. The success of these approaches depends on the individual genetic architecture at each GWAS locus, which can be very complex, and the specific features of a given trait or disease. Targeted sequencing of the GWAS loci is one possible approach. Extensive analysis of GWAS data in conjunction with other possible data sources, e.g., expression data, is another approach to help to unravel the genetics of Dupuytren disease. We have been applying both approaches aiming to interrogate candidate gene variants and to characterize pathways of DD.

  • Abstract
  • 10.1016/j.jalz.2019.06.044
HIGH-RESOLUTION GENOMEWIDE PROMOTER-FOCUSED CONNECTOME IMPLICATES MICROGLIA CAUSAL GENES FOR ALZHEIMER’S DISEASE
  • Jul 1, 2019
  • Alzheimer's & Dementia
  • Mariana Argenziano + 12 more

  • Research Article
  • Cite Count Icon 27
  • 10.1371/journal.pone.0119420
In-silico analysis of inflammatory bowel disease (IBD) GWAS loci to novel connections.
  • Mar 18, 2015
  • PLOS ONE
  • Md Mesbah-Uddin + 4 more

Genome-wide association studies (GWASs) for many complex diseases, including inflammatory bowel disease (IBD), produced hundreds of disease-associated loci, the majority of which are noncoding. The number of GWAS loci is increasing very rapidly, but the process of translating single nucleotide polymorphisms (SNPs) from these loci to genomic medicine is lagging. In this study, we investigated 4,734 variants from 152 IBD-associated GWAS loci (152 IBD-associated lead noncoding SNPs identified from pooled GWAS results plus 4,582 variants in strong linkage disequilibrium (LD; r² ≥ 0.8) in the EUR population of the 1K Genomes Project) using four publicly available bioinformatics tools, namely dbPSHP, CADD, GWAVA, and RegulomeDB, to annotate and prioritize putative regulatory variants. Of the 152 lead noncoding SNPs, around 11% are under strong negative selection (GERP++ RS ≥ 2), and ~30% are under balancing selection (Tajima’s D score > 2) in the CEU population (1K Genomes Project), though these regions are positively selected (GERP++ RS < 0) in mammalian evolution. The analysis of 4,734 variants using three integrative annotation tools produced 929 putative functional SNPs, of which 18 SNPs (from 15 GWAS loci) are in concordance with all three classifiers. These prioritized noncoding SNPs may contribute to IBD pathogenesis by dysregulating the expression of nearby genes. This study showed the usefulness of integrative annotation for prioritizing fewer functional variants from a large number of GWAS markers.
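The r² threshold of 0.8 used above to expand lead SNPs into LD proxies can be illustrated with a small sketch of the standard pairwise r² statistic, r² = D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB. The sketch assumes phased biallelic haplotypes coded as 0/1; the function names `r_squared` and `ld_proxies` are hypothetical, and the study itself drew LD from 1K Genomes Project data rather than from code like this.

```python
def r_squared(hap_a, hap_b):
    """Pairwise LD r^2 between two biallelic variants.

    hap_a and hap_b are equal-length lists of 0/1 alleles, one entry per
    phased haplotype (an assumption; unphased genotypes need other estimators).
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                  # frequency of allele 1 at A
    p_b = sum(hap_b) / n                                  # frequency of allele 1 at B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # r^2 is undefined when either variant is monomorphic; treated as 0 here.
        return 0.0
    return d * d / denom


def ld_proxies(lead_hap, other_haps, threshold=0.8):
    """Names of variants whose r^2 with the lead variant meets the threshold."""
    return [
        name for name, hap in other_haps.items()
        if r_squared(lead_hap, hap) >= threshold
    ]
```

For example, `ld_proxies([0, 0, 1, 1], {"snp_x": [0, 0, 1, 1], "snp_y": [0, 1, 0, 1]})` keeps only `snp_x`, since `snp_y` is uncorrelated with the lead variant in this toy sample.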

  • Research Article
  • Cite Count Icon 115
  • 10.1371/journal.pone.0148717
Alzheimer’s Disease Risk Polymorphisms Regulate Gene Expression in the ZCWPW1 and the CELF1 Loci
  • Feb 26, 2016
  • PLoS ONE
  • Celeste M Karch + 3 more

Late onset Alzheimer’s disease (LOAD) is a genetically complex and clinically heterogeneous disease. Recent large-scale genome-wide association studies (GWAS) have identified more than twenty loci that modify risk for AD. Despite the identification of these loci, little progress has been made in identifying the functional variants that explain the association with AD risk. Thus, we sought to determine whether the novel LOAD GWAS single nucleotide polymorphisms (SNPs) alter expression of LOAD GWAS genes and whether expression of these genes is altered in AD brains. The majority of LOAD GWAS SNPs occur in gene-dense regions under large linkage disequilibrium (LD) blocks, making it unclear which gene(s) are modified by the SNP. Thus, we tested for brain expression quantitative trait loci (eQTLs) between LOAD GWAS SNPs and SNPs in high LD with the LOAD GWAS SNPs in all of the genes within the GWAS loci. We found a significant eQTL between rs1476679 and PILRB and GATS, which occurs within the ZCWPW1 locus. PILRB and GATS expression levels, within the ZCWPW1 locus, were also associated with AD status. Rs7120548 was associated with MTCH2 expression, which occurs within the CELF1 locus. Additionally, expression of several genes within the CELF1 locus, including MTCH2, was highly correlated with one another and was associated with AD status. We further demonstrate that PILRB, as well as other genes within the GWAS loci, is most highly expressed in microglia. These findings, together with the function of PILRB as a DAP12 receptor, support the critical role of microglia and neuroinflammation in AD risk.

  • Front Matter
  • Cite Count Icon 10
  • 10.1053/j.gastro.2014.02.023
IBD Genetics: Focus on (Dys) Regulation in Immune Cells and the Epithelium
  • Feb 22, 2014
  • Gastroenterology
  • Arthur Kaser + 1 more

  • Discussion
  • Cite Count Icon 12
  • 10.1161/atvbaha.122.317539
Causal Gene Confusion: The Complicated EDN1/PHACTR1 Locus for Coronary Artery Disease.
  • Apr 7, 2022
  • Arteriosclerosis, thrombosis, and vascular biology
  • Rajat M Gupta

