DisSNPNet: Predicting disease-associated single-nucleotide polymorphisms using linkage disequilibrium, disease similarity, and 1000 Genomes Project datasets with evidence-based validation

Abstract

Identifying disease-associated single-nucleotide polymorphisms (SNPs) is fundamental to understanding complex disease genetics, yet genome-wide association studies (GWAS) remain costly and data-intensive. Network-based approaches provide a complementary strategy by exploiting linkage disequilibrium (LD) structure and disease relatedness to prioritize candidate variants. We present DisSNPNet, a heterogeneous network-based framework that integrates chromosome-specific SNP LD networks derived from 1000 Genomes Project Phase 1 and Phase 3 data, a MeSH-based disease similarity network, and known disease–SNP associations from CAUSALdb. Random walk with restart was applied to rank SNPs for each disease. Predictive performance was evaluated using disease-wise 3-fold cross-validation with AUROC and AUPR. Biological plausibility was assessed by querying top-ranked SNPs in GWAS resources and by disease-specific KEGG pathway enrichment. A chromosome-matched random baseline was constructed to contextualize external GWAS evidence. DisSNPNet consistently outperformed SNP-only LD networks, with heterogeneous networks yielding higher AUROC and AUPR across chromosomes. Strong LD networks (r² ≥ 0.8) improved precision, particularly in imbalanced settings. Top-ranked SNPs showed significantly greater GWAS evidence than random expectation across all chromosomes, indicating nonrandom enrichment. Disease-specific pathway enrichment revealed biologically coherent mechanisms across immune, metabolic, cardiovascular, and structural diseases. DisSNPNet provides a robust and interpretable framework for prioritizing disease-associated SNPs. While not a substitute for GWAS, it offers a scalable, evidence-supported approach for SNP prioritization and hypothesis generation, complementing experimental and population-based studies.
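
As an illustration of the ranking step, the sketch below runs a random walk with restart on a toy heterogeneous network. It is not the DisSNPNet implementation: the six-node graph, the restart probability of 0.7, the use of the disease node as the only seed, and the function name are all assumptions made for this example.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adj     : symmetric adjacency matrix (n x n)
    seeds   : indices of the nodes the walker restarts from
    restart : restart probability (0.7 is an assumed value, not from the paper)
    """
    adj = np.asarray(adj, dtype=float)
    # Column-normalise so that each column is a transition distribution.
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes keep an all-zero column
    transition = adj / col_sums

    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)

    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy heterogeneous network (hypothetical): nodes 0-3 are SNPs, 4-5 are diseases.
edges = [
    (0, 1), (1, 2), (2, 3),  # SNP-SNP edges (e.g. LD above a threshold)
    (4, 5),                  # disease-disease similarity edge
    (4, 0), (5, 3),          # known disease-SNP associations
]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# Rank the SNPs for disease node 4 by their steady-state probability.
scores = random_walk_with_restart(adj, seeds=[4])
snp_ranking = np.argsort(-scores[:4])  # SNP indices, highest score first
```

In this toy graph the SNP directly linked to the query disease ranks first, and SNPs reachable only through the similar disease or through LD neighbours receive progressively lower scores.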

Similar Papers
  • Abstract
  • 10.1016/j.jalz.2019.06.4368
HIGH-THROUGHPUT IDENTIFICATION OF NONCODING FUNCTIONAL SNPS
  • Jul 1, 2019
  • Alzheimer's & Dementia
  • Gang Li

  • Research Article
  • Cite Count: 360
  • 10.1016/s0140-6736(08)60208-1
LDL-cholesterol concentrations: a genome-wide association study
  • Feb 1, 2008
  • Lancet (London, England)
  • Manjinder S Sandhu + 29 more

  • Research Article
  • Cite Count: 64
  • 10.1161/circulationaha.109.914192
Coronary Heart Disease Risk Prediction in the Era of Genome-Wide Association Studies
  • May 24, 2010
  • Circulation
  • Steve E Humphries + 3 more

For DNA-based tests that assess genetic predisposition to coronary heart disease (CHD) to be of clinical value, they need to provide information over and above conventional risk factors (CRFs) currently used in risk algorithms, such as the Framingham Risk Score,1 which incorporates CRFs such as age, gender, blood lipid concentrations, blood pressure, body mass index, family history, and smoking habit. To achieve this, several hurdles must be passed. The first challenge is to identify a set of common single-nucleotide polymorphisms (SNPs) at loci associated with CHD risk. Over the last 10 to 15 years, this has been done by use of a “candidate gene” approach through association studies in prospective analysis or case-control studies, ie, comparing SNP genotype or allele frequency between groups of individuals with CHD and healthy subjects. Several of the genes, chosen because of their key role in processes that predispose to atherosclerosis, have meta-analysis–confirmed effects on risk of CHD,2 the best example of which is the APOE gene, which encodes apolipoprotein E, with 3 common isoforms that are associated with strong effects on plasma lipids and more modest effects on risk of CHD.3 This “hypothesis-driven” search for useful genetic variants provides the foundation for the development of genetic CHD risk profiles, and in the last 2 years, it has been enhanced by technical advances that have allowed “hypothesis-free” genome-wide association studies (GWASs), primarily in a case-control setting. Although the list of identified CHD-risk loci and SNPs will clearly grow, we have at least the basis to start the examination of their potential clinical utility. The second set of challenges is to obtain a robust estimate of the size of the risk effects associated with these SNPs. This requires population-based prospective studies to avoid bias, because estimates in the case-control setting, although efficient for …

  • Research Article
  • Cite Count: 5
  • 10.1016/j.heliyon.2024.e32053
Structural variants in linkage disequilibrium with GWAS-significant SNPs
  • May 28, 2024
  • Heliyon
  • Hao Liang + 3 more

With the recent expansion of structural variant identification in the human genome, understanding the role of these impactful variants in disease architecture is critically important. Currently, a large proportion of genome-wide-significant genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) are functionally unresolved, raising the possibility that some of these SNPs are associated with disease through linkage disequilibrium with causal structural variants. Hence, understanding the linkage disequilibrium between newly discovered structural variants and statistically significant SNPs may provide a resource for further investigation into disease-associated regions in the genome. Here we present a resource cataloging structural variant-significant SNP pairs in high linkage disequilibrium. The database is composed of (i) SNPs that have exhibited genome-wide significant association with traits, primarily disease phenotypes, (ii) newly released structural variants (SVs), and (iii) linkage disequilibrium values calculated from unphased data. All data files including those detailing SV and GWAS SNP associations and results of GWAS-SNP-SV pairs are available at the SV-SNP LD Database and can be accessed at https://github.com/hliang-SchrodiLab/SV_SNPs. Our analysis results represent a useful fine mapping tool for interrogating SVs in linkage disequilibrium with disease-associated SNPs. We anticipate that this resource may play an important role in subsequent studies which investigate incorporating disease causing SVs into disease risk prediction models.

  • Front Matter
  • Cite Count: 12
  • 10.1016/j.jaci.2009.12.976
Genetics and biology of asthma 2010: La' ci darem la mano…
  • Feb 1, 2010
  • Journal of Allergy and Clinical Immunology
  • Donata Vercelli

  • Research Article
  • Cite Count: 39
  • 10.1016/j.ajhg.2021.02.006
Integrative analysis of liver-specific non-coding regulatory SNPs associated with the risk of coronary artery disease
  • Feb 23, 2021
  • The American Journal of Human Genetics
  • Ilakya Selvarajan + 21 more

  • Research Article
  • Cite Count: 47
  • 10.1261/rna.029900.111
Structural effects of linkage disequilibrium on the transcriptome
  • Nov 22, 2011
  • RNA
  • Joshua S Martin + 6 more

A majority of SNPs (single nucleotide polymorphisms) map to noncoding and intergenic regions of the genome. Noncoding SNPs are often identified in genome-wide association studies (GWAS) as strongly associated with human disease. Two such disease-associated SNPs in the 5' UTR of the human FTL (Ferritin Light Chain) gene are predicted to alter the ensemble of structures adopted by the mRNA. High-accuracy single nucleotide resolution chemical mapping reveals that these SNPs result in substantial changes in the structural ensemble in agreement with the computational prediction. Furthermore six rescue mutations are correctly predicted to restore the mRNA to its wild-type ensemble. Our data confirm that the FTL 5' UTR is a "RiboSNitch," an RNA that changes structure if a particular disease-associated SNP is present. The structural change observed is analogous to that of a bacterial Riboswitch in that it likely regulates translation. These data further suggest that specific pairs of SNPs in high linkage disequilibrium (LD) will form RNA structure-stabilizing haplotypes (SSHs). We identified 484 SNP pairs that form SSHs in UTRs of the human genome, and in eight of the 10 SSH-containing transcripts, SNP pairs stabilize RNA protein binding sites. The ubiquitous nature of SSHs in the transcriptome suggests that certain haplotypes are conserved to avoid RiboSNitch formation.

  • Research Article
  • Cite Count: 343
  • 10.1086/316944
Extent and Distribution of Linkage Disequilibrium in Three Genomic Regions
  • Jan 1, 2001
  • The American Journal of Human Genetics
  • Gonçalo R Abecasis + 12 more

  • Research Article
  • Cite Count: 74
  • 10.1172/jci26488
A machine to make a future Biotech chronicles
  • Sep 1, 2005
  • Journal of Clinical Investigation
  • Xuefeng Bruce Ling

  • Research Article
  • Cite Count: 58
  • 10.1158/1055-9965.681.13.5
SNPs, Haplotypes, and Cancer: Applications in Molecular Epidemiology
  • May 1, 2004
  • Cancer Epidemiology, Biomarkers & Prevention
  • Timothy R Rebbeck + 6 more

  • Research Article
  • 10.1158/1538-7445.am2013-2552
Abstract 2552: A genome-wide association study of prostate cancer in West African men.
  • Apr 15, 2013
  • Cancer Research
  • Michael B Cook + 15 more

Age-adjusted mortality rates for prostate cancer are higher for Africa compared with North America or Western Europe. In addition, African American men are noted to have higher age-adjusted incidence rates of this malignancy than European American men. These observations, together with the fact that West Africa is the principal ancestral region of African American men, have led to the hypothesis that there may exist distinct ancestral genetic profiles which mediate prostate cancer risk. In addition, advantages of conducting a genome-wide association study (GWAS) of prostate cancer in African men include a more discrete linkage disequilibrium (LD) structure, a higher number of private single nucleotide polymorphisms (SNPs), the predominance of symptomatic disease, and assessment of unique exposures. The Ghana Prostate Study was conducted collaboratively involving the US National Cancer Institute (NCI) and the University of Ghana during 2006-2012. The NCI Cancer Genomics Research Laboratory genotyped 494 prostate cancer cases and 498 population controls using the Illumina HumanOmni5-Quad BeadChip. Associations were assessed using multivariate logistic regression adjusted for age and genetic ancestry. We sought to validate the 30 most promising SNP associations with prostate cancer through the African American Prostate Cancer GWAS Consortium. A novel locus at 10p14 for prostate cancer risk was the strongest signal detected, and the 8 SNPs at this locus were in LD. This locus is located 360 kb 5' of GATA3 and the 8 SNPs reside within an intron of LincRNA gene RP11-543F8.2. Analysis of African 1000 Genomes Project data did not indicate LD between 10p14 SNPs and splice or exonic SNPs of this gene, while HaploReg found no significant enrichment of enhancer elements. None of the most promising 30 SNPs replicated in the African American Prostate Cancer GWAS Consortium. This may be due to chance or differences in population genetics, environment, and/or proportion of symptomatic disease. Further genetic studies of prostate cancer in African men are needed to validate the 10p14 susceptibility locus.

Citation Format: Michael B. Cook, Zhaoming Wang, Edward D. Yeboah, Andrew A. Adjei, Yao Tettey, Richard B. Biritwum, Evelyn Tay, Ann Truelove, Shelley Niwa, Lisa Chu, Meredith Yeager, Amy Hutchinson, Kai Yu, Christopher A. Haiman, African American Prostate Cancer GWAS Consortium, Robert N. Hoover, Ann Hsing, Stephen J. Chanock. A genome-wide association study of prostate cancer in West African men. [abstract]. In: Proceedings of the 104th Annual Meeting of the American Association for Cancer Research; 2013 Apr 6-10; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2013;73(8 Suppl):Abstract nr 2552. doi:10.1158/1538-7445.AM2013-2552

  • Research Article
  • Cite Count: 20
  • 10.1016/j.bbmt.2008.11.020
Exploration of the Genetic Basis of GVHD by Genetic Association Studies
  • Jan 1, 2009
  • Biology of Blood and Marrow Transplantation
  • Seishi Ogawa + 16 more

  • Research Article
  • Cite Count: 61
  • 10.1186/1471-2105-15-152
LincSNP: a database of linking disease-associated SNPs to human large intergenic non-coding RNAs
  • May 20, 2014
  • BMC Bioinformatics
  • Shangwei Ning + 7 more

Background: Genome-wide association studies (GWAS) have successfully identified a large number of single nucleotide polymorphisms (SNPs) that are associated with a wide range of human diseases. However, many of these disease-associated SNPs are located in non-coding regions and have remained largely unexplained. Recent findings indicate that disease-associated SNPs in human large intergenic non-coding RNA (lincRNA) may lead to susceptibility to diseases through their effects on lincRNA expression. There is, therefore, a need to specifically record these SNPs and annotate them as potential candidates for disease. Description: We have built LincSNP, an integrated database, to identify and annotate disease-associated SNPs in human lincRNAs. The current release of LincSNP contains approximately 140,000 disease-associated SNPs (or linkage disequilibrium SNPs), which can be mapped to around 5,000 human lincRNAs, together with their comprehensive functional annotations. The database also contains annotated, experimentally supported SNP-lincRNA-disease associations and disease-associated lincRNAs. It provides flexible search options for data extraction and searches can be performed by disease/phenotype name, SNP ID, lincRNA name and chromosome region. In addition, we provide users with a link to download all the data from LincSNP and have developed a web interface for the submission of novel identified SNP-lincRNA-disease associations. Conclusions: The LincSNP database aims to integrate disease-associated SNPs and human lincRNAs, which will be an important resource for the investigation of the functions and mechanisms of lincRNAs in human disease. The database is available at http://bioinfo.hrbmu.edu.cn/LincSNP.

  • Peer Review Report
  • 10.7554/elife.69719.sa1
Decision letter: A proteome-wide genetic investigation identifies several SARS-CoV-2-exploited host targets of clinical relevance
  • Jun 28, 2021
  • John W Schoggins

Abstract

Background: The virus SARS-CoV-2 can exploit biological vulnerabilities (e.g. host proteins) in susceptible hosts that predispose to the development of severe COVID-19. Methods: To identify host proteins that may contribute to the risk of severe COVID-19, we undertook proteome-wide genetic colocalisation tests, and polygenic (pan) and cis-Mendelian randomisation analyses leveraging publicly available protein and COVID-19 datasets. Results: Our analytic approach identified several known targets (e.g. ABO, OAS1), but also nominated new proteins such as soluble Fas (colocalisation probability >0.9, p = 1 × 10⁻⁴), implicating Fas-mediated apoptosis as a potential target for COVID-19 risk. The polygenic (pan) and cis-Mendelian randomisation analyses showed consistent associations of genetically predicted ABO protein with several COVID-19 phenotypes. The ABO signal is highly pleiotropic, and a look-up of proteins associated with the ABO signal revealed that the strongest association was with soluble CD209. We demonstrated experimentally that CD209 directly interacts with the spike protein of SARS-CoV-2, suggesting a mechanism that could explain the ABO association with COVID-19. Conclusions: Our work provides a prioritised list of host targets potentially exploited by SARS-CoV-2 and is a precursor for further research on CD209 and FAS as therapeutically tractable targets for COVID-19. Funding: MAK, JSc, JH, AB, DO, MC, EMM, MG, ID were funded by Open Targets. J.Z. and T.R.G were funded by the UK Medical Research Council Integrative Epidemiology Unit (MC_UU_00011/4). JSh and GJW were funded by the Wellcome Trust Grant 206194. This research was funded in part by the Wellcome Trust [Grant 206194].
For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

eLife digest

Individuals who become infected with the virus that causes COVID-19 can experience a wide variety of symptoms. These can range from no symptoms or minor symptoms to severe illness and death. Key demographic factors, such as age, gender and race, are known to affect how susceptible an individual is to infection. However, molecular factors, such as unique gene mutations and gene expression levels can also have a major impact on patient responses by affecting the levels of proteins in the body. Proteins that are too abundant or too scarce may mean the difference between dying from or surviving COVID-19. Identifying the molecular factors in a host that affect how viruses can infect individuals, evade immune defences or trigger severe illness, could provide new ways to treat patients with COVID-19. Such factors are likely to remain constant, even when the virus mutates into new strains. Hence, insights would likely apply across all virus strains, including current strains, such as alpha and delta, and any new strains that may emerge in the future. Using such a 'natural experiment' approach, Karim et al. compared the genetic profiles of over 30,000 COVID-19 patients and a million healthy individuals. Nine proteins were found to have an impact on COVID-19 infection and disease severity. Four proteins were ranked as top priorities for potential treatment targets. One protein, called CD209 (also known as DC-SIGN), is involved in how the virus enters the host cells, and had one of the strongest associations with COVID-19. Two proteins, called IL-6R and FAS, were involved in the immune response and could be responsible for the immune over-activation often seen in severe COVID-19.
Finally, one protein, called OAS1, formed part of the body's innate antiviral defence system and appeared to reduce susceptibility to COVID-19. Knowing more about the proteins that influence the severity of COVID-19 opens up new ways to predict, protect and treat patients who may have severe or fatal reactions to infection. Indeed, one of the identified proteins (IL-6R) had already been targeted in recent clinical trials with some encouraging results. Considering CD209 as a potential receptor for the virus could provide another avenue for therapeutics, similar to previously successful approaches to block the virus' known interaction with a receptor protein. Ultimately, this research could supply an entirely new set of treatment options to help combat the COVID-19 pandemic.

Introduction

At the current time, the coronavirus disease 2019 (COVID-19) pandemic is implicated in the deaths of more than 4 million people worldwide (Dong et al., 2020). Although effective vaccines have been developed to substantially reduce mortality and morbidity due to severe COVID-19, the emergence of mutated strains of the SARS-CoV-2 virus has challenged the effectiveness of existing vaccines and raised the urgency of identifying alternate therapeutic pathways to target the virus (Tegally, 2020; Erik et al., 2020; Collier et al., 2021). Nevertheless, it is likely that the mutated strains of SARS-CoV-2 will continue to exploit the same vulnerable host biology to bind onto and infect cells and, in susceptible individuals, evade immune defences and promote the excessive host inflammatory response that is characteristic of severe COVID-19 (Gordon et al., 2020a). Therefore, the identification of host proteins that play roles in COVID-19 susceptibility and severity remains crucial to the development of therapeutics as host protein mechanisms are independent of genomic mutations in the virus.
An improved understanding of these therapeutically relevant virus-host pathways may also be important in combating viruses beyond SARS-CoV-2 (Perrin-Cocon et al., 2020). Several large-scale systematic experimental efforts have identified key host proteins that interact with viral proteins in the pathogenesis of severe COVID-19 (Gordon et al., 2020a; Gordon et al., 2020b; Bouhaddou et al., 2020). These notably include efforts to identify direct interactions with the spike protein of SARS-CoV-2, which mediates virus attachment onto receptors to infect host cells and is also the basis of most vaccines (Shang et al., 2020; Harvey et al., 2021). To complement in vitro host protein characterisation efforts, several groups have leveraged genetic datasets of human proteins and COVID-19 disease to identify therapeutically actionable candidate host proteins that are likely to play roles in enhancing COVID-19 susceptibility or to be involved in the pathogenesis of severe COVID-19 (Pairo-Castineira et al., 2021; Zhou et al., 2021). One of the approaches used was Mendelian randomisation (MR). MR simulates the design of randomised trials, with the underlying principle that randomisation of alleles at conception offers the opportunity to examine approximate differences in average risk of disease between comparable groups in a population that differ only in the distribution of the risk factor of interest (Davies et al., 2018), for example, protein abundance (Zheng et al., 2020). This allows the use of alleles as genetic instruments representing genetically predicted protein levels to proxy effects of pharmacological modulation of the protein. Some of the clinically actionable proteins identified by the MR approach are part of type I interferon signalling (encoded by genes: IFNAR2, TYK2, OAS1) and interleukin-6 (IL-6) signalling pathways (IL6R). 
Only one of these proteins (encoded by OAS1) had any evidence of genetic colocalisation, that is, evidence that genetic associations of the protein and COVID outcomes shared the same causal genetic signal (Zhou et al., 2021). An additional protein that was supported by both MR and genetic colocalisation tests was ABO (Zhou et al., 2021), reported in several published genome-wide association studies (GWAS) of COVID-19 (Pairo-Castineira et al., 2021; Ellinghaus et al., 2020). In response to the first published GWAS of COVID-19, we reported findings that link the ABO signal with a number of clinically actionable targets including coagulation factors (von Willebrand factor [vWF], and Factor VIII [F8]), IL-6, and CD209/DC-SIGN (Karim et al., 2020). However, in most of the previous MR studies (Pairo-Castineira et al., 2021; Zhou et al., 2021), investigators only used curated cis-acting variants (genetic variants near or in the gene encoding the relevant protein) as genetic instruments to represent effects of genetically predicted protein concentrations, rather than genome-wide instruments. While the use of cis-acting variants can minimise the risk of horizontal pleiotropic effects (i.e. associations driven by other proteins not on the causal pathway for the disease), it can suffer from lower power than a genome-wide analysis due to fewer available instruments (Zheng et al., 2020). Furthermore, in previous protein-COVID-19 MR studies, genetic colocalisation tests were carried out only for protein-phenotype associations that were significant in the MR analysis, potentially excluding many protein-phenotype associations that may share the same causal genetic signal but are underpowered in a proteome-wide MR approach. In the present study, we expanded on these previous reports by undertaking a proteome-wide two-sample pan- and cis-MR analysis using the Sun et al. 
GWAS (Sun et al., 2018) of plasma protein concentrations and several COVID-19 GWAS phenotypes from the ICDA COVID-19 Host Genetics Initiative (October 2020 release) (Huang et al., 2020). First, we showed that genetically predicted circulating ABO protein was associated with COVID-19 susceptibility and severity and the lead ABO signal was associated strongly with plasma concentrations of soluble CD209. Second, we collected evidence for a direct mechanism of interaction between the SARS-CoV-2 spike protein and human CD209 protein. Third, we performed proteome-wide genetic colocalisation tests, followed by single-instrument cis-MR analysis, and we report additional novel targets of therapeutic relevance. Finally, we examined associated phenotypes using the colocalising signals from the Open Targets Genetics portal (http://genetics.opentargets.org) to shed light on the biological basis of association of the proteins with the COVID-19 phenotypes.

Materials and methods

Key resources table

| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Cell line (Homo sapiens) | HEK293-E | Yves Durocher, PMID:11788735 | RRID:CVCL_6974 | |
| Transfected construct (Homo sapiens) | pCMV6-CD209 | Origene | Cat.# SC304915 | Plasmid for CD209 cDNA expression in cell-based binding assay |
| Transfected construct (Homo sapiens) | pTT3-ACE2-BLH | PMID:33432067 | | Plasmid for recombinant ACE2 extracellular domain, for plate-based assays as the immobilised form |
| Transfected construct (Homo sapiens) | pTT3-CD209-BLH | This paper | | Plasmid for recombinant CD209 extracellular domain for plate-based assays as the immobilised form |
| Transfected construct (Homo sapiens) | pTT3-Cd4d3+ d4 | Addgene | RRID:Addgene_32402 | Plasmid for recombinant tag control (Cd4 domains 3 and 4) |
| Transfected construct (Homo sapiens) | pTT3-SPIKE-COMP-BLac | This paper | | Plasmid for recombinant SARS-CoV-2 spike extracellular domain for plate-based assays as the soluble form |
| Transfected construct (Homo sapiens) | pTT3-BirA-FLAG | Addgene | RRID:Addgene_64395 | Biotin ligase plasmid for recombinant protein biotinylation |
| Peptide, recombinant protein | Streptavidin R-phycoerythrin | BioLegend | Cat.# 405245 | For tetramer staining in cell-based binding assay |
| Chemical compound, drug | DAPI (4',6-diamidino-2-phenylindole) | BioLegend | Cat.# 422801 | 1 μM for flow cytometry live/dead staining |
| Chemical compound, drug | D-biotin | Sigma-Aldrich | Cat.# 2031 | 100 μM supplemented to cell culture media for biotinylation |
| Software, algorithm | R (version 4.0.3) | R Foundation (www.r-project.org) | RRID:SCR_001905 | Analysis and generating plots |

Genetic associations of proteins

We primarily used Sun et al. protein GWAS data (Sun et al., 2018; Emilsson et al., 2018) for the pan-/cis-MR analyses and for performing genetic colocalisation tests (described below). The pan-/cis-MR effects were expressed per standard deviation (SD) higher genetically predicted plasma protein concentrations. Two additional proteomic datasets (Emilsson et al., 2018; Suhre et al., 2017) were used to identify proteins associated with the ABO locus. The genotyping protocols and QC of these proteomic studies have been described previously (Sun et al., 2018; Emilsson et al., 2018; Suhre et al., 2017). All three of the proteomic studies have used the SOMAscan assay platform (an aptamer-based protein detection platform) to detect and quantify protein abundance (Gold et al., 2012).

Genetic associations of COVID-19

We used seven meta-analysed COVID-19 datasets from the October 2020 release of the ICDA COVID-HGI group (https://www.covid19hg.org/results/r4/). These seven COVID-19 outcomes are A1 (very severe respiratory confirmed COVID vs. not hospitalised COVID), A2 (very severe respiratory confirmed COVID vs. population), B1 (hospitalised COVID vs. not hospitalised COVID), B2 (hospitalised COVID vs. population), C1 (COVID vs. lab/self-reported negative), C2 (COVID vs. population), and D1 (predicted COVID from self-reported symptoms vs.
predicted or self-reported non-COVID). Definitions of these outcomes are provided in Supplementary file 1.

Harmonisation of protein and COVID summary statistics

Prior to analyses, we performed a liftover of datasets that reported genomic coordinates using the GRCh37 assembly to GRCh38. We also checked and ensured that the effect allele in a GWAS locus is the alternative allele in the forward strand of the reference genome. To infer strand for palindromic variants (variants with A/T or G/C alleles, i.e. variants with the same pair of letters on the forward strand as on the reverse strand), we first checked the orientation of all non-palindromic variants with respect to the reference genome to assess whether there was a strand consensus of 99% or more. For example, for a given GWAS, if ≥99% of the non-palindromic variants were on the forward strand, we assumed that the palindromic variants would also be on the forward strand; otherwise, they were excluded from analyses. Details of the harmonisation workflow are provided in our GitHub pages (EBISPOT, 2020; Opentargets Inc, 2021).

Mendelian randomisation

To construct genetic instruments for MR analysis, we selected near-independent (r² = 0.05) genetic variants from across the genome ('pan'-instruments) or from within ±1 Mbp of the transcription start site (TSS) of the gene encoding the protein ('cis'-instruments) associated with the encoded protein abundance at p ≤ 5 × 10⁻⁸ for pan-MR analyses and at a less stringent p ≤ 1 × 10⁻⁵ for cis-MR analyses (this p-value corrects for the number of proteins in the druggable genome; Schmidt, 2020). We used the generalised summary data-based Mendelian randomisation (GSMR) approach with the heterogeneity-independent instrument (HEIDI)-outlier flag turned on to carry out the pan- and cis-MR analyses (Zhu et al., 2018).
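
The strand-consensus rule for palindromic variants described in the harmonisation step can be sketched as follows. This is a minimal illustration, not the authors' workflow: the dictionary keys (`ea`, `oa`, `forward`) and the function names are hypothetical, and the handling of non-palindromic variants on the reverse strand is left out because the text does not describe it.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(effect_allele, other_allele):
    """A/T and C/G variants read the same on both strands."""
    return COMPLEMENT[effect_allele] == other_allele


def resolve_palindromic(variants, consensus=0.99):
    """Keep palindromic variants only if the strand consensus is reached.

    variants : list of dicts with hypothetical keys
               'ea' (effect allele), 'oa' (other allele) and
               'forward' (True/False for non-palindromic variants,
               None for palindromic ones, whose strand is unknown).
    """
    non_pal = [v for v in variants if not is_palindromic(v["ea"], v["oa"])]
    pal = [v for v in variants if is_palindromic(v["ea"], v["oa"])]

    # Fraction of non-palindromic variants reported on the forward strand.
    forward_frac = (
        sum(bool(v["forward"]) for v in non_pal) / len(non_pal) if non_pal else 0.0
    )

    if forward_frac >= consensus:
        # Consensus reached: assume palindromic variants are forward too.
        return non_pal + [dict(v, forward=True) for v in pal]
    # No consensus: palindromic variants are excluded.
    return non_pal


# Hypothetical summary statistics: one palindromic A/T variant plus
# 100 non-palindromic A/G variants.
palindromic = {"ea": "A", "oa": "T", "forward": None}
all_forward = [{"ea": "A", "oa": "G", "forward": True} for _ in range(100)]
mixed = all_forward[:98] + [{"ea": "A", "oa": "G", "forward": False}] * 2

kept_consensus = resolve_palindromic(all_forward + [palindromic])
kept_no_consensus = resolve_palindromic(mixed + [palindromic])
```

With 100% of the non-palindromic variants on the forward strand the palindromic variant is retained and marked as forward; at 98% it is dropped.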
The GSMR software, using the HEIDI-outlier method, removes potentially pleiotropic instruments and accounts for the residual correlation between instruments (important as we are using near-independent genetic instruments). To select near-independent genetic instruments and account for linkage disequilibrium (LD) in the MR analyses, we used genotype data from 10,000 randomly sampled UK Biobank participants to create a reference LD matrix, which is ancestry-matched to the pQTL data we used. For each COVID-19 outcome, we used the Benjamini–Hochberg FDR (False Discovery Rate) threshold of 5% for significance, adjusting for 2042 tests in cis-MR analyses and 1286 tests in pan-MR analyses. For trans-acting instruments in pan-MR associations, variants were mapped to their respective cis-gene that had the highest overall V2G score in the Open Targets Genetics portal (Ghoussaini, 2021; Mountjoy, 2020; Open Targets Genetics, 2019a).

Colocalisation analysis and phenome-wide association study

To identify shared causal genetic signals between protein and COVID outcomes, we used the Bayesian method of genetic colocalisation implemented in the coloc R package (Giambartolomei, 2014) using the marginal association statistics for each trait (i.e. assuming one independent signal in each region). We used beta and standard errors of cis-pQTLs of phenotype pairs as inputs. The default priors in coloc were used, that is, the prior of an SNP (single nucleotide polymorphism)-trait association is 1 × 10⁻⁴, and the prior of an SNP associating with both traits is 1 × 10⁻⁵. For each COVID-19 outcome, a posterior probability for shared causal genetic signal (PP.H4) threshold of more than 0.8 was used to identify shared causal genetic variants. For colocalising signals, we carried out a phenome-wide association study (PheWAS) using GWAS summary statistics (n ≈ 3000 GWAS) from the Open Targets Genetics portal (Ghoussaini, 2021; Mountjoy, 2020).
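
The Benjamini–Hochberg thresholding applied to the MR results can be illustrated with a short, generic sketch. It shows the standard step-up procedure at a 5% FDR, not the authors' analysis code; the function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans marking tests significant at the given FDR.

    Standard step-up procedure: find the largest rank k such that
    p_(k) <= (k / m) * fdr, then call the k smallest p-values significant.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])

    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            max_rank = rank

    significant = [False] * m
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant


# Hypothetical p-values from four tests.
only_first = benjamini_hochberg([0.01, 0.04, 0.03, 0.9])
all_pass = benjamini_hochberg([0.04, 0.03, 0.02, 0.01])
```

The second example shows the step-up behaviour: 0.04 would fail the strictest per-rank threshold on its own, but it is declared significant because the largest p-value still meets its rank-specific threshold of 0.05.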
Evidence against aptamer binding artefacts

For variants associated with proteins due to aptamer or epitope binding artefacts (which tend to be missense variants) (Joshi and Mayr, 2018), we first assessed whether genetic instruments for MR or coloc-based single-SNP MR analysis were associated with corresponding gene expression (i.e. whether they were also cis-eQTLs). This used gene expression data from the Open Targets Genetics portal (Ghoussaini, 2021). SNPs that were not cis-eQTLs were investigated further by identifying whether they were (or were in LD at r² = 0.8 with) missense variants. To query if variants were missense or in LD with missense variants, we used the functional consequence data from Open Targets Genetics (Ghoussaini, 2021) (which used gnomAD v2 for variant effect prediction annotation, Lek, 2016). The reasoning was, if missense variants also had effects on corresponding gene expression, the causal inference using the missense variants as genetic instruments was unlikely to be biased even if the effect estimates were invalid. Where cis-pQTLs were not cis-eQTLs and were missense variants (or in LD with missense variants at r² = 0.8) affecting the respective genes, these proteins were flagged and excluded from any further downstream analyses on the basis that the missense variant(s) might influence aptamer binding and produce biased effect estimates. Where cis-pQTLs were also cis-eQTLs and were missense variants (or in LD with missense variants) for the respective genes, although the effect estimates would not be valid, the causal inference using the instruments is unlikely to be biased; hence, these variants were retained in supplementary files and estimates of probes represented by these variants were flagged (using an asterisk) in the main figures.
The rest, where cis-pQTLs had an effect on gene expression but were not missense variants or in LD with missense variants, were included in all analyses and presented without restrictions.

Recombinant protein production

Recombinant human receptors and SARS-CoV-2 spike protein extracellular domains were expressed and purified as previously described (Shilts et al., 2021). Briefly, the full extracellular domain sequences of each were expressed as soluble secreted proteins in HEK293 cells. All proteins were affinity-purified using their hexahistidine tags. For biotinylated proteins, co-transfection of secreted BirA ligase in the presence of 100 µM D-biotin resulted in the covalent addition of a biotin group to an acceptor peptide tag, also as described previously (Kerr and Wright, 2012). The extracellular domain of CD209 (Q9NNX6) was defined as beginning at Pro114, while the full cDNA sequence was acquired from OriGene (#SC304915).

Plate-based protein binding assay

The binding of biotinylated human receptor extracellular domains to pentameric SARS-CoV-2 spike protein was measured using the avidity-based extracellular interaction assay (AVEXIS) as previously described (Bushell et al., 2008). Briefly, the wells of a streptavidin-coated 96-well plate were saturated with biotinylated bait of either CD209, ACE2, or a previously described negative-control construct consisting only of the C-terminal protein tags shared by all other recombinant proteins (rat Cd4(d3+4)-linker-Bio-6xHis) (Voulgaraki, 2005; Galaway and Wright, 2020).
these we applied a of the full SARS-CoV-2 spike protein extracellular domain by a peptide sequence from the protein with a beta binding was measured by of a was by light at receptor binding assay Request a detailed protocol HEK293 cells were as described previously with expression encoding cDNA of CD209 or a the expression recombinant biotinylated spike protein was to as previously described et al., 2018). were with of spike or a control construct of protein tags on a flow as previously described (Shilts et al., 2021). HEK293 cell were provided by Research were first by All cell were for by and found to be all These cell are not by as availability Request a detailed protocol used to summary statistics are provided in (EBISPOT, 2020). for pan- and cis-MR analyses are provided on the GSMR for genetic colocalisation analyses are provided on the coloc GitHub 2021). All used in the to are provided in at 2021). Results and cis-MR analyses the of circulating ABO protein concentrations and soluble IL-6R in COVID-19 risk Our MR analysis used both genetic variants from across the genome and genetic variants near or in the gene encoding the relevant protein to associations of genetically predicted plasma protein concentrations with the risk of COVID-19 The COVID-19 are provided in Supplementary file 1. Although the pan-MR analysis leveraged genetic data from both and trans-acting a of from across the genome by HEIDI-outlier for some protein-COVID-19 pairs that were associated at 5% the associations with COVID-19 outcomes were driven by trans-acting or cis-acting genetic instruments. 
For example, although proteins were represented by both and trans-acting genetic and were represented only by cis-acting variants and one was driven entirely by trans-acting instruments ABO file the pan-MR analysis revealed protein probes associated with COVID outcomes at an FDR of 5% The selected by GSMR to represent these probes were also cis-eQTLs curated for the Open Targets Genetics portal Open Targets Genetics, and, the ABO signal be to as a that a nucleotide in the of were not missense variants or in LD with missense variants file the that SNPs with associations with proteins were used as genetic instruments for the of significant pan-MR 1 Open the of randomisation and cis-MR and genetic pan- and cis-MR methods used (Sun et al., 2018) as the of genetic instruments and the UK Biobank individual genotype data as reference We selected near-independent genetic instruments and performed MR analysis using generalised summary data-based Mendelian randomisation that for residual correlation between instruments. Genetic colocalisation analysis was used to posterior of shared causal genetic signal between protein and posterior probability of shared causal genetic signal of more than (i.e. a or posterior probability for 4 was used as evidence of genetic The line analysis the from target the three proteins with pan-MR evidence of association with COVID also had cis-MR evidence at cis-MR While the pan-MR analysis used genetic data from across the the cis-MR analysis genetic instrument to near 1 of or in the gene encoding the protein. 
proteins with pan-MR associations were supported by corresponding cis-MR associations and Supplementary file ABO, and these only ABO and IL-6R proteins had some evidence of genetic colocalisation with posterior (PP.H4) more than and of a shared genetic signal between protein and COVID-19 phenotype Although the of IL-6R was it had a = a signal of the IL-6R protein with the COVID-19 is a more likely than the association driven by independent Open associations of genetically predicted plasma protein concentrations with selected COVID-19 phenotypes. The estimates represent of COVID-19 per standard deviation (SD) of genetically predicted protein abundance using genetic instruments from across the genome randomisation The estimates represent of COVID per of genetically predicted protein abundance using genetic instruments near or in the gene encoding the protein represent The of the are to the of the of the For each COVID pan-MR associations at FDR 5% were a COVID phenotype a pQTL and the number of in the COVID phenotype the number of SNPs used as genetic instruments for the protein the posterior probability that protein and COVID traits the posterior probability evidence for vs. against shared causal variants and the candidate colocalising signal proteins that have that are either missense variants or in linkage disequilibrium with missense variants, their effect estimates potentially predicted ABO was associated with risk in out of seven COVID-19 outcomes These outcomes represented both susceptibility (e.g. COVID-19 vs. cis-MR per genetically predicted ABO × and severity (e.g. hospitalised COVID-19 vs. cis-MR p=1 × of COVID-19. 
predicted soluble IL-6R was only associated with higher risk of hospitalised COVID-19 compared to per genetically predicted × the SNPs involved in the pan-MR associations of the all probes IL-6R and ABO had at one trans-acting and in all these at one of the trans-acting SNPs were to the ABO gene by the Open Targets Genetics V2G the of the ABO genetic Furthermore, when the of pan-MR associations of these probes across all seven COVID-19 outcomes, the protein probes that have trans-acting ABO SNPs a similar association as the ABO protein associated with only COVID-19 outcomes that have file 1 of proteins reported in our study and the of evidence their by by by and single-SNP of cis-acting of trans-acting

  • Research Article
  • Cite Count: 26
  • 10.1111/j.1755-148x.2009.00622.x
Genome‐wide association studies for melanoma and nevi
  • Aug 26, 2009
  • Pigment Cell & Melanoma Research
  • Iwei Yeh + 1 more

Familial melanoma accounts for approximately 10% of cases. Thus far, causative genetic mutations have been identified through linkage studies, which typically find high penetrance, low frequency genetic variants. Approximately 40% of affected families have CDKN2A mutations and a small number of families carry CDK4 mutations. The genetic basis of the remainder of familial cases is unknown. Linkage studies suggest a susceptibility locus on the short arm of chromosome 1, and there may be many alleles that increase risk by a small amount. High frequency alleles with small effects on melanoma risk in European populations have been identified in MC1R (melanocortin 1 receptor), ASIP (agouti signaling protein), TYR (tyrosinase), and TYRP (tyrosinase-related protein). These associations were identified via directed investigation of polymorphisms in genes known to be involved in pigmentation. MC1R variants have been shown to contribute to melanoma risk, even beyond their marked effect on pigmentation phenotype. In contrast, other variants associated with freckling and sun sensitivity (SLC24A4, KITLG, and OCA2) have not been found to be associated with melanoma in association studies thus far. The GenoMEL consortium performed a genome-wide association study (GWAS) to look broadly for associations of common, low-penetrance genetic variations with melanoma (Bishop et al. 2009). rs258322, the single nucleotide polymorphism (SNP) with the highest association at 16q24 (which contains MC1R), was found to have a per-allele odds ratio (OR) of 1.67 for melanoma. This SNP was also found to be associated with hair color and pigmentation in a previous GWAS (Han et al. 2008). That study showed that the SNP's correlation with the two phenotypes was due to functional variants in MC1R that are in linkage disequilibrium (LD) with rs258322. Bishop et al. 
note that the magnitude of the association for rs258322 with melanoma is similar to that recently described for MC1R variants, and it is likely that the association at rs258322 is due to functional MC1R variants. In contrast, rs8059973, which is approximately 90 kb from MC1R, was found to have an independent association with melanoma; replication and fine mapping of this SNP must be completed before the significance of this association is understood. Bishop et al. also replicated previously suggested associations: one with a coding variant in TYR (tyrosinase) (OR = 1.27) and another in 20q11.22, near ASIP. In addition to hits near MC1R, TYR, and ASIP, Bishop et al. found a novel association for two SNPs in 9p21. The two SNPs were independently associated with melanoma, spanning the region that contains CDKN2A. rs7023329 (OR = 1.18) is approximately 150 kb downstream from CDKN2A, within an intron of methylthioadenosine phosphorylase (MTAP). Methylthioadenosine phosphorylase is often co-deleted with CDKN2A, and plays a role in adenine and methionine salvage. Reduced expression of MTAP has been previously demonstrated in malignant melanoma, but the functional consequences are unclear. A second SNP in 9p21, rs1011970, is 87 kb upstream of CDKN2A and is within ANRIL, an antisense non-coding RNA that overlaps with the promoter of CDKN2A and the transcribed sequence of CDKN2B. As the study population was enriched for family history of melanoma, early age of onset, and multiple primaries, it is possible that a higher number of high penetrance CDKN2A mutations in the study population could contribute to this effect. As the CDKN2A mutation status of the patients carrying the associated nearby SNPs is not known, associations could be due to CDKN2A mutations arising on different genetic backgrounds. 
While rare, mutations in CDKN2A have very high penetrance, and the selection of patients could have pushed the expected frequency of CDKN2A mutations well above the 2% expected in an unselected cohort of melanoma patients. Alternatively, these signals could be due to high frequency low-risk alleles affecting CDKN2A function in a novel way [supported by a study showing melanoma risk associated with common variants in CDKN2A (Debniak et al., 2006)] or by variants in nearby genes. The number of melanocytic nevi is positively correlated with melanoma risk, more so than pigmentation and tanning response differences. Individuals at particularly increased risk are those with large or 'dysplastic' nevi. Increased nevus counts are also associated with increased UV exposure. Nevi frequently harbor mutations in BRAF and NRAS, the same as those found in melanoma, and a subset of melanomas arises within nevi. Previously, linkage studies in twins identified an association with nevus counts near the CDKN2A locus in addition to linkage with other loci (Falchi et al. 2006). A recent GWAS for nevus count by Falchi et al. (2009), using cohorts from the UK and Australia, also found a strong association with SNPs at 9p21. The SNP with the highest association, rs4636294, is located in the 5′-UTR of MTAP. Overall, the signals within 9p21 overlapped with those found to be associated with melanoma in the study discussed previously. As the cohort for the nevus association study was not enriched for individuals with an increased melanoma risk, this finding supports the existence of low-penetrance variants affecting melanocytic proliferation, i.e., nevi and melanoma, in this region. An additional association with nevus count was identified within 22q13, with rs2284063 showing the lowest P-value. This SNP lies within an intron of PLA2G6, a member of the phospholipase A2 gene family. 
The associations identified accounted for <3% of the variance in nevus counts across all populations studied; therefore, many additional genetic factors likely remain to be discovered. After identifying associations with nevus count using one cohort, the authors tested for and found associations between these SNPs and melanoma in a separate case-control sample. Factoring nevus count into their model reduced the risk attributed to each SNP. Thus, the risk of melanoma associated with the 9p21 and 22q13 loci is at least partially mediated via nevus count. Bishop et al. replicated the association of the SNPs at 22q13 with melanoma. Given the relationship between nevus count and melanoma and the associations of the SNPs identified in the nevus study with melanoma, the causative variants at 9p21 are quite possibly the same for both phenotypes. PLA2G6 is interesting due to known associations with lung cancer susceptibility and roles in cell growth and proliferation, but the association with nevus count was stronger for nearby imputed SNPs, some of which are not located within the transcript of PLA2G6 but closer to other genes. Thus, the causative variant may not act through PLA2G6. In contrast to the melanoma GWAS, the GWAS for nevus count did not identify a signal in the MC1R region, consistent with prior reports indicating that MC1R variants are associated with pigmentation phenotypes, freckling, and melanoma, but not with nevus counts. Given the current model where nevi are composed of melanocytes with UV-induced genetic alterations that are held in check by senescence mechanisms (Mooi and Peeper, 2006), these findings raise the question: how does MC1R increase melanoma risk without having an apparent effect on nevus counts? 
As the melanoma risk conveyed through MC1R variation varies with the type of melanoma (Landi et al., 2006), future GWAS stratified by histologic subtype or by somatic mutations of the tumor may shed light on this question.
