Structural variants in linkage disequilibrium with GWAS-significant SNPs
With the recent expansion of structural variant identification in the human genome, understanding the role of these impactful variants in disease architecture is critically important. Currently, a large proportion of genome-wide-significant genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) are functionally unresolved, raising the possibility that some of these SNPs are associated with disease through linkage disequilibrium with causal structural variants. Hence, understanding the linkage disequilibrium between newly discovered structural variants and statistically significant SNPs may provide a resource for further investigation into disease-associated regions in the genome. Here we present a resource cataloging pairs of structural variants and significant SNPs in high linkage disequilibrium. The database is composed of (i) SNPs that have exhibited genome-wide significant association with traits, primarily disease phenotypes, (ii) newly released structural variants (SVs), and (iii) linkage disequilibrium values calculated from unphased data. All data files, including those detailing SV and GWAS SNP associations and results of GWAS-SNP-SV pairs, are available at the SV-SNP LD Database and can be accessed at https://github.com/hliang-SchrodiLab/SV_SNPs. Our analysis results represent a useful fine-mapping tool for interrogating SVs in linkage disequilibrium with disease-associated SNPs. We anticipate that this resource may play an important role in subsequent studies that investigate incorporating disease-causing SVs into disease risk prediction models.
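The abstract above says the linkage disequilibrium values were calculated from unphased data but does not name the estimator. One common choice for unphased genotypes is the squared Pearson correlation between allele counts (0, 1 or 2 copies per sample). The sketch below illustrates that approach only. It is an assumption, not necessarily the method used to build the database, and the function name and toy genotypes are hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2.

    Samples where either genotype is missing (None) are skipped.
    This is a composite, genotype-level LD estimate that needs no phasing.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two samples with both genotypes called")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")  # one variant is monomorphic in this sample set
    return (cov * cov) / (var_x * var_y)


# Toy data (illustrative only): copies of the alternate allele per sample.
snp_genotypes = [0, 1, 2, 1, 0, 2, None, 1]
sv_genotypes = [0, 1, 2, 1, 0, 2, 1, 0]
r2 = genotype_r2(snp_genotypes, sv_genotypes)
```

A pair would then be reported as being in high LD when `r2` exceeds whatever cutoff the analysis adopts; the cutoff itself is a separate choice.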
- # Single Nucleotide Polymorphisms
- # Linkage Disequilibrium
- # Structural Variants
- # Disease Risk Prediction Models
- # Disease-associated Single Nucleotide Polymorphisms
- # Variants In Linkage Disequilibrium
- # Disease-associated Regions
- # Unphased Data
- # High Linkage Disequilibrium
- # Genome-wide Association Study
- Research Article
39
- 10.1016/j.ajhg.2021.02.006
- Feb 23, 2021
- The American Journal of Human Genetics
Integrative analysis of liver-specific non-coding regulatory SNPs associated with the risk of coronary artery disease
- Research Article
47
- 10.1261/rna.029900.111
- Nov 22, 2011
- RNA
A majority of SNPs (single nucleotide polymorphisms) map to noncoding and intergenic regions of the genome. Noncoding SNPs are often identified in genome-wide association studies (GWAS) as strongly associated with human disease. Two such disease-associated SNPs in the 5' UTR of the human FTL (Ferritin Light Chain) gene are predicted to alter the ensemble of structures adopted by the mRNA. High-accuracy single nucleotide resolution chemical mapping reveals that these SNPs result in substantial changes in the structural ensemble in agreement with the computational prediction. Furthermore six rescue mutations are correctly predicted to restore the mRNA to its wild-type ensemble. Our data confirm that the FTL 5' UTR is a "RiboSNitch," an RNA that changes structure if a particular disease-associated SNP is present. The structural change observed is analogous to that of a bacterial Riboswitch in that it likely regulates translation. These data further suggest that specific pairs of SNPs in high linkage disequilibrium (LD) will form RNA structure-stabilizing haplotypes (SSHs). We identified 484 SNP pairs that form SSHs in UTRs of the human genome, and in eight of the 10 SSH-containing transcripts, SNP pairs stabilize RNA protein binding sites. The ubiquitous nature of SSHs in the transcriptome suggests that certain haplotypes are conserved to avoid RiboSNitch formation.
- Research Article
- 10.1002/alz.082689
- Dec 1, 2023
- Alzheimer's & Dementia
Background: Alzheimer’s disease (AD) is up to 60-80% heritable, but less than ∼20% is explained by studies analyzing single nucleotide variants (SNVs). One limitation of short-read whole-genome sequencing (SRS) is the standard read length of 150 base pairs, which does not enable the detection of longer structural variants (SVs). Here we sequenced individuals using long-read sequencing (LRS), which can sequence reads with an average length of ∼20 kilobases, allowing us to identify SVs previously uncaptured.
Method: All participants underwent whole-genome LRS (∼15x coverage) and a subgroup (84%) underwent SRS. Out of 576 participants (47% males, age = 70.6±7.9 y.o.), 115 were diagnosed with AD or mild cognitive impairment, 365 were healthy controls, and 96 were diagnosed with a synucleinopathy (either Parkinson or Lewy Body disease). Eighty-three index SNVs from loci associated with AD risk through GWAS (Bellenguez et al., Nature Genetics 2022) were genotyped with 30x coverage SRS, in addition to APOE2-4. A 1 Mbp window was defined around these SNVs to construct the discovery range for SVs (Sniffles2 population mode). Linkage disequilibrium (LD) was assessed between SNVs and SVs (CubeX).
Result: A total of 14,854 SVs were found across the AD risk loci (Figure 1). After LD calculation, 197 SVs had an R² > 0.1 (Figure 2). The SVs with the highest LD (R² > 0.7) are reported in Table 1, among which there is a 322 bp deletion in the 3’ UTR of TMEM106B in high LD (R² = 0.918) with the intronic SNV rs13237518. This TMEM106B locus has been previously associated with the risk of frontotemporal lobar dementia with TDP-43 pathology in addition to AD, but the causative SNV has not been identified yet. This large deletion may mediate TMEM106B’s risk-modulating role in AD and FTLD-TDP (Chemparathy et al. medRxiv 2023). At the complex MAPT locus, three SVs showed a high LD with rs199515.
Conclusion: Insights into the role of SVs in neurodegenerative disorders have been hampered by the limitations of SRS. Using LRS in a large AD-related sample, we characterized for the first time the genetic variation of SVs in known AD risk loci and provide a roadmap to identify potential causal SVs driving the AD association signal.
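The windowing and thresholding steps described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say whether the 1 Mbp window is centered on the index SNV, so the code assumes a centered window (500 kb on each side). The record layout, the function names, the tier labels and the toy coordinates are all hypothetical. Only the R² cutoffs (0.1 and 0.7) come from the text.

```python
def svs_in_window(index_pos, svs, window_bp=1_000_000):
    """Return SVs overlapping a window centered on an index SNV position.

    Assumption: the window extends window_bp / 2 on each side of the SNV.
    Each SV is a dict with hypothetical keys "id", "start" and "end".
    """
    half = window_bp // 2
    lo, hi = index_pos - half, index_pos + half
    return [sv for sv in svs if sv["end"] >= lo and sv["start"] <= hi]


def ld_tier(r2, low=0.1, high=0.7):
    """Classify an SNV-SV pair using the two R² cutoffs quoted in the abstract."""
    if r2 > high:
        return "high"
    if r2 > low:
        return "moderate"
    return "low"


# Toy coordinates (not real loci).
candidate_svs = [
    {"id": "sv_a", "start": 100_000, "end": 100_322},
    {"id": "sv_b", "start": 1_700_000, "end": 1_705_000},
    {"id": "sv_c", "start": 1_499_900, "end": 1_500_100},
]
nearby = svs_in_window(1_000_000, candidate_svs)
nearby_ids = [sv["id"] for sv in nearby]
```

Here only `sv_c` overlaps the assumed window; `ld_tier(0.918)` falls in the top tier, matching the R² reported for the TMEM106B deletion.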
- Abstract
- 10.1016/j.jalz.2019.06.4368
- Jul 1, 2019
- Alzheimer's & Dementia
HIGH-THROUGHPUT IDENTIFICATION OF NONCODING FUNCTIONAL SNPS
- Research Article
- 10.1101/2024.08.12.24311887
- Aug 13, 2024
- medRxiv : the preprint server for health sciences
Advances have led to a greater understanding of the genetics of Alzheimer's Disease (AD). However, the gap between the predicted and observed genetic heritability estimates when using single nucleotide polymorphisms (SNPs) and small indel data remains. Large genomic rearrangements, known as structural variants (SVs), have the potential to account for this missing genetic heritability. By leveraging data from two ongoing cohort studies of aging and dementia, the Religious Orders Study and Rush Memory and Aging Project (ROS/MAP), we performed genome-wide association analysis testing around 20,000 common SVs from 1,088 participants with whole genome sequencing (WGS) data. A range of Alzheimer's Disease and Related Disorders (AD/ADRD) clinical and pathologic traits were examined. Given the limited sample size, no genome-wide significant association was found, but we mapped SVs across 81 AD risk loci and discovered 22 SVs in linkage disequilibrium (LD) with GWAS lead variants and directly associated with AD/ADRD phenotypes (nominal P < 0.05). The strongest association was a deletion of an Alu element in the 3'UTR of the TMEM106B gene. This SV was in high LD with the respective AD GWAS locus and was associated with multiple AD/ADRD phenotypes, including tangle density, TDP-43, and cognitive resilience. The deletion of this element was also linked to lower TMEM106B protein abundance. We also found a 22 kb deletion associated with depression in ROSMAP and bearing similar association patterns as AD GWAS SNPs at the IQCK locus. In addition, genome-wide scans allowed the identification of 7 SVs, with no LD with SNPs and nominally associated with AD/ADRD traits. This result suggests potentially new ADRD risk loci not discoverable using SNP data. Among these findings, we highlight a 5.6 kb duplication of coding regions of the gene C1orf186 at chromosome 1 associated with indices of cognitive impairment, decline, and resilience. 
While further replication in independent datasets is needed to validate these findings, our results support the potential roles of common structural variations in the pathogenesis of AD/ADRD.
- Research Article
13
- 10.1186/s12864-022-08418-7
- Mar 9, 2022
- BMC Genomics
Background: Structural variants (SV) are causative for some prominent phenotypic traits of livestock, such as different comb types in chickens or color patterns in pigs. Their effects on production traits are also increasingly studied. Nevertheless, accurately calling SV remains challenging. It is therefore of interest whether nearby single nucleotide polymorphisms (SNPs) are in strong linkage disequilibrium (LD) with SVs and can serve as markers. The literature comes to different conclusions on whether SVs are in LD with SNPs at the same level as SNPs are with other SNPs. The present study aimed to generate a precise SV callset from whole-genome short-read sequencing (WGS) data for three commercial chicken populations and to evaluate LD patterns between the called SVs and surrounding SNPs. It is thereby the first study to assess LD between SVs and SNPs in chickens.
Results: The final callset consisted of 12,294,329 bivariate SNPs, 4,301 deletions (DEL), 224 duplications (DUP), 218 inversions (INV) and 117 translocation breakpoints (BND). While average LD between DELs and SNPs was at the same level as between SNPs and SNPs, LD between other SVs and SNPs was strongly reduced (DUP: 40%, INV: 27%, BND: 19% of between-SNP LD). A main factor for the reduced LD was the presence of local minor allele frequency differences, which accounted for 50% of the difference between SNP-SNP and DUP-SNP LD. This was potentially accompanied by lower genotyping accuracies for DUP, INV and BND compared with SNPs and DELs. An evaluation of the presence of tag SNPs (the SNP in highest LD with the variant of interest) further revealed DELs to be slightly less well tagged by WGS SNPs than WGS SNPs are by other SNPs. This difference, however, was no longer present when reducing the pool of potential tag SNPs to SNPs located on four different chicken genotyping arrays.
Conclusions: The results implied that genomic variance due to DELs in the chicken populations studied can be captured by different SNP marker sets as well as variance from WGS SNPs, whereas separate SV calling might be advisable for DUP, INV, and BND effects.
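The tag-SNP evaluation described above (find the SNP in highest LD with a variant, optionally restricting candidates to SNPs present on a genotyping array) can be sketched as below. This is an illustration under assumptions, not the study's pipeline: LD is approximated by the squared correlation of 0/1/2 genotype codes, and the identifiers, helper names and genotypes are hypothetical.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation of two equal-length 0/1/2 genotype lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic marker cannot tag anything
    return (cov * cov) / (var_x * var_y)


def tag_snp(target, snps, allowed=None):
    """Return (snp_id, r2) for the candidate SNP in highest LD with `target`.

    `allowed` optionally restricts the candidate pool, e.g. to the SNP IDs
    present on a genotyping array. Returns (None, -1.0) if no SNP qualifies.
    """
    best_id, best_r2 = None, -1.0
    for snp_id, genotypes in snps.items():
        if allowed is not None and snp_id not in allowed:
            continue
        value = squared_correlation(target, genotypes)
        if value > best_r2:
            best_id, best_r2 = snp_id, value
    return best_id, best_r2


# Toy genotypes for one deletion and three nearby SNPs.
deletion = [0, 1, 2, 1, 0, 2]
snps = {
    "snp1": [0, 1, 2, 1, 0, 2],
    "snp2": [2, 1, 0, 1, 2, 1],
    "snp3": [1, 1, 1, 0, 2, 1],
}
wgs_tag = tag_snp(deletion, snps)
array_tag = tag_snp(deletion, snps, allowed={"snp2", "snp3"})
```

With the full pool the best tag is `snp1`; once the pool is restricted to the hypothetical array content, the best available tag becomes `snp2` with a lower r².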
- Research Article
4
- 10.1186/s12711-025-00962-6
- Mar 28, 2025
- Genetics Selection Evolution
Background: Whole genome sequencing (WGS), despite its advantages, is yet to replace methods for genotyping single nucleotide variants (SNVs) such as SNP arrays and targeted genotyping assays. Structural variants (SVs) have larger effects on traits than SNVs, but are more challenging to accurately genotype. Using low-coverage WGS with genotype imputation offers a cost-effective strategy to achieve genome-wide variant coverage, but is yet to be tested for SVs.
Methods: Here, we investigate combined SNV and SV imputation with low-coverage WGS data in Atlantic salmon (Salmo salar). As the reference panel, we used genotypes for high-confidence SVs and SNVs for n = 365 wild individuals sampled from diverse populations. We also generated 15× WGS data (n = 20 samples) for a commercial population external to the reference panel, and called SVs and SNVs with gold-standard approaches. An imputation method selected for its established performance using low-coverage sequencing data (GLIMPSE) was tested at WGS depths of 1×, 2×, 3×, and 4× for samples within and external to the reference panel.
Results: SNVs were imputed with high accuracy and recall across all WGS depths, including for samples outside the reference panel. For SVs, we compared imputation based purely on linkage disequilibrium (LD) with SNVs, to that supplemented with SV genotype likelihoods (GLs) from low-coverage WGS. Including SV GLs increased imputation accuracy, but as a trade-off with recall, requiring 3-4× depth for best performance. Combining strategies allowed us to capture 84% of the reference panel deletions with 87% accuracy at 1× depth. We also show that SV length affects imputation performance, with provision of SV GLs greatly enhancing accuracy for the longest SVs in the dataset.
Conclusions: This study highlights the promise of reference panel imputation using low-coverage WGS, including novel opportunities to enhance the resolution of genome-wide association studies by capturing SVs.
- Research Article
58
- 10.1158/1055-9965.681.13.5
- May 1, 2004
- Cancer Epidemiology, Biomarkers & Prevention
SNPs, Haplotypes, and Cancer: Applications in Molecular Epidemiology
- Research Article
6
- 10.1111/j.2040-1124.2012.00222.x
- Jun 11, 2012
- Journal of diabetes investigation
We have witnessed great success in the identification of new type 2 diabetes susceptibility loci through genome-wide association (GWA) analysis in the past 5 years. The number of loci robustly implicated in type 2 diabetes risk; that is, those that have attained a genome-wide significance level (P < 5 × 10⁻⁸) and also have been repeatedly validated in independent samples, has climbed from just three in 2006 to >50 today. As these GWA studies were carried out almost exclusively in European-descent populations, studies in non-European populations will allow us to assess the relevance of the findings to other ethnic groups. To address this point, a consortium-based GWA meta-analysis of type 2 diabetes was recently carried out in East Asians with a multistage study design (involving >25,000 cases and >29,000 controls in total)¹; eight new loci were confirmed to significantly associate with type 2 diabetes. This large-scale GWA meta-analysis was principally carried out in the Asian Genetic Epidemiology Network (AGEN), in which our group participated together with investigators from Japan, Korea, China, Taiwan, Singapore and the USA. Before the AGEN GWA meta-analysis, GWA studies in East Asian populations already reported several type 2 diabetes loci (e.g. KCNQ1, UBE2E2 and C2CD4A-C2CD4B), which had not been identified in European-descent populations until then. However, in the course of such efforts made in East Asians, it has been widely recognized in European-descent populations that for common complex diseases, such as type 2 diabetes, a major part of susceptibility loci can individually exert modest genetic impacts; that is, ∼10–20% increased risk of developing the disease, and hence might not attain a genome-wide significance level unless a large number of individuals are analyzed in both the discovery and replication (or follow up) stages of GWA scans. 
In line with this, it has been reported that the number of discovered variants (or loci) is strongly correlated with experimental sample size in GWA studies of type 2 diabetes². This might particularly hold true in the case that effect sizes (measured as an odds ratio [OR] in case–control studies) of undiscovered variants are almost equivalent to those of discovered ones. In addition to effect sizes, effect allele frequencies (of target variants) have a substantial influence on our chance of identifying significant associations in a given size of samples. When inconsistent results are observed between populations, we should be cautious in claiming the presence of ‘population specific’ genetic association with type 2 diabetes, as discussed later. Because GWA studies interrogate a huge number of single nucleotide polymorphism (SNP) markers simultaneously, a stringent statistical threshold (P < 5 × 10⁻⁸) should be set in order to avoid false positives from multiple testing. Based on the simulations, such a high significance level of association could be detected in the discovery stage (stage 1) of AGEN GWA meta-analysis (involving 6952 cases and 11,865 controls) for loci with OR ≥ 1.20 assuming 80% power. Indeed, three previously identified loci (CDKAL1, CDKN2A/2B and KCNQ1) did satisfy the threshold in the discovery stage (OR = 1.17–1.21). This indicates that effect sizes for undiscovered (or new) type 2 diabetes loci are likely to be rather modest (OR < 1.2) and that a larger number of samples (for cases and controls) are required to attain P < 5 × 10⁻⁸, and also to confirm the associations independently. Here, it has to be kept in mind that in the GWA scan, the actual genetic effect is typically smaller than its estimate based on the discovery stage data, whose results are affected by an ascertainment bias known as the ‘winner’s curse’³. Another issue to be considered in the overall GWA meta-analysis design is the cost and time of SNP genotyping in the replication stage(s). 
Although the initial GWA genotyping is costly, look-up of target SNPs in the existing GWA-scanned datasets does not require additional cost. In contrast, de novo genotyping in independent samples requires additional cost and a great deal of time. Thus, a combination of the two approaches was taken in the replication/follow-up stages of AGEN GWA meta-analysis¹. That is, modest association signals (3756 SNPs from 297 independent loci, showing P < 5 × 10⁻⁴ in the discovery panel) were followed up with a stage 2 in silico replication analysis (involving 5843 cases and 4574 controls) and then 19 SNPs showing the most compelling evidence for association (P < 10⁻⁵ in stages 1 + 2) were subjected to stage 3 de novo genotyping (involving up to 12,284 cases and 13,172 controls). A total of eight new loci were finally confirmed to show significant evidence of association in East Asians. The replicated associations for a limited number of candidate gene loci have broadly shown the tendency of interethnic similarity. Even though the common (or cosmopolitan) effect of type 2 diabetes susceptibility variants is known for several loci, the extent to which the causation of the disease differs or overlaps between populations remains unknown. With in silico replication analysis, just two of eight loci (25%) discovered in East Asian samples also showed nominally (P < 0.05) significant associations in European-descent populations. For the remaining six loci without reproducible evidence of association, minor allele frequencies were relatively high (0.14–0.46) in European-descent populations. This suggests that failure of replication is not simply a result of cross-population differences in risk allele frequency; that is, it is estimated that low allele-frequency leads to reduced power, even though effect sizes are similar⁴. 
By systematically comparing effect sizes (in OR) between East Asians and Europeans for a number of robustly-confirmed type 2 diabetes loci, we can recognize several points highlighted by the extensive GWA meta-analysis of East Asian samples (Figure 1). First, for all eight loci newly discovered in East Asians, effect sizes are modest (OR = 1.08–1.13). Second, approximately one-third (16 of 49 loci included in the plot) of the loci exert a certain level of effect size (OR > 1.1) in both ethnic groups. Here, one of the discovered loci, MAEA, can be categorized to this group, and its minor allele frequency is far less frequent in Europeans than in East Asians (3% vs 42%), similar to the situation for KCNQ1⁴. Third, a number of type 2 diabetes loci that were originally identified in Europeans do not appear to show reproducible associations in East Asians. Roughly, the failure to replicate a nominally significant association between an index SNP and type 2 diabetes in a tested population does not necessarily indicate the absence of an association at the relevant locus. Factors to be considered include sample size, linkage disequilibrium (LD) structure, and the potential impact of gene–gene and gene–environment interactions in the individual population. Despite such caveats, it seems to be helpful to know the extent to which type 2 diabetes associations are reproducible between populations.
Figure 1. Correlation of effect sizes for type 2 diabetes risk between East Asians (x-axis) and Europeans (y-axis) at 49 single nucleotide polymorphism loci, for which the corresponding data are available in the East Asian genome-wide association (GWA) meta-analysis¹. Only the effect size estimate, odds ratios (OR), but not 95% confidence interval, is shown in the figure for the purpose of readability. Symbols colored in red and blue are loci originally reported in GWA studies/meta-analysis (GWAS) of East Asians and Europeans, respectively. The gene names are shown for eight loci newly discovered in East Asian samples, as well as three others (TCF7L2, KCNQ1 and PRC1). AGEN, Asian Genetic Epidemiology Network.
Together with the recently reported data on South Asian meta-analysis, which involves studies in India, Pakistan, the UK, Mauritius, Singapore and Sri Lanka⁵, an arbitrary picture of cross-population difference (or overlapping) can be shown in Figure 2, as for a total of 39 loci, which have been originally identified in European GWA studies/meta-analysis and then subjected to evaluation (i.e. replication) in populations of East and South Asian descent. Almost half of the loci (20 out of 39) are common across three ethnic groups (Europeans, East Asians and South Asians) and 18% of the loci do not show a nominally significant association in non-European populations. Again, this cannot exclude the possibility that the causal variants are not represented by the index SNPs because of cross-population differences in LD structure and/or the possibility that sample size is insufficient to attain statistical significance. Nevertheless, it is possible that some of the negative associations reflect true population-specificity.
Figure 2. A schematic representation of cross-population difference (or overlapping) for a total of 39 type 2 diabetes-associated loci that have been originally identified in European genome-wide association studies/meta-analysis, and then subjected to evaluation in populations of East and South Asian descent. Here, an associated locus is assumed to overlap between the ethnic groups when P ≤ 0.05, showing a concordant direction of genetic effect in the population tested for replication.
Among eight loci reaching genome-wide significance, the GLIS locus alone has previously been reported in the context of metabolic traits or related diseases. 
This gene encodes a Kruppel-like zinc finger transcription factor, which has been proposed as a key player in the regulation of pancreatic β-cell development and insulin gene expression. In accordance with such a biological function, SNPs in high LD with this locus have been implicated in association with type 1 diabetes and fasting plasma glucose levels. As the number of discovered loci has recently increased, substantial efforts have been made to show that the genes mapping close to type 2 diabetes loci are enriched for particular biological pathways (or processes), although they have thus far met with only limited success. Among the pathways, the most robust finding is related to cell-cycle regulation; this seems to be consistent with a model in which the regulation of pancreatic islet mass is a principal component of genetic susceptibility to type 2 diabetes, as noted above for GLIS. Despite the lack of clear physiological evidence on type 2 diabetes, these GWA findings can provide clues to the precise biological mechanisms underlying appreciable differences in the clinical presentation of type 2 diabetes between populations of European and non-European origin. Along with the GWA meta-analyses in individual populations, ‘transethnic’ meta-analysis is currently being carried out and will allow for a better chance to show novel susceptibility loci and pathophysiological pathways of type 2 diabetes, and might also facilitate the fine mapping of common causal variants by utilizing ethnic differences in LD structure.
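The genome-wide significance level quoted throughout this article (P < 5 × 10⁻⁸) is usually motivated as a Bonferroni correction of a 0.05 family-wise error rate for roughly one million independent common-variant tests. The article itself does not give that derivation, so the figure of one million is stated here as the conventional assumption. A minimal sketch, with hypothetical function names:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_genome_wide_significant(p_value, threshold):
    """True when a single-SNP p-value passes the corrected threshold."""
    return p_value < threshold


# Assumption: about one million effectively independent tests across the genome.
threshold = bonferroni_threshold(0.05, 1_000_000)
```

Under this rule a SNP with P = 1 × 10⁻⁶, which would look striking in a single-hypothesis setting, does not count as genome-wide significant, which is why loci with modest odds ratios need very large samples to be declared.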
- Research Article
43
- 10.1093/biostatistics/kxp043
- Oct 12, 2009
- Biostatistics
Genome-wide association studies (GWAS) are increasingly utilized for identifying novel susceptible genetic variants for complex traits, but there is little consensus on analysis methods for such data. Most commonly used methods include single single nucleotide polymorphism (SNP) analysis or haplotype analysis with Bonferroni correction for multiple comparisons. Since the SNPs in typical GWAS are often in linkage disequilibrium (LD), at least locally, Bonferroni correction of multiple comparisons often leads to conservative error control and therefore lower statistical power. In this paper, we propose a hidden Markov random field model (HMRF) for GWAS analysis based on a weighted LD graph built from the prior LD information among the SNPs and an efficient iterative conditional mode algorithm for estimating the model parameters. This model effectively utilizes the LD information in calculating the posterior probability that an SNP is associated with the disease. These posterior probabilities can then be used to define a false discovery controlling procedure in order to select the disease-associated SNPs. Simulation studies demonstrated the potential gain in power over single SNP analysis. The proposed method is especially effective in identifying SNPs with borderline significance at the single-marker level that nonetheless are in high LD with significant SNPs. In addition, by simultaneously considering the SNPs in LD, the proposed method can also help to reduce the number of false identifications of disease-associated SNPs. We demonstrate the application of the proposed HMRF model using data from a case-control GWAS of neuroblastoma and identify 1 new SNP that is potentially associated with neuroblastoma.
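The abstract says posterior association probabilities can be used to define a false-discovery-controlling selection rule, without spelling the rule out. One standard construction, shown here as an assumption rather than as the paper's exact procedure, ranks SNPs by posterior probability and keeps the largest set whose average posterior probability of being a false discovery stays at or below a target level. The SNP labels and probabilities below are made up.

```python
def select_by_posterior_fdr(posteriors, alpha=0.05):
    """Select SNPs so the estimated Bayesian FDR of the selected set is <= alpha.

    `posteriors` maps SNP id -> posterior probability of association.
    The estimated FDR of a set is the mean of (1 - posterior) over its members.
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected = []
    error_sum = 0.0
    for snp_id, prob in ranked:
        candidate_sum = error_sum + (1.0 - prob)
        # The running mean never decreases, because SNPs are visited in
        # decreasing order of posterior probability, so stopping here is safe.
        if candidate_sum / (len(selected) + 1) > alpha:
            break
        selected.append(snp_id)
        error_sum = candidate_sum
    return selected


# Hypothetical posterior probabilities for four SNPs.
example_posteriors = {"snp_a": 0.99, "snp_b": 0.97, "snp_c": 0.90, "snp_d": 0.40}
chosen = select_by_posterior_fdr(example_posteriors, alpha=0.05)
```

In this toy case `snp_c` is retained even though its own posterior is only 0.90, because the set-level estimate (about 0.047) is still under 0.05. This matches the abstract's point that borderline SNPs can be recovered when evidence is pooled.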
- Research Article
360
- 10.1016/s0140-6736(08)60208-1
- Feb 1, 2008
- Lancet (London, England)
LDL-cholesterol concentrations: a genome-wide association study
- Research Article
345
- 10.1086/316944
- Jan 1, 2001
- The American Journal of Human Genetics
Extent and Distribution of Linkage Disequilibrium in Three Genomic Regions
- Front Matter
11
- 10.1053/j.gastro.2014.02.023
- Feb 22, 2014
- Gastroenterology
IBD Genetics: Focus on (Dys) Regulation in Immune Cells and the Epithelium
- Front Matter
8
- 10.1053/j.gastro.2016.04.021
- Apr 29, 2016
- Gastroenterology
The Hunting of the Snark: Whither Genome-Wide Association Studies for Colorectal Cancer?
- Research Article
- 10.1097/01.hs9.0000850712.14894.c0
- Jun 23, 2022
- HemaSphere
Background: Multiple myeloma (MM) is the second most common blood malignancy, caused by an uncontrolled growth of plasma cells in the bone marrow, accounting for 20% of all newly diagnosed hematological cancers. Although the current 5-year survival rate ranges between 40% and 60%, MM is still considered an incurable disease since most patients eventually relapse. While the causes of MM are incompletely understood, several genome-wide association studies (GWAS) have been conducted to identify germline variants that predispose to MM. To date, a total of 24 loci have been found to be associated with MM risk, but very little information is available about their functional role.
Aims: The principal goal is to explore in silico the function of the germline variants associated with MM risk. As we do not know if the GWAS-identified SNPs are the causal variants or just markers of risk, the functional characterization of the causal risk variants would lead to a better understanding of disease development.
Methods: GWAS design takes advantage of the linkage disequilibrium (LD) structure of the human genome, thus the main GWAS findings are single-nucleotide polymorphisms (SNPs) that show the strongest association with MM risk (measured as the lowest p-values), but they are not necessarily the functionally causal variants. In this project, we used bioinformatics tools (GTEx, HaploReg v4.1, Roadmap, LDlink, SNPnexus, RegulomeDB 2.0.3, SNP2TFBS, miRNASNP v3, GeneMANIA) to perform fine mapping of all GWAS-identified loci and to prioritize in each locus the polymorphism with the highest chance of being functionally relevant. In particular, we focused on the loci with the smallest number of SNPs in high LD (r² > 0.8) in order to maximize the probability of capturing the causal variant.
Results: Four of the 24 MM risk loci had a relatively small number of SNPs in high LD, and within them we found that the locus located at chromosome 16 contained the greatest number of functionally annotated SNPs. Of particular functional interest was rs3747481 (chr16:30666367 C/T), for the following reasons: it is a missense variant (protein change: P359L); it is located in the PRR14 gene, which contributes to chromatin hierarchical organization and has a role in gene regulation; and it has a high CADD PHRED score (22.1). Additionally, according to the GTEx portal, rs3747481 is associated with the expression level of the RNF40 gene (which plays a central role in histone code and gene regulation) in whole blood cells (p = 3.02 × 10⁻¹²). It has a score of “1d” in RegulomeDB (eQTL + TF binding + any motif + DNase peak), meaning that this variant has a high likelihood of affecting the binding of transcription factors. Some other SNPs (like rs35629860 and rs67128646) in the same LD block show the co-occurrence of H3K4me3 and H3K27me3 histone marks (associated with gene activation and repression, respectively) in promoters and enhancers in B lymphocytes. rs6565197 is predicted to affect the binding of the KLF4 and KLF5 transcription factors, which play key roles in cell cycle regulation.
Summary/Conclusion: Through fine mapping of MM risk loci with bioinformatics tools, we found a variant in the locus 16p11.2 that shows in silico a very high probability of having a biological role in disease risk.