Heuristic Methods for Finding Pathogenic Variants in Gene Coding Sequences
These are exciting times, with a plethora of new technologies that are expediting discovery of the genetic underpinnings of human disease. Comprehensive resequencing of the human genome is now feasible and affordable, allowing each person's entire genetic makeup to be revealed. The major focus of
- Research Article
24
- 10.1074/mcp.m500024-mcp200
- Aug 1, 2005
- Molecular & Cellular Proteomics
Type 2 diabetes mellitus is a complex disorder with a strong genetic component. Inherited complex disease susceptibility in humans is most commonly associated with single nucleotide polymorphisms. The mechanisms by which this occurs are still poorly understood. Here we focus on analyzing the effect of a set of disease-causing missense variations of the monogenic form of Type 2 diabetes mellitus and a set of disease-associated nonsynonymous variations in comparison with that of nonsynonymous variations without any experimental evidence for association with any disease. Analysis of different properties such as evolutionary conservation status, solvent accessibility, secondary structure, etc. suggests that disease-causing variations are associated with extreme changes in the value of the parameters relating to evolutionary conservation and/or protein stability. Disease-associated variations are moderately conserved and have a milder effect on protein function and stability. The majority of the genes harboring these variations are clustered in or near the insulin signaling network. Most of these variations are identified as potential sites for post-translational modifications; some of these predictions are already supported by reported experimental evidence. Overall, our results indicate that Type 2 diabetes mellitus may result from a large number of single nucleotide polymorphisms that impair modular domain function and post-translational modifications involved in signaling. Our emphasis is more on conserved corresponding residues than on the variation alone. We believe that considering a stretch of peptide sequence involving a polymorphism would be a better method of defining the role of the polymorphism in the manifestation of this disease.
Because most of the variations associated with the disease are rare, we hypothesize that this disease follows a "mosaic model" of interaction between a large number of rare alleles and a small number of common alleles along with the environment, which runs somewhat contrary to the existing common disease common variant model.
- Research Article
5
- 10.1007/s10897-014-9737-0
- Jun 24, 2014
- Journal of genetic counseling
Genetic counseling in direct-to-consumer exome sequencing: a case report.
- Research Article
13
- 10.1016/j.jtumed.2022.04.014
- May 13, 2022
- Journal of Taibah University Medical Sciences
In silico analysis of missense variants of the C1qA gene related to infection and autoimmune diseases
- Research Article
15
- 10.1161/strokeaha.119.024158
- Feb 12, 2020
- Stroke
Effects of Genetic Variants on Stroke Risk.
- Peer Review Report
- 10.7554/elife.67474.sa1
- Apr 26, 2021
Abstract
We develop an integrated co-evolution and dynamic coupling (ICDC) approach to identify, mutate, and assess distal sites to modulate function. We validate the approach first by analyzing the existing mutational fitness data of TEM-1 β-lactamase and show that allosteric positions co-evolved and dynamically coupled with the active site significantly modulate function. We further apply the ICDC approach to identify positions and their mutations that can modulate binding affinity in a lectin, cyanovirin-N (CV-N), that selectively binds to dimannose, and predict binding energies of its variants through Adaptive BP-Dock. Computational and experimental analyses reveal that binding-enhancing mutants identified by ICDC impact the dynamics of the binding pocket, and show that rigidification of the binding residues compensates for the entropic cost of binding. This work suggests a mechanism by which distal mutations modulate function through dynamic allostery and provides a blueprint to identify candidates for mutagenesis in order to optimize protein function.
Editor's evaluation
A computational approach is proposed to identify mutations in enzymes that might impact their interactions with substrates. For one enzyme, in particular, the predictions are validated through experiments, using multiple techniques. Taken together, these data lead to non-trivial conclusions in regard to the nature of allosteric effects, although it remains unclear whether these conclusions will apply more broadly when other enzymes are examined.
https://doi.org/10.7554/eLife.67474.sa0
Introduction
The evolutionary history of a protein comprises the ensemble of mutations acquired during the course of its evolutionary trajectory across different species, and contains valuable information on which residue positions contribute the most to a given protein's 3D-fold and function based on their conservation (Campbell et al., 2016; Rivoire et al., 2016; Yang et al., 2016). Furthermore, the subset of positions that are co-evolved (i.e., correlated mutational sites) provide clues on specific, native-state interactions. Pairwise residue contacts inferred from co-evolved positions within a protein family can be used as distance restraints to accurately model 3D structures (de Juan et al., 2013; Hopf et al., 2019; Kamisetty et al., 2013; Kim et al., 2014; Tripathi et al., 2015). Recent revolutionary successes in accurate predictions of 3D protein structures combine these methods with machine learning strategies, that is, deep learning (Jumper et al., 2021; Wang et al., 2016; Xu, 2019). Co-evolved positions also embed information on protein function, for example, revealing how factors such as binding affinity and specificity are modulated across evolutionary history and species (Rivoire et al., 2016; Salinas and Ranganathan, 2018; Torgeson et al., 2022). However, accessing, interpreting, and applying this information in a predictive manner is very challenging; mutations observed in the evolutionary history are often distal from the functional sites, implying that protein dynamics are responsible for their effects on function and that these sites act as distal allosteric regulators of function (Campitelli et al., 2020a; Modi et al., 2021a; Romero and Arnold, 2009; Salinas and Ranganathan, 2018; Tokuriki et al., 2012; Torgeson et al., 2022; Wei et al., 2016).
Molecular dynamics (MD) simulations can capture protein dynamics and reveal the impact of distal mutations on function (Bowman and Geissler, 2012; Campbell et al., 2016; Campitelli et al., 2020a; Jiménez-Osés et al., 2014; Kolbaba-Kartchner et al., 2021; Modi et al., 2021a; Yang et al., 2016). However, the computational cost of MD simulations of sufficient length can be prohibitively high; further, it's often far from straightforward to forge a clear connection to function. To bridge this gap, we developed a framework to quickly evaluate MD trajectories and identify the sensitivity of a given position to mutation based on its intrinsic flexibility, which we assess using our dynamic flexibility index (DFI) metric, and on its dynamic coupling with functionally critical positions assessed by dynamic coupling index (DCI) (Campitelli et al., 2018; Gerek and Ozkan, 2011; Kumar et al., 2015b; Larrimore et al., 2017). DFI measures the resilience of a position by computing the total fluctuation response and thus captures the flexibility/rigidity of a given position. Applying DFI to several systems, we showed that rigid positions such as hinge sites contribute the most to equilibrium dynamics, and that mutations at hinge sites significantly impact function regardless of the distance from active sites (Kim et al., 2015; Kolbaba-Kartchner et al., 2021; Modi et al., 2021b, Modi et al., 2018; Modi and Ozkan, 2018; Zou et al., 2021; Zou et al., 2015). DCI measures the dynamic coupling between residue pairs and thus identifies positions most strongly coupled to active/binding sites; these positions point to possible allosteric regulation sites important for modulating function in adaptation and evolution (Butler et al., 2015; Modi et al., 2021a, Campitelli et al., 2021; Kuriyan and Eisenberg, 2007; Lu and Liang, 2009; Modi and Ozkan, 2018; Ose et al., 2020; Risso et al., 2018; Wodak et al., 2019). 
In this paper, we present integrated co-evolution and dynamic coupling (ICDC) approach to identify distal allosteric sites, and to assess and predict the effects of mutations on these sites on function. We propose a system to classify residue positions in a binary fashion based on co-evolution (co-evolved, 1 or not, 0) and dynamic coupling by DFI and DCI (dynamically coupled 1, or not, 0) with the functionally critical sites. This classification captures the complementarity of dynamics-based and sequence-based methods. We hypothesize that positions belonging to category (1,1), that is, positions both co-evolved and dynamically coupled with the functional sites, will have the largest effect on function. We validate our hypothesis first by analyzing the existing mutational fitness data for TEM-1 β-lactamase, available for every position of the protein (Stiffler et al., 2015). In agreement with our hypothesis, we find that mutations on category (1,1) positions significantly modulate the function. A large fraction of mutations enhancing enzymatic activity correspond to category (1,1) irrespective of distance from the active site. Second, we apply our ICDC approach to blindly predict and experimentally validate mutations that allosterically modulate dimannose binding in a natural lectin, cyanovirin-N (CV-N). CV-N binds dimannose with nanomolar affinity and remarkable specificity (Barrientos et al., 2003; Botos and Wlodawer, 2005; Botos and Wlodawer, 2003; Mori and Boyd, 2001; O'Keefe et al., 2003). It is part of the CV-N family, found in a wide range of organisms including cyanobacterium, ascomycetous fungi, and fern (Koharudin et al., 2008; Koharudin and Gronenborn, 2013; Patsalo et al., 2011; Percudani et al., 2005; Qi et al., 2009). 
While the 3D fold is remarkably conserved in all experimentally characterized members, the affinity and specificity for different glycans and, in particular, for dimannose vary significantly (Koharudin et al., 2009; Koharudin et al., 2008; Matei et al., 2016; Woodrum et al., 2013). To design CV-N variants with improved binding affinities for dimannose based on distal allosteric coupling, we binned each position into one of the four categories based on computed DFI, DCI, and co-evolution rates. We explored mutations at these sites based on frequency in the sequence alignment. After obtaining the mutant models through MD simulations, we assessed the impact of each naturally observed mutation on binding affinity by docking dimannose to the mutant models via Adaptive BP-Dock (Bolia et al., 2014a; Bolia et al., 2014b; Bolia and Ozkan, 2016). We chose position I34, which belongs to category (1,1) and is 16 Å away from the binding pocket, for experimental validation. We found that mutations I34K/L/Y had diverse effects on glycan binding, ranging from a twofold improvement to complete loss of binding. Through experimental and MD studies we show that the observed improvement in binding affinity is due to changes in the dynamics of residues in the binding pocket; mutation I34Y leads to rigidification of binding sites, thus compensating for the entropic cost of binding (Breiten et al., 2013; Chodera and Mobley, 2013; Cornish-Bowden, 2002; Fox et al., 2018). Mutations at an additional position (A71T/S) from category (1,1) showed evidence of the same allosteric mechanism governing the modulation of binding dynamics. Overall, this study not only provides a new approach to identify distal sites whose mutations modulate binding affinity, but also sheds light on how distal mutations modulate binding affinity through dynamic allostery.
Results and discussion
Combining long-range dynamic coupling analysis with co-evolution allows identification of distal sites that contribute to functional activity
With our ICDC approach, we aim to explore the role of dynamics versus evolutionary coupling (EC) as well as the role of rigidity versus flexibility in allosterically modulating active/binding site dynamics. To this end, we created four unique categories that classify residue positions based on residue DFI score, DCI score, and co-evolutionary score: category (1,1) comprises dynamically and co-evolutionarily coupled rigid sites (exhibiting %DFI values 0.2 or lower, showing 0.7 or higher %DCI with the binding site, and showing 0.6 or higher co-evolution scores with the binding site); category (1,0) comprises dynamically coupled but co-evolutionarily not coupled sites; category (0,1) comprises dynamically not coupled but co-evolutionarily coupled sites; category (0,0) comprises dynamically not coupled and co-evolutionarily not coupled flexible sites (exhibiting %DFI values 0.7 or higher) (Supplementary file 1 and Supplementary file 2). Importantly, this classification is based on two independent statistical approaches and thus compensates for the noise of each individual approach. Based on our evolutionary analysis (Campitelli et al., 2020a; Modi et al., 2021b; Modi and Ozkan, 2018), we hypothesize that category (1,1) would impact protein activity or binding affinity the most. To test our hypothesis, we first analyzed the deep mutational scanning data available for the TEM-1 β-lactamase, correlating changes in ampicillin degradation activity (e.g., MIC values) with mutations to all possible amino acids at each position (Stiffler et al., 2015). The experimental results showed that amino acid substitutions at the catalytic site residues of TEM-1 negatively impacted activity. Mutations at other positions also affected activity; while most mutations were deleterious, surprisingly, others resulted in increased activity.
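The category definitions above can be sketched as a small classifier. This is a minimal illustration and not the authors' implementation: the thresholds (%DFI ≤ 0.2, %DCI ≥ 0.7, co-evolution ≥ 0.6, %DFI ≥ 0.7) are taken from the text, while the `Position` type, the function name, the example scores, and the choice to leave a position unclassified when it fails the stated %DFI criterion are assumptions made for the example.

```python
from typing import NamedTuple, Optional, Tuple


class Position(NamedTuple):
    """Hypothetical container for the per-residue scores named in the text."""
    name: str
    pct_dfi: float      # %DFI, 0..1 (low = rigid, high = flexible)
    pct_dci: float      # %DCI with the binding/active site, 0..1
    coevolution: float  # co-evolution score with the binding/active site


# Thresholds quoted in the text.
DFI_RIGID_MAX = 0.2
DFI_FLEXIBLE_MIN = 0.7
DCI_COUPLED_MIN = 0.7
COEVOLUTION_MIN = 0.6


def icdc_category(pos: Position) -> Optional[Tuple[int, int]]:
    """Return (dynamically coupled, co-evolved) as a pair of 0/1 flags.

    Assumption: the text states a %DFI criterion only for (1,1) and (0,0),
    so a position that is coupled on both axes but not rigid, or uncoupled
    on both axes but not flexible, is returned as None (unclassified).
    """
    dyn = pos.pct_dci >= DCI_COUPLED_MIN
    coev = pos.coevolution >= COEVOLUTION_MIN
    if dyn and coev:
        return (1, 1) if pos.pct_dfi <= DFI_RIGID_MAX else None
    if not dyn and not coev:
        return (0, 0) if pos.pct_dfi >= DFI_FLEXIBLE_MIN else None
    return (int(dyn), int(coev))


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    examples = [
        Position("rigid_coupled", 0.15, 0.85, 0.75),
        Position("dyn_only", 0.40, 0.80, 0.30),
        Position("coev_only", 0.40, 0.30, 0.80),
        Position("flexible_uncoupled", 0.90, 0.20, 0.10),
    ]
    for p in examples:
        print(p.name, icdc_category(p))
```

Keeping the two coupling flags separate from the rigidity check makes it easy to see which positions fall outside the four categories as stated, rather than silently forcing them into one.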
The impact of mutations on the dynamics and function of TEM-1 has been heavily explored, but the distal mutational effects are still poorly understood (Kolbaba-Kartchner et al., 2021; Modi et al., 2021b; Modi and Ozkan, 2018; Salverda et al., 2010; Schneider et al., 2021; Stiffler et al., 2015; Thomas et al., 2010; Zimmerman et al., 2017; Zou et al., 2015). We applied our approach by obtaining DFI, DCI, and co-evolution scores for every position of TEM-1 and binning residue positions into each ICDC category (Supplementary file 1 and Supplementary file 5). We constructed fitness distributions for each category using the experimentally measured single mutant relative fitness values for all mutations per position provided in the dataset (Figure 1).
Figure 1. Integrated co-evolution and dynamic coupling (ICDC) categories based on the dynamics and co-evolutionary analyses applied on TEM-1 β-lactamase. (A) The distributions in the form of violin plots are obtained for each ICDC category using all available experimental mutational data (Stiffler et al., 2015). (B) Violin plots showing the fitness values for amino acid substitutions observed in the natural sequences. (C) The category (1,1) positions are mapped on the 3D structure. The catalytic site residues are shown in dark gray whereas category (1,1) positions are shown in magenta. The function-altering category (1,1) positions are widely distributed over the 3D structure.
We found that category (1,1) positions show the highest impact, both significantly enhancing and reducing ampicillin degradation by TEM-1 (Figure 1A and C). In addition, category (0,0) residue mutations (i.e., the exact opposite of category (1,1)) lie within the neutral-like activity range defined by Stiffler et al., 2015, suggesting that mutations on positions that neither co-evolve with nor dynamically couple to the active site do not affect the function significantly.
Category (1,0) residues enhance activity more than those in the neutral category (0,0). Mutations in category (0,1) positions also modulate function in both positive and negative directions, albeit not as strongly as those in category (1,1). However, mutations that negatively impact activity are conspicuously under-represented in the multiple sequence alignment (MSA) of native sequences (Figure 1B), particularly in category (1,1). This finding implies that nature mostly allows mutations that don't compromise fold and function: negative selection (i.e., elimination of amino acid types that are detrimental to folding) is a major force in shaping the mutational landscape (Jana et al., 2014; Modi et al., 2021a; Morcos, 2020; Morcos et al., 2014; Morcos et al., 2013). Thus, conservation information from the MSA is a useful tool for eliminating deleterious amino acid substitutions in protein design. Our ICDC selection criteria effectively identify residue positions and their amino acid substitutions that could fine-tune function without leading to a functional loss, and category (1,1) residues have the largest impact on function irrespective of their distance from the active site (Figure 1C).
Application of ICDC approach to modulate CV-N binding affinity through distal mutations
CV-N is a small (11 kDa) natural lectin isolated from the cyanobacterium Nostoc ellipsosporum. It comprises two quasi-symmetric domains, A (residues 1–38/90–101) and B (residues 39–89), that are connected to each other by a short helical linker. Despite having almost identical structures, the domains show relatively low sequence homology (28% sequence identity and 52% similarity). Functionally, they both bind dimannose, yet the affinity is quite different, with domain B having tighter binding affinity (Kd = 15.3 µM), and domain A showing weak affinity (Kd = 400 µM) (Balzarini, 2007; Bolmstedt et al., 2001; Li et al., 2015).
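To put the two dissociation constants quoted above on an energy scale, they can be converted to standard binding free energies with the textbook relation ΔG° = RT ln(Kd / 1 M). The sketch below is only a worked illustration: the temperature of 298.15 K and the function name are assumptions, since the text does not state the conditions under which these Kd values were measured.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # molar gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = R*T*ln(Kd / 1 M). The default temperature is an assumption;
    it is not given in the text.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


# Kd values quoted in the text for the two CV-N domains.
dg_domain_b = binding_free_energy(15.3e-6)  # domain B, Kd = 15.3 uM
dg_domain_a = binding_free_energy(400e-6)   # domain A, Kd = 400 uM
ddg = dg_domain_b - dg_domain_a             # negative: domain B binds tighter

print(f"domain B: {dg_domain_b:.2f} kcal/mol")
print(f"domain A: {dg_domain_a:.2f} kcal/mol")
print(f"difference (B - A): {ddg:.2f} kcal/mol")
```

Under these assumptions the roughly 26-fold difference in Kd corresponds to a gap of about 2 kcal/mol, which gives a sense of scale for the twofold affinity changes discussed for the distal mutations.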
To simplify our analyses, we used a designed CV-N variant, P51G-m4, that contains a single high-affinity dimannose binding site (domain B), folds exclusively as a monomer in physiological conditions, and is more stable to thermal denaturation than wild type (Fromme et al., 2008; Fromme et al., 2007). The binding pocket of domain B of CV-N has been subjected to intense scrutiny to glean information on the origin of its binding specificity for dimannose (Bewley, 2001; Bolia et al., 2014b; Botos and Wlodawer, 2003; Li et al., 2015; Vorontsov and Miyashita, 2009). Previous mutational studies on the binding pocket residues have shown their importance in modulating interaction with dimannose (Barrientos et al., 2006; Bolia et al., 2014b; Chang and Bewley, 2002; Matei et al., 2008). All known substitutions of the binding residues led to decreased binding affinity for dimannose on domain B (Bolia et al., 2014b; Fujimoto and Green, 2012; Kelley et al., 2002; Matei et al., 2011; Ramadugu et al., 2014). Evolutionary analyses shows that the majority of the binding site residues are conserved in CV-N glycan interactions, suggesting that affinity is already optimized at the binding site (Koharudin et al., 2008; Percudani et al., 2005). We hypothesized that amino acid substitutions at distal positions could enhance the dimannose affinity of CV-N by rigidification of the binding site and applied our ICDC approach to CV-N to identify positions in each category (Supplementary file 2). We generated models of CV-N variants in each ICDC category by mutating these positions to amino acid types observed in the MSA of CV-N family members, choosing the subset of sequences that have binding sites with identical or similar amino acid composition to P51G-m4 CV-N. As discussed above, this approach allows us to identify amino acid substitutions with the least impact on fold. 
All the substitutions identified (104 variants in total) were modeled using the crystal structure of P51G-m4 CV-N (Fromme et al., 2008) and subjected to MD simulations (Abraham et al., 2015; Van Der Spoel et al., 2005). The best conformation sampled for each variant obtained from equilibrated production trajectories was used as a model for dimannose docking analysis. We evaluated the variants using Adaptive BP-Dock (Bolia and Ozkan, 2016), a computational docking tool that incorporates both ligand and receptor flexibility to accurately sample binding-induced conformations, and ranks them using X-score energy units (XEUs) (Figure 4—figure supplement 1). In previous work on CV-N this method yielded good correlations with experimentally measured binding affinities (Kd), and established –6.0 XEU as a good threshold to differentiate variants that bind dimannose from 'non-binders' (Bolia et al., 2014b; Li et al., 2015; Woodrum et al., 2013). Here, we applied Adaptive BP-Dock initially to wild-type CV-N and its variants, P51G-m4 and mutDB (a mutant in which binding by domain B has been obliterated), and the results recapitulate the success of previous studies (Supplementary file 3). This result shows that Adaptive BP-Dock can correctly assess the dimannose binding of CV-N and its variants; we therefore applied it to new P51G-m4 CV-N variants to predict the impact of mutations on dimannose binding. Figure 2 shows the distribution of changes in predicted binding energy scores relative to the P51G-m4 energy scores for mutations belonging to each binary category: a positive change in binding score represents an unfavorable effect on binding, and, conversely, a negative change in the score indicates an enhancement in binding.
Figure 2. Predicted binding energies for each integrated co-evolution and dynamic coupling (ICDC) category.
Mutations in category (1,1) positions comprise the highest number of binding energy enhancing mutations as well as deleterious mutations. Mutations in category (0,0) positions are mostly near neutral (category (1,1) and (0,0) p value <0.3).
The substitutions on positions in category (1,1) (Figure 2) yield a wide range of change in binding energy scores: the tail of the distribution on the positive side reaches a binding score change of nearly 2.0 XEUs, and on the negative side it reaches values below –0.5 XEUs. Strikingly, the positions in category (1,1) yield the most binding-enhancing energy scores compared to all other categories, mirroring the TEM-1 results. Additionally, the substitutions applied in category (1,0) also result in more favorable binding energy scores for dimannose. Mutations in both category (1,1) and (1,0) present favorable binding energy scores. However, the fraction of mutations predicted to enhance binding is higher in category (1,1) than in category (1,0) (26% of category (1,1) compared to 14% of category (1,0)). Interestingly, the mutations in category (1,0) that disrupt binding do not do so as strongly as those in category (1,1), but are similar to those in categories (0,1) and (0,0). The mostly neutral behavior observed for category (0,0) agrees with the trend obtained in the TEM-1 analyses. Overall, the distribution of computational binding scores of dimannose binding to CV-N in each category aligns with the distribution of experimentally characterized TEM-1 fitness results of the same category. However, there are some discrepancies; for example, there are beneficial mutations in category (0,1) in TEM-1, but we don't observe the same trend in CV-N. This is due to the initial challenge faced in constructing the MSA of CV-N homologous proteins. There is limited sequence information, and most of the proteins in the CV-N family exhibit binding specificity to a different glycan (Fujimoto and Green, 2012; Koharudin et al., 2009).
In contrast, β-lactamase family proteins exhibit highest activity toward penicillin, and they have been subjected to strong natural selection leading to conservation in both fold and function (Salverda et al., 2010; Zou et al., 2021). Hence, the lower noise in the evolutionary analysis of the β-lactamase family allows us to correctly filter out deleterious substitutions based on the MSA. Regardless, in both cases, as hypothesized, substitutions on category (1,1) residues impact the function most. To further investigate the mechanism of functional modulation of category (1,1) mutations, we chose the position with the highest binding-enhancing docking scores, I34, from category (1,1). I34 exhibits %DFI values lower than 0.2 (Figure 3A), is at least 16 Å away from binding residues (distal), and is dynamically coupled (Figure 3B) and co-evolved with the binding pocket (Supplementary file 2 and Supplementary file 6). Moreover, docking scores of I34 variants suggest that the mutations can modulate binding over a wide range: the I34Y variant leads to an increase in binding affinity (beneficial), I34K decreases the binding affinity (deleterious), and I34L yields no change (neutral) (Table 1).
Figure 3. DFI and DCI analyses on CV-N. (A) Dynamic flexibility index (DFI) profile mapped onto cyanovirin-N (CV-N) structure: red corresponds to high DFI (very flexible sites), and blue to low DFI values (rigid sites). Position I34 (low DFI score) is highlighted. (B) Dynamic coupling index (DCI) profile projected on CV-N structure with green corresponding to sites exhibiting high coupling with binding site residues.
Table 1. Predicted binding affinities of domain B, experimental ITC data, and chemical denaturation experiments for P51G-m4 and its I34 variants.
ProteinPredictedbindingscore(X-score energy unit)ITC dimannoseKd (μM)ITC dimannoseΔH (kcal/mol)ITC dimannoseTΔS To the predictions of I34 variants, we first assessed the and thermal of these mutants by showed that all mutants are well and a fold similar to the characterized by with a single negative at We the of the mutants by thermal the thermal denaturation were analyzed to We found that the mutation I34L is as stable as P51G-m4, with of and In contrast, I34Y and I34K were less than P51G-m4 as shown by values of and surprisingly, a residue with a amino acid has a large while and is The trend of is I34 I34 (Figure 4—figure supplement 2). denaturation experiments were used to at each of by the for et al., The values and values of P51G-m4, and I34K are found as and and of and (Table 1). The results with the thermal denaturation P51G-m4 is the most stable to by and I34K (Figure 4—figure supplement 3). we evaluated the impact of the mutations on the dimannose binding affinity by (Figure 4—figure supplement data were analyzed to values in Table We found that I34Y binds dimannose with affinity µM) of all the mutants a twofold improvement over P51G-m4 by I34L is with a of binding was observed for I34K in these values from ITC experiments (Table suggesting that changes an important role in the observed changes in binding surprisingly, is positive for an increase in binding. To glean more information on the of binding by we the structure of the and form and compared it with the protein The fold is conserved (Figure as shown by of and Å with and and is well at position The binding pocket is also conserved compared to of the contacts between dimannose and P51G-m4 and I34Y (Figure shows an identical number of with the a conserved binding We compared the of I34Y acquired from Adaptive BP-Dock with the structure. The ligand shows an value of Å (Figure 4—figure supplement 5). 
suggest that the increase in binding affinity of I34Y toward dimannose might be by equilibrium dynamics, which are not by the crystal structure. This hypothesis is by the changes in measured experimentally in dimannose binding by P51G-m4 and I34Y Figure with all Download asset Open asset The of the crystal structures of P51G-m4 and (A) The crystal structures of I34Y in magenta and in and its protein P51G-m4 are (B) of structures of I34Y and P51G-m4 interactions with dimannose. Molecular mechanism governing the binding dynamics in I34 variants It is to observe that a distal site can modulate binding affinity to a wide range based on amino acid This finding has also been observed for allosterically enzymes such as for which different amino acid substitutions on sites lead to changes in function, a to modulate function through dynamics (Campitelli et al., 2021; Campitelli et al., et al., 2013; et al., 2017; et al., To on how the substitutions on I34 dynamically modulate the binding affinity, we MD simulations in both and Methods for of the The trajectories were analyzed for binding pocket and pocket to the dynamic the trajectories were to computational binding energies and 2009; et al., Previous computational work in our had binding affinity in the CV-N family to the of the binding A between the of and of a pocket, glycan whereas the of this leads to an pocket et al., 2015). the of this in the trajectories of and I34Y as for and conformations, we found that I34Y variant the binding pocket more often than P51G-m4 (Figure supplement 1). evidence I34 variants from P51G-m4 is the change in their binding pocket by pocket tool et al., 2017). 
The pocket for and P51G-m4 were into to distributions (Figure revealing that I34Y variant a more pocket compared to the pocket is small or dimannose its interaction with the and a conformation dimannose to the interactions with the This pocket sampled by I34Y also the different binding observed by in which a positive change binding compensates for the in compared to P51G-m4 (Table et al., 2013; Cornish-Bowden, analysis a value for I34K compared to P51G-m4, suggesting that this mutant the interactions with the dimannose in of binding. We applied the same pocket to the structures of P51G-m4 and I34Y variant, and we found of and for P51G-m4 a
- Research Article
103
- 10.1002/emmm.201000120
- Jan 26, 2011
- EMBO Molecular Medicine
Dysregulation of the antiviral immune response may contribute to autoimmune diseases. Here, we hypothesized that altered expression or function of MAVS, a key molecule downstream of the viral sensors RIG-I and MDA-5, may impair antiviral cell signalling and thereby influence the risk for systemic lupus erythematosus (SLE), the prototype autoimmune disease. We used molecular techniques to screen non-synonymous single nucleotide polymorphisms (SNPs) in the MAVS gene for functional significance in human cell lines and identified one critical loss-of-function variant (C79F, rs11905552). This SNP substantially reduced expression of type I interferon (IFN) and other proinflammatory mediators and was found almost exclusively in the African-American population. Importantly, in African-American SLE patients, the C79F allele was associated with low type I IFN production and absence of anti-RNA-binding protein autoantibodies. These serologic associations were not related to a distinct, functionally neutral, MAVS SNP Q198K. Hence, this is the first demonstration that an uncommon genetic variant in the MAVS gene has a functional impact upon the anti-viral IFN pathway in vivo in humans and is associated with a novel sub-phenotype in SLE. This study demonstrates the utility of functional data in selecting rare variants for genetic association studies, allowing for fewer comparisons requiring statistical correction and for alternate lines of evidence implicating the particular variant in disease.
- Research Article
51
- 10.1016/j.cels.2020.10.007
- Nov 18, 2020
- Cell systems
Inferring Protein Sequence-Function Relationships with Large-Scale Positive-Unlabeled Learning.
- Discussion
40
- 10.1038/jid.2014.269
- Dec 1, 2014
- Journal of Investigative Dermatology
Variant Analysis of CARD14 in a Chinese Han Population with Psoriasis Vulgaris and Generalized Pustular Psoriasis
- Research Article
- 10.1158/1557-3265.sabcs24-p1-05-10
- Jun 13, 2025
- Clinical Cancer Research
Current genetic screening for breast cancer predisposition is limited to the analysis of coding regions (exons) and intron/exon boundaries of BRCA1/2 genes. There is limited data on the prevalence and clinical significance of variants in the non-coding regions of these genes. Consequently, the majority of variants identified in these regions remain unclassified, and approximately 80% of germline BRCA1/2 tests are not considered in the daily management of patients with triple-negative breast cancer (TNBC). Emerging evidence suggests that non-coding variants can impact cancer risk and response to treatment. This study aimed to investigate the prevalence of variants in the non-coding regulatory regions of BRCA1/2 and other breast cancer predisposition genes in TNBC patients selected based on age at cancer diagnosis and/or family history of cancer. Additionally, we sought to explore the functional role of identified variants of uncertain significance (VUS) through ongoing analyses. We enrolled 144 TNBC patients who had previously tested negative for germline variants in the coding regions of BRCA1/2 and other cancer predisposition genes. Next-generation sequencing (NGS) analysis identified 635 rare variants in the non-coding regions of 28 selected genes involved in breast/ovarian cancer predisposition. In our TNBC cohort, we observed a higher prevalence of rare variants in the genes CDH1 (1.3%), STK11 (11.2%), ATM (10.7%), PTEN (7.40%), and PMS2 (5.04%). Germline variants in BRCA2 were statistically significantly associated with worse overall survival (p-value=0.017). CDH1 rare variants were associated with the highest percentage of non-pathologic complete response after neoadjuvant chemotherapy (p=0.0273). MLH1 and PALB2 rare variants were both associated with bilateral breast cancer (p=0.015 and p=0.0005, respectively). Rare variants of the ATM gene were associated with a positive family history (p=0.041). 
Preliminary single nucleotide variant (SNV) data analysis showed that the most significant functional scores for alterations were detected in the promoter of MSH6, potentially associated with chromatin effects. Further analyses are ongoing to elucidate the functional impact of these variants. Due to the small sample size, these analyses should be considered exploratory, and larger studies are needed to confirm these findings and establish the clinical utility of screening for non-coding variants in TNBC patients. Citation Format: Michela Palleschi, Alessandra Virga, Emanuela Scarpi, Eugenio Fonzi, Filippo Merloni, Samanta Sarti, Rita Danesi, Mila Ravegnani, Chiara Casadei, Marianna Sirico, Caterina Gianni, Roberta Maltoni, Sara Bravaccini, Daniele Calistri, Valentina Arcangeli, Valentina Zampiga, Ilaria Cangini, Erika Bandini, Francesca Mannozzi, Fabio Falcini, Ugo De Giorgi, Paola Ulivi, Gianluca Tedaldi. Investigating the Potential Role of Rare Germline Non-Coding Variants in Cancer Predisposition Genes in Patients with Triple-Negative Breast [abstract]. In: Proceedings of the San Antonio Breast Cancer Symposium 2024; 2024 Dec 10-13; San Antonio, TX. Philadelphia (PA): AACR; Clin Cancer Res 2025;31(12 Suppl):Abstract nr P1-05-10.
- Peer Review Report
- 10.7554/elife.82593.sa2
- May 9, 2023
RaSP is a method for making rapid and accurate predictions of changes in protein stability that enabled us to calculate ~300 million stability changes for nearly all possible single amino acid changes in the human proteome.
- Research Article
452
- 10.1016/j.ajhg.2013.04.015
- May 16, 2013
- The American Journal of Human Genetics
Sequence Kernel Association Tests for the Combined Effect of Rare and Common Variants
- Research Article
- 10.33865/wjb.005.02.0305
- May 3, 2020
- World Journal of Biology and Biotechnology
Single nucleotide polymorphisms in GBSSI and SSIIa genes in relation to starch physicochemical properties in selected rice (Oryza sativa L.) varieties
- Research Article
78
- 10.1016/j.jacc.2014.01.031
- Feb 19, 2014
- Journal of the American College of Cardiology
Exome Sequencing Implicates an Increased Burden of Rare Potassium Channel Variants in the Risk of Drug-Induced Long QT Interval Syndrome
- Research Article
19
- 10.1160/th14-08-0679
- Jan 8, 2015
- Thrombosis and Haemostasis
Summary: Platelet responses to activating agonists are influenced by common population variants within or near G protein-coupled receptor (GPCR) genes that affect receptor activity. However, the impact of rare GPCR gene variants is unknown. We describe the rare single nucleotide variants (SNVs) in the coding and splice regions of 18 GPCR genes in 7,595 exomes from the 1,000-genomes and Exome Sequencing Project databases and in 31 cases with inherited platelet function disorders (IPFDs). In the population databases, the GPCR gene target regions contained 740 SNVs (318 synonymous, 410 missense, 7 stop gain and 6 splice region) of which 70% had global minor allele frequency (MAF) < 0.05%. Functional annotation using six computational algorithms, experimental evidence and structural data identified 156/740 (21%) SNVs as potentially damaging to GPCR function, most commonly in regions encoding the transmembrane and C-terminal intracellular receptor domains. In 31 index cases with IPFDs (Gi-pathway defect n=15; secretion defect n=11; thromboxane pathway defect n=3 and complex defect n=2) there were 256 SNVs in the target regions of 15 stimulatory platelet GPCRs (34 unique; 12 with MAF < 1% and 22 with MAF ≥ 1%). These included rare variants predicting R122H, P258T and V207A substitutions in the P2Y12 receptor that were annotated as potentially damaging, but only partially explained the platelet function defects in each case. Our data highlight that potentially damaging variants in platelet GPCR genes have low individual frequencies, but are collectively abundant in the population. Potentially damaging variants are also present in pedigrees with IPFDs and may contribute to complex laboratory phenotypes.
- Research Article
21
- 10.1371/journal.pone.0251289
- May 11, 2021
- PLOS ONE
Chiari Malformation Type 1 (CM-1) is characterized by herniation of the cerebellar tonsils below the foramen magnum and the presence of headaches and other neurologic symptoms. Cranial bone constriction is suspected to be the most common biologic mechanism leading to CM-1. However, other mechanisms may also contribute, particularly in the presence of connective tissue disorders (CTDs), such as Ehlers-Danlos Syndrome (EDS). Accumulating data suggest CM-1 with connective tissue disorders (CTD+) may have a different patho-mechanism and different genetic risk factors than CM-1 without CTDs (CTD-). To identify CM-1 genetic risk variants, we performed whole exome sequencing on a single large, multiplex family from Spain and targeted sequencing on a cohort of 186 unrelated adult, Caucasian females with CM-1. Targeted sequencing captured the coding regions of 21 CM-1 and EDS candidate genes, including two genes identified in the Spanish family. Using gene burden analysis, we compared the frequency of rare, functional variants detected in CM-1 cases versus publicly available ethnically-matched controls from gnomAD. A secondary analysis compared the presence of rare variants in these genes between CTD+ and CTD- CM-1 cases. In the Spanish family, rare variants co-segregated with CM-1 in COL6A5, ADGRB3 and DST. A variant in COL7A1 was present in affected and unaffected family members. In the targeted sequencing analysis, rare variants in six genes (COL7A1, COL5A2, COL6A5, COL1A2, VEGFB, FLT1) were significantly more frequent in CM-1 cases compared to public controls. In total, 47% of CM-1 cases presented with rare variants in at least one of the four significant collagen genes and 10% of cases harbored variants in multiple significant collagen genes. Moreover, 26% of CM-1 cases presented with rare variants in the COL6A5 gene. We also identified two genes (COL7A1, COL3A1) for which the burden of rare variants differed significantly between CTD+ and CTD- CM-1 cases.
A higher percentage of CTD+ patients had variants in COL7A1 compared to CTD- patients, while CTD+ patients had fewer rare variants in COL3A1 than did CTD- patients. In summary, rare variants in several collagen genes are particularly frequent in CM-1 cases and those in COL6A5 co-segregated with CM-1 in a Spanish multiplex family. COL6A5 has been previously associated with musculoskeletal phenotypes, but this is the first association with CM-1. Our findings underscore the contribution of rare genetic variants in collagen genes to CM-1, and suggest that CM-1 in the presence and absence of CTD symptoms is driven by different genes.