Abstract

Circulation Research, Vol. 114, No. 2. Letter.

Causality in Genetics: The Gradient of Genetic Effects and Back to Koch’s Postulates of Causality

Ali J. Marian
Center for Cardiovascular Genetics, Institute of Molecular Medicine and Department of Medicine, University of Texas Health Sciences Center at Houston; Texas Heart Institute, Houston, TX

Originally published 17 Jan 2014. https://doi.org/10.1161/CIRCRESAHA.114.302904
Circulation Research. 2014;114:e18–e21. Corrected by: Correction. See Research Commentary, p e2.

Genetics provides a robust scientific platform for establishing a relationship between a cause, for example, a genetic variant, and an effect, such as a disease phenotype. Yet causality in genetics is probabilistic and only rarely deterministic. The causal relationship between a genetic variant and a phenotype is contingent on the conditions and the environment in which the causal variant and the phenotype operate, such as the genetic background. The degree of probabilistic causality is determined in part by the effect sizes of the genetic variants, which typically follow a gradient ranging from minimal to large.1,2 Genetic variants with large effect sizes are highly penetrant. These variants include single nucleotide variants, small insertions/deletions, and structural variations. They are exemplified by the variants responsible for single-gene disorders with Mendelian patterns of inheritance. In such circumstances, the cause-and-effect relationship is best analyzed through cosegregation and linkage analyses. A logarithm of odds (LOD) score of ≥3 is considered strong evidence of genetic linkage, but it is not necessarily a definitive indicator of causality.
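The following minimal Python sketch makes the LOD threshold concrete by computing a two-point LOD score. It rests on simplifying assumptions that are not part of the letter:

- the trait is fully penetrant;
- meioses are phase-known and fully informative;
- there is a single recombination fraction θ.

The function names are hypothetical and chosen for illustration only. Dedicated linkage software also models incomplete penetrance, unknown phase, and missing genotypes.

```python
import math


def lod_score(n_meioses: int, n_recombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    Compares the likelihood of the observed recombinants at recombination
    fraction theta with the likelihood under no linkage (theta = 0.5).
    """
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("n_recombinants must be between 0 and n_meioses")
    n_nonrecombinants = n_meioses - n_recombinants
    if theta == 0.0:
        # With complete linkage, a single recombinant makes the data impossible.
        if n_recombinants > 0:
            return float("-inf")
        log_linked = 0.0  # log10(1.0 ** n_nonrecombinants)
    else:
        log_linked = (n_recombinants * math.log10(theta)
                      + n_nonrecombinants * math.log10(1.0 - theta))
    log_unlinked = n_meioses * math.log10(0.5)
    return log_linked - log_unlinked


def min_meioses_for_lod(threshold: float = 3.0) -> int:
    """Smallest number of nonrecombinant meioses (theta = 0) reaching the threshold."""
    n = 0
    while lod_score(n, 0, 0.0) < threshold:
        n += 1
    return n


if __name__ == "__main__":
    print(f"LOD for 10 meioses, 0 recombinants: {lod_score(10, 0, 0.0):.2f}")
    print(f"Meioses needed for LOD >= 3: {min_meioses_for_lod(3.0)}")
```

Under these assumptions, each nonrecombinant informative meiosis at θ = 0 adds log10 2 ≈ 0.30 to the score. At least 10 such meioses are therefore needed to reach a LOD score of 3. This is one reason linkage evidence generally requires large families.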
At the opposite end of the spectrum are variants that exert modest effect sizes and are, hence, less penetrant. In such cases, establishing a cause-and-effect relationship is more challenging. Such low-penetrance variants typically do not show clear cosegregation with the phenotype and are often found in the general population as well. Establishing the causal role of a variant is harder still in small families, and even more so in a single individual. Ascertainment of causality is further compounded by the genetic background (modifier variants) and by environmental factors. These are expected to exert greater influence when the effect sizes of the causal variants are small. Hence, a genetic variant with a small or moderate effect size might be penetrant in certain genetic backgrounds but not in others. Regardless of the effect sizes of the causal variants, modifier variants and environmental factors contribute both to the penetrance of the causal variants and to the phenotypic variability of the disease.

Against the background of these concepts, the author suggests categorizing the genetic variants in the human genome, with regard to their pathogenic role in human diseases, into the following 5 groups2:

Disease-Causing Variants

This category encompasses the genetic variants with the most robust evidence of causality, typically obtained through genetic linkage analysis in large families. These variants generally exhibit high penetrance and exert large effect sizes. They are typically responsible for single-gene disorders with Mendelian patterns of inheritance. Hence, when present in an individual’s genome, they commonly lead to the linked disease. The severity of the phenotype is nevertheless also influenced by modifier variants, genomic factors, and other determinants. Functional and mechanistic data lend further support to the causal role of these variants in the pathogenesis of the linked phenotype.
Disease-causing variants are rare in the population and even rarer in an individual’s genome. Well-established examples are the nonsynonymous or frameshift mutations in the MYH7 and MYBPC3 genes.3 These genes encode the sarcomere proteins β-myosin heavy chain and myosin-binding protein C3, respectively, and their mutations are established causes of hereditary cardiomyopathies.3 Nevertheless, not every protein-altering variant in a gene known to cause a single-gene disorder should be considered disease-causing. For example, MYH7 and MYBPC3 are among the best-characterized genes for human hereditary cardiomyopathies.3,4 Yet they contain a large number of nonsynonymous variants that have not been linked to a clinical phenotype (http://evs.gs.washington.edu/EVS/). Not surprisingly, the population frequency of protein-altering, likely pathogenic variants in the sarcomere protein genes is higher than the prevalence of hereditary cardiomyopathies in the general population.5,6

It is also important to recognize the shortcomings of in silico algorithms for identifying pathogenic variants. Their predictions are platform-dependent and show only modest agreement across platforms.7 For example, only 1% of nonsynonymous variants are consistently predicted to be functional when analyzed by multiple commonly used algorithms.7 Thus, predicting a pathogenic role for a variant from a single in silico platform carries a high false-positive rate.7 Categorizing a genetic variant as disease-causing must therefore rest on strong human molecular genetic data, such as linkage evidence, typically complemented by biological and mechanistic studies.

Likely Disease-Causing Variants

This category comprises variants that show evidence of association with the phenotype of interest. These variants are also supported by strong mechanistic data implicating them in the pathogenesis of that phenotype.
However, unlike the variants in the first category, these variants lack robust human molecular genetic data, such as linkage evidence in large families. Likely disease-causing variants have the second largest effect sizes, after disease-causing variants. They often show incomplete penetrance; that is, they do not cosegregate perfectly with the phenotype in families. They are rare in an individual’s genome and might be found in the general population, although at a lower frequency than in those with the phenotype. Accordingly, likely disease-causing variants are enriched in individuals with the disease of interest.8,9

Human molecular genetic data alone are not enough; a causal role for these variants must also be supported by mechanistic data. One example is induction of the intended phenotype in a model organism when the variant is introduced. Another is reversal of that phenotype when the variant is removed or expression of its protein is shut down. Both human molecular genetic data and mechanistic data are therefore necessary to consider a variant likely disease-causing. Even with such evidence, the causal role of this category of variants is less certain than that of disease-causing variants.

The 3 TRIM63 variants (p.A48V, p.I130M, and p.Q247*) were recently identified in small families and index cases with hypertrophic cardiomyopathy (HCM). They are considered likely disease-causing variants.9 The p.Q247* variant (rs148395034) is a premature stop-codon mutation and was identified in 2 small families with HCM.9 Because these families were small, evidence of genetic linkage could not be established. The p.Q247* variant is a loss-of-function variant. Its population frequency is 0.001 in whites and <0.0001 in blacks (http://evs.gs.washington.edu/EVS/ and http://browser.1000genomes.org).
Consistent with the presence of this variant in the general population, albeit at a low frequency, Ploski et al10 identified p.Q247* in a 22-year-old Polish professional soccer player. He underwent whole-exome sequencing because of a prolonged QTc interval of 470 ms and an episode of 8-beat nonsustained ventricular tachycardia at a heart rate of 150 bpm. Neither the proband nor his 47-year-old mother had evidence of HCM.10 Given the known population frequency of the variant, this finding was not unanticipated. It nevertheless raises the question of whether the variant causes HCM.

One potential explanation for the absence of HCM in carriers of p.Q247* is incomplete and age-dependent penetrance, as Ploski et al also noted. The variant was originally identified in older individuals with HCM.9 In addition, phenotypic expression of p.Q247* might be influenced by several other factors, including the individual’s genetic background.

Despite the detection of this variant in the general population, several lines of evidence support its pathogenic role in HCM, as described by Chen et al.9 Rare TRIM63 variants were enriched in the HCM population. The p.Q247* variant exhibited a total loss of E3 ubiquitin ligase activity. When introduced into mice using an inducible system, it caused cardiac hypertrophy with preserved systolic function, a phenotype resembling HCM in humans.9 Moreover, shutting down expression of the mutant protein reversed the phenotype. The TRIM63 variants p.A48V and p.I130M (rs140523053 and rs377334933, respectively) are also implicated in HCM. They are likewise rare, each with a population frequency of <0.001. They are also functional and are considered pathogenic on the basis of cell and animal models.9

Finally, the pretest likelihood of the disease must be taken into account when interpreting genetic findings clinically.
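The influence of pretest likelihood can be illustrated with Bayes’ theorem in odds form:

posterior odds = prior odds × likelihood ratio

Here the likelihood ratio compares the probability of observing the variant in affected individuals with the probability of observing it in unaffected individuals. The sketch below is illustrative only. The prior probabilities and the likelihood ratio of 20 are hypothetical values, not estimates for TRIM63 or any other gene.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Probability of disease after a genetic finding, via Bayes' theorem in odds form.

    prior: pretest probability of the disease (strictly between 0 and 1).
    likelihood_ratio: P(variant | disease) / P(variant | no disease); hypothetical here.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    lr = 20.0  # hypothetical likelihood ratio for a rare variant
    # Hypothetical pretest probabilities: a referred patient vs an incidental finding.
    for label, prior in [("high pretest likelihood", 0.5), ("incidental finding", 0.002)]:
        post = posterior_probability(prior, lr)
        print(f"{label}: prior={prior:.3f} -> posterior={post:.3f}")
```

With the same hypothetical likelihood ratio, the posterior probability is about 0.95 when the pretest probability is 0.5. It falls to about 0.04 when the pretest probability is 0.002. This mirrors the point that the same variant carries less clinical weight as an incidental finding in the general population.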
A rare variant found in a disease population, where the pretest likelihood is high, has greater clinical implications than the same variant found incidentally in the general population, where the pretest likelihood of the disease is low. Rare TRIM63 variants are enriched in the HCM population, and there is genetic and mechanistic evidence for them. Nevertheless, TRIM63 variants are categorized as likely disease-causing rather than disease-causing variants.9

Disease-Associated Variants

These are variants that are associated with the disease but lack genetic or mechanistic data implicating them in its pathogenesis. They might simply be in linkage disequilibrium with the actual causal variants. These variants are typically common and exert small effect sizes. Most variants associated with phenotypes through genome-wide association studies (GWAS) of complex traits fall into this category (www.genome.gov/gwastudies). The causal role of disease-associated variants, whether identified through GWAS or otherwise, is typically unsettled. Some might be functional or reside within genes involved in the pathogenesis of the phenotype. Many, however, are expected to be simply DNA markers in linkage disequilibrium with the actual pathogenic variants.

Assessing the causality of these variants requires weighing several factors:
- the strength of the observed association with the phenotype;
- replication of the association in independent study populations;
- biological plausibility;
- gene dose effects.

Ultimately, experimentation is necessary to determine whether variants in this category play a causal role in the pathogenesis of the associated phenotype.

Functional Variants With Unknown Clinical Consequences
These are variants that exert biological effects, such as changing the mRNA and protein levels of their respective genes, but have not been associated with clinical phenotypes. Each human genome contains ≈13 500 nonsynonymous variants and a large number of regulatory variants, of which several thousand are predicted to be functional.11,12 The ENCODE (Encyclopedia of DNA Elements) project points to a large number of regulatory elements in introns and intergenic regions. Variants residing in these regions are therefore putatively functional.12 A considerable number of variants in regulatory regions affect the mRNA levels of their corresponding genes through cis or trans mechanisms.13 Each human genome is also estimated to contain ≈100 loss-of-function variants. About 20 of these are homozygous and completely inactivate the genes in which they reside.14 However, most putative or known functional variants are not associated with a specific clinical phenotype and remain clinical orphans. These variants are expected to be common, both in an individual’s genome and in the population, but their effect sizes may be small.

Variants With Unknown Biological or Clinical Significance

This category consists of yet-to-be-characterized variants, which number ≈4 million in each human genome. Their biological functions, and whether they are associated with any phenotype, remain unknown.
Evidently, characterization of these variants could lead to their reclassification.

Human genetic diversity further complicates the identification of causal variants.15 This diversity is due in part to the rapid expansion of the human population over the ≈400 generations since the agricultural revolution. The DNA replication and editing machinery has an error rate of ≈1×10⁻⁸ per nucleotide, so each meiosis (genome duplication) introduces ≈30 de novo variants.11,16,17 Accordingly, explosive population growth over the past 10 000 years has added many new variants to the population gene pool. These variants are by definition rare, and they have not yet been adequately subjected to purifying selection. It is therefore not surprising that most genetic variants in the human population are rare and typically population-specific.7,18 Notably, rare variants make up ≈95% of the putatively functional variants.7 Consequently, the private nature of each genome makes it considerably harder to identify the causal variant for a specific phenotype in a given individual.

Likewise, a private or rare variant might cosegregate with the phenotype in a family by chance alone. The larger the family, the smaller this chance, because each informative meiosis halves the probability of random cosegregation.
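The two arithmetic statements in this paragraph can be checked with a few lines of Python. The sketch makes two assumptions not stated in the letter:

- the haploid genome size is ≈3×10⁹ nucleotides;
- random cosegregation behaves like a fair coin toss at each informative meiosis.

Under the second assumption, an unlinked variant tracks the phenotype in all n informative meioses with probability (1/2)ⁿ. The function and constant names are hypothetical.

```python
# Assumed approximate haploid human genome size (not stated in the letter).
HAPLOID_GENOME_BP = 3.0e9


def expected_de_novo_variants(error_rate_per_nt: float,
                              genome_bp: float = HAPLOID_GENOME_BP) -> float:
    """Expected new variants per genome copy = per-nucleotide error rate x genome length."""
    return error_rate_per_nt * genome_bp


def random_cosegregation_probability(informative_meioses: int) -> float:
    """Chance that an unlinked variant cosegregates with the phenotype in every
    informative meiosis, assuming each transmission is a 50:50 event."""
    if informative_meioses < 0:
        raise ValueError("informative_meioses must be non-negative")
    return 0.5 ** informative_meioses


if __name__ == "__main__":
    print(f"de novo variants per meiosis: ~{expected_de_novo_variants(1e-8):.0f}")
    for n in (2, 4, 6, 10):
        p = random_cosegregation_probability(n)
        print(f"{n} informative meioses: P(random cosegregation) = {p:.4f}")
```

In a small family with only 2 informative meioses, a variant with no causal role would cosegregate with the phenotype one time in four by chance alone. With 10 informative meioses, the chance falls to about 1 in 1000.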
Thus, it is not surprising that pathogenic variants in genes previously associated with hereditary cardiomyopathies or cardiac arrhythmias are occasionally identified in the exomes of apparently healthy individuals.10,19,20 The same holds for other Mendelian diseases, such as maturity-onset diabetes of the young. There, pathogenic variants, defined as rare, conserved, and protein-damaging, are also found in healthy individuals in the general population.21 These discoveries show that causal variants cannot be identified accurately on the basis of any one of the following alone:
- population frequency;
- segregation with the phenotype in small families;
- evolutionary conservation;
- effect on protein structure;
- biological function.

Recent findings also highlight the potential contributions of compound causal genetic variants to the pathogenesis of the phenotype, even in single-gene disorders.22–26 Collectively, these discoveries occasionally challenge the conventional cause-and-effect relationship. They also point to the complexity of the genetic and nongenetic etiologic determinants of the clinical phenotype.2,21

Given the plethora of rare variants in each genome, how do we determine the causal role of such variants in human diseases? The first, and most important, step in establishing genetic causality is to provide robust genetic evidence. This is typically done through linkage analysis in large families. Alternatively, it can be done by showing a statistically significant enrichment of rare variants in the candidate gene of interest among individuals with the phenotype (a gene-centric approach). In the absence of genetic evidence, in vitro functional studies are supportive but inadequate on their own. They cannot distinguish merely functional variants from disease-causing or likely disease-causing variants.
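In its simplest form, a gene-centric enrichment analysis compares the number of carriers of rare variants in the candidate gene among cases versus controls. The comparison can be tested with a one-sided Fisher exact test, which computes a hypergeometric tail probability. This is a minimal illustration, not the method used in the studies cited here. The function name and the carrier counts are hypothetical. Published rare-variant analyses typically add variant filtering, ancestry matching, and correction for multiple testing.

```python
from math import comb


def enrichment_p_value(case_carriers: int, n_cases: int,
                       control_carriers: int, n_controls: int) -> float:
    """One-sided Fisher exact test for an excess of rare-variant carriers in cases.

    Under the null hypothesis of no enrichment, the number of carriers among cases
    follows a hypergeometric distribution given the total number of carriers.
    Returns P(X >= case_carriers).
    """
    if not (0 <= case_carriers <= n_cases and 0 <= control_carriers <= n_controls):
        raise ValueError("carrier counts must lie between 0 and the group size")
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    tail = 0
    # Sum exact integer hypergeometric numerators over the upper tail.
    for k in range(case_carriers, min(carriers, n_cases) + 1):
        tail += comb(carriers, k) * comb(total - carriers, n_cases - k)
    return tail / comb(total, n_cases)


if __name__ == "__main__":
    # Hypothetical counts: 12 carriers among 500 cases vs 2 among 1000 controls.
    p = enrichment_p_value(12, 500, 2, 1000)
    print(f"one-sided P = {p:.2e}")
```

With these hypothetical counts, the tail probability falls far below conventional significance thresholds. By contrast, carrier counts proportional to group size (for example, 3 of 500 cases vs 6 of 1000 controls) give no evidence of enrichment.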
Model organisms can help support the causal role of a candidate variant. This is the case when introducing the variant into the model organism produces a phenotype that resembles the human phenotype. However, the approach has shortcomings and, on its own, is inconclusive evidence of causality.27 Finally, resolution or reversal of the phenotype in an animal model, when expression of the candidate variant protein is turned off, offers additional evidence of causality. One might argue that these conditions are the genetic analogue of Koch’s postulates of causality.28 Robert Koch formulated these postulates more than a century ago to establish the causes of infectious diseases. Applied to establishing causality in genetics, the analogous components of Koch’s postulates might be as follows:

1. Causal variants must be found in, and enriched in, the families or subjects with the phenotype.
2. The candidate causal variants must be functional and pathogenic (novel or rare, conserved, and protein-altering).
3. Introduction of the variants into an experimental model should cause a phenotype that resembles the phenotype in humans.
4. Removal (deletion or silencing) of the candidate causal variants should reverse the phenotype.

The TRIM63 variants described by Chen et al9 met these criteria. They are therefore considered likely disease-causing variants for HCM, as presented in the original article.

Ali J. Marian
Center for Cardiovascular Genetics, Institute of Molecular Medicine and Department of Medicine, University of Texas Health Sciences Center at Houston; Texas Heart Institute, Houston, TX

Sources of Funding

Supported in part by grants from the National Heart, Lung, and Blood Institute (R01-HL088498 and R34HL-105563), the National Institute on Aging (R21 AG038597-01), the TexGen Fund from the Greater Houston Community Foundation, and the George and Mary Josephine Hamman Foundation.

Disclosures

None.
