Abstract

Basic Concepts and Potential Applications of Genetics and Genomics for Cardiovascular and Stroke Clinicians
A Scientific Statement From the American Heart Association

Kiran Musunuru, MD, PhD, MPH, FAHA, Chair, Kathleen T. Hickey, EdD, ANP, FAHA, Co-Chair, Sana M. Al-Khatib, MD, MHS, Christian Delles, MD, FAHA, Myriam Fornage, PhD, FAHA, Caroline S. Fox, MD, MPH, FAHA, Lorraine Frazier, PhD, RN, FAHA, Bruce D. Gelb, MD, David M. Herrington, MD, MHS, FAHA, David E. Lanfear, MD, MS, FAHA and Jonathan Rosand, MD, MSc, FAHA, on behalf of the American Heart Association Council on Functional Genomics and Translational Biology, Council on Clinical Cardiology, Council on Cardiovascular Disease in the Young, Council on Cardiovascular and Stroke Nursing, Council on Epidemiology and Prevention, Council on Hypertension, Council on Lifestyle and Cardiometabolic Health, Council on Quality of Care and Outcomes Research, and Stroke Council

Circulation: Cardiovascular Genetics. Vol. 8, No. 1. 2015;8:216–242. Originally published 5 Jan 2015. https://doi.org/10.1161/HCG.0000000000000020

Introduction

Although genetics and genomics play an increasingly large role in the practice of medicine, the clinical care of patients suffering from cardiovascular disease or stroke has not been significantly affected. This is despite the tremendous strides being made to understand the genetic basis of both rare and common cardiovascular and stroke disorders through techniques such as genome-wide association studies (GWASs) and next-generation sequencing studies. Much of this knowledge remains to be translated to the clinic and must be subjected to clinical trials to ensure patient safety and a meaningful impact on clinical outcomes. However, even if this knowledge were to be successfully implemented into clinical practice, a potential barrier to widespread adoption is a lack of familiarity with basic concepts of genetics and genomics. Another concern is the possibility of the emergence of a significant gap in clinical care provided by practitioners who are informed about the clinical use of genetics and genomics knowledge and those who are not.
Thus, there is a critical need to foster genetics/genomics literacy among all involved in the care of cardiovascular and stroke patients because it can be expected that these topics will transform the way medicine is practiced.

The purpose of this document is to serve as a resource for practitioners in cardiovascular and stroke medicine on the application of genetics and genomics to patient care. Although not exhaustive, it contains an overview of the field written specifically to be accessible and relevant to practitioners. It also refers to additional educational materials available in the literature, in textbooks, and on the Internet. (Because this article is intended to be primarily educational in nature, rather than providing a review of the literature, citations are limited to a small number of research articles and reviews of exceptional interest.) It recommends a core knowledge base with which practitioners and especially trainees in cardiovascular and stroke clinical care should be familiar. Finally, it is intended to be a companion to the American Heart Association’s Council on Functional Genomics and Translational Biology Online Educational Series, in which online modules covering the topics outlined in this document are discussed in greater depth and are accessible to members of the cardiovascular and stroke clinical communities.

Primer on Genetics and Genomics

Basics of Molecular Biology

Deoxyribonucleic acid (DNA) is a molecule with 2 strands that are wrapped around each other in a helical formation, hence its description as a double helix (Figure 1). The outer portion of the helix contains the sugar and phosphate backbone; the inner portion contains the coding bases: adenine (A), cytosine (C), guanine (G), and thymine (T). The genetic information of an organism is determined by the order of the sequence of the bases; with 4 bases available at each position, the number of potential sequences grows exponentially with length (4^n for a sequence of n bases) and is effectively unlimited.
The versatility of DNA results from the obligatory pairing of bases in the 2 strands, forming base pairs. An adenine in 1 strand is always matched up with a thymine in the other strand, and cytosine is always paired with guanine. Thus, the 2 strands contain redundant information, and each can serve as a template on which a new complementary strand can be synthesized. This allows easy duplication of the DNA so that, when a cell divides into 2 cells, each descendant cell receives the same genetic information as the original cell.

Figure 1. The structure of DNA. Each DNA strand has a sugar-phosphate backbone (not shown in detail) with a sequence of bases that come in 4 versions: adenine (A), cytosine (C), guanine (G), and thymine (T). Two DNA strands can combine to form a double helix, the stable form of DNA found in chromosomes. Holding the strands together are base pairs: Guanine on 1 strand binds to cytosine on the other strand, and adenine on 1 strand binds to thymine on the other strand. Thus, the 2 strands are complementary and contain redundant information.

The DNA of an organism is organized into extremely long strands that are packaged by a large complex of supporting proteins into chromosomes. Humans have 23 pairs of chromosomes, including the pair that determines sex, which in women comprises 2 X chromosomes and in men 1 X and 1 Y chromosome (Figure 2). For each chromosome pair, 1 chromosome was inherited from the mother and 1 from the father. The full set of chromosomes is collectively called the genome. The human genome is largely contained within the nucleus of each cell, where it is separated from the rest of the cell functions. However, a small amount of DNA exists outside the nucleus in the mitochondria and is considered to be part of the human genome.

Figure 2. The human genome.
As seen in this karyotypic spread, the typical human cell has 46 chromosomes with 22 pairs of autosomes (numbered 1–22) and a pair of sex chromosomes, either XX or XY.

In general, the genome is characterized by vast regions of noncoding DNA sequence punctuated by small areas of coding DNA, also called genes, that contain the instructions needed by cells to perform their functions. Coding DNA is transcribed into a single-stranded molecule called ribonucleic acid (RNA) by a collection of specialized enzymes (Figure 3). RNA is structurally similar to a DNA strand and contains 4 types of bases: adenine, cytosine, guanine, and uracil (U), which takes the place of the thymine (T) found in DNA. The transcription enzymes have “proofreading” functions that ensure that the sequence of the RNA molecule faithfully matches the sequence of the DNA template from which it was synthesized. RNA is more flexible and mobile than DNA and is transported out of the nucleus of the cell into the outer compartment, the cytoplasm. Thus, RNA is the mechanism by which genetic information is expressed and relayed from the central repository (DNA) to the rest of the cell, where it directs cellular functions.

Figure 3. The structure of a gene and its transcription into an mRNA. Genes have exons and introns. The exons are coding sequences (with the exception of 2 noncoding elements at the beginning and end of the gene called 5′ untranslated region [UTR] and 3′ UTR, the significance of which is beyond the scope of this document). The introns are noncoding sequences. Transcription enzymes, starting at the noncoding promoter, produce an RNA transcript from the DNA gene.
This RNA transcript is processed by splicing factors to remove the introns (and a “poly(A) tail” is added, the significance of which is beyond the scope of this document).

Although some RNAs have specialized functions, for example, serving as structural components of certain parts of the cell, most RNAs take the form of messenger RNAs (mRNAs), which are translated by ribosomes into proteins (Figures 4 and 5). The ribosome reads from the beginning of the mRNA and uses it as a coding template to build proteins, with each nonoverlapping set of 3 consecutive bases (codons) serving to specify a particular amino acid. With 4 available bases, there are 64 possible codon combinations; with some redundancy, these codons are translated into any of 20 different amino acids or into a stop signal. The RNA sequence is converted into an amino acid sequence until a stop signal is reached that prompts the ribosome to finish and release the protein. The protein is then processed by the cell and deployed to its purpose (as an enzyme, secreted hormone, etc).

Figure 4. Translation of an mRNA into a protein. After splicing, the coding sequence lies in 1 contiguous block, and a ribosome scans through the coding sequence to produce the protein. UTR indicates untranslated region.

Figure 5. From RNA codons to amino acids. Groups of 3 RNA bases, or codons, are translated by the ribosome into amino acids. AUG (which encodes the amino acid methionine) is the standard start codon. As it scans the mRNA, the ribosome adds amino acids to an increasingly long chain. When it reaches a stop codon, the completed protein is released.

This organized progression from DNA to transcribed RNA to translated protein is known as the central dogma of molecular biology (Figure 6), and although there are exceptions to this sequence of events, the central dogma explains the vast majority of cellular processes.
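For readers who find it helpful to see the flow from DNA to RNA to protein written out as a procedure, the following is a minimal illustrative sketch in Python. It is not part of the original statement. The function names, the toy sequence, and the deliberately partial codon table are assumptions made for illustration; transcription is simplified to copying the coding strand with U in place of T, and splicing, untranslated regions, and protein processing are ignored.

```python
# Illustrative sketch only: a PARTIAL codon table (the real genetic code
# has 64 codons) and a simplified model of transcription and translation.
CODON_TABLE = {
    "AUG": "Met",  # standard start codon (methionine)
    "AAA": "Lys", "AAG": "Lys",
    "AAC": "Asn", "AAU": "Asn",
    "GGA": "Gly", "GGC": "Gly", "GGG": "Gly", "GGU": "Gly",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read nonoverlapping codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        # Codons missing from the partial table are reported as "???".
        residue = CODON_TABLE.get(mrna[i:i + 3], "???")
        if residue == "Stop":
            break  # the ribosome releases the completed protein
        protein.append(residue)
    return protein


mrna = transcribe("ATGAAGGGTTAG")  # toy gene, hypothetical sequence
print(mrna)                        # AUGAAGGGUUAG
print(translate(mrna))             # ['Met', 'Lys', 'Gly']
```

The dictionary lookup mirrors the redundancy described above: several codons (for example, AAA and AAG) map to the same amino acid, and 3 codons act as stop signals.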
In humans, these processes combine with environmental influences to determine each person’s individual characteristics, susceptibility to diseases, and responses to medications. New technology is now available to study the cellular processes at any step of the central dogma. When an investigation occurs at the level of DNA, it is called genetics if it deals primarily with 1 gene. The term genomics is used if it deals with the interactions among multiple genes or all of the genes in the genome. When at the level of mRNAs and proteins, the terms transcriptomics and proteomics, respectively, are used. Processed proteins or other products of enzymatic reactions are called metabolites, the study of which is called metabolomics. Together, the effects of DNA, RNA, proteins, and metabolites, when combined with environmental factors, result in phenotypes. Phenotypes can refer to phenomena occurring within a single cell or in an entire organism. Phenotypes include individual characteristics (eg, hair color), clinical traits (eg, blood cholesterol levels), or diseases (eg, myocardial infarction). In this document, we focus primarily on disease phenotypes.

Figure 6. The central dogma. With a few exceptions, genetic information flows in 1 direction.

Basic Characteristics of the Genome

The human genome is roughly 6 billion DNA base pairs in size, spanning the 23 chromosome pairs, and represents virtually the entire list of coded instructions needed to create a human being. There are an estimated 20 000 genes in the human genome, most of which encode proteins or components of proteins. What makes each person unique is a large number of DNA variants distributed throughout the genome. Some people have particular DNA variants that can predispose them to cardiovascular disease or stroke. These variants often require the presence of environmental factors (eg, smoking and obesity) to trigger disease.
Less commonly, certain variants have such a strong effect that they can cause disease outright. Other variants may determine how well or poorly patients respond to particular medications.

One reason that some people are more susceptible to getting a disease than other people or respond differently to medications is that their DNA variants affect the function of genes. There are rare variants that have a large effect on the function of a gene by either significantly increasing or decreasing the activity of the gene; these are the kind of variants that cause disease in many members of a single family and are known as mutations. Classic examples include hypertrophic cardiomyopathy and Marfan syndrome. There are common variants (>1% of the general population) that have a small effect on the function of a gene. These variants do not change gene activity enough to cause disease by themselves but instead need to be combined with other variants in other genes or with environmental factors for disease to occur. This is the case with most cardiovascular disorders for which there are many contributing factors, for example, hypercholesterolemia, myocardial infarction, and ischemic stroke.

All of these differences at the DNA level are called polymorphisms, of which there are several types (Figure 7). Single-nucleotide polymorphisms (SNPs) occur when a single base in the DNA differs from the usual base at that position. Variable-number tandem repeats are polymorphisms in which the number of repeats of a short DNA sequence at a location varies from person to person; when the length of the repeat ranges from 2 to 6 base pairs, other names for this type of polymorphism include microsatellites, simple-sequence repeats, and short tandem repeats.
A copy number variation (CNV) is a polymorphism in which the number of repeats of a large DNA sequence (>1000 base pairs) at a location varies from person to person, with the number typically ranging from zero copies (deletion of the sequence) up to a few copies. An indel (short for insertion-deletion) is a polymorphism in which a DNA sequence of any size is either present or absent at a location, varying from person to person. An indel can be characterized as either a variable-number tandem repeat or a CNV, depending on the size of the involved sequence.

Figure 7. Three types of polymorphisms. The top left DNA molecule is the reference sequence; each of the other 3 DNA molecules demonstrates 1 type of DNA variant. “Indel” indicates insertion-deletion; SNP, single-nucleotide polymorphism.

SNPs are the most common and best characterized of the polymorphisms, with tens of millions of SNPs now identified across the human genome (they are cataloged in a database called dbSNP, http://www.ncbi.nlm.nih.gov/SNP/). On average, they occur every few hundred base pairs. SNPs are a large contributor to the genomic variation that distinguishes each individual person. Much of genomics research has focused on understanding how SNPs are distributed in different populations, how they affect gene function, and how they contribute to disease. Most GWASs (see below) have focused on discovering associations between SNPs (rather than variable-number tandem repeats or CNVs) and diseases.

Coding and Noncoding DNA Variants

As mentioned, the genome can be divided into coding and noncoding DNA. Coding DNA, which makes up just 1% of the genome, contains the gene sequences that are transcribed into mRNAs and then translated into proteins. The coding DNA of a single gene is usually not present as a single continuous block but rather is split into a number of distinct blocks called exons that are separated by stretches of noncoding DNA called introns (Figure 3).
When a gene is transcribed, it begins with a change in the balance of regulatory proteins called transcription factors that are associated with an upstream region of noncoding DNA called the promoter. Specific transcription factors can either enhance or repress this process, so that transcription is initiated when the balance of transcriptional enhancers outweighs transcriptional repressors. Immediately downstream of the promoter is the first exon of the gene, followed by an intron, followed by the next exon, followed by another intron, etc. The entire region of DNA (including both the exons and introns of the gene but not the promoter) is transcribed into RNA.

After the full RNA is transcribed, it is processed in the nucleus with the help of proteins called splicing factors. Introns are excised and the ends of the exons are joined, thereby creating an mRNA with all of the exons now forming a continuous sequence (Figure 3). In some cases, alternative splicing occurs. Depending on circumstances, a particular exon may be either included or excluded from the final mRNA, or a choice may be made between 2 adjacent exons, resulting in either 1 or the other exon being included in the final mRNA. Alternative splicing can thereby result in the creation of a heterogeneous pool of mRNAs transcribed from a single gene, resulting in a heterogeneous mix of slightly different proteins, called isoforms. In different situations, the pool of mRNAs from a gene may be dominated by some splice forms versus other splice forms, allowing an extra level of regulation of gene function.

When SNPs fall in the midst of coding DNA, a variety of consequences for gene function can occur, despite a change of just 1 DNA base. Nonsynonymous variants are SNPs that alter a codon in a way that changes the amino acid that is encoded by the codon. One type of nonsynonymous variant, called a missense variant, results in a single amino acid being changed in the protein product that is translated from the gene.
This occurs because the codon is switched from specifying 1 amino acid to specifying another. For example, a change in a codon from AAG to AAC would result in the substitution of the amino acid asparagine for the amino acid lysine in the protein. Another type of nonsynonymous variant, called a nonsense variant, results in the protein being prematurely truncated at that position as a result of the codon being changed to a stop signal. An example is a change in a codon from AAG, which encodes the amino acid lysine, to UAG, a stop codon. Typically, although not always, a nonsense variant will have greater consequences for gene function than a missense variant. Many SNPs are synonymous variants, which change a DNA base without changing the amino acid specified by the codon. This can occur as a result of the redundancy of the genetic code. Because there are 64 possible codons that encode only 20 different amino acids, most of the amino acids are encoded by multiple codons that are very similar; for example, they may vary only in the third base of the codon. For example, the amino acid lysine is encoded by the codons AAA and AAG; the amino acid glycine is encoded by the codons GGA, GGC, GGG, and GGT (GGU in mRNA). Thus, a single base change may not ultimately affect the protein.

Although most synonymous variants are not thought to affect gene function in any way, there can be exceptions. For example, if the variant occurs at the very beginning or end of any exon, it can potentially interfere with splicing of that exon and the adjacent intron. Splice-site variants can affect alternative splicing of exons or, in some scenarios, can cause introns to be inappropriately included in mRNAs, with deleterious consequences for the translated protein products.

Small indels that cause the insertion or deletion of a few base pairs of coding DNA can result in the disruption of gene function.
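The codon-level consequences of single-base changes, and the way a small deletion changes how downstream codons are read, can be illustrated with a short Python sketch. It is not part of the original statement. The function names, the toy mRNA, and the partial codon table are assumptions made for illustration, and the classifier assumes the reference codon encodes an amino acid rather than a stop signal.

```python
# Illustrative sketch only: a PARTIAL codon table covering the examples
# used in the text (lysine, asparagine, glycine, and the stop codons).
CODON_TABLE = {
    "AUG": "Met",
    "AAA": "Lys", "AAG": "Lys",
    "AAC": "Asn", "AAU": "Asn",
    "GGA": "Gly", "GGC": "Gly", "GGG": "Gly", "GGU": "Gly",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label a single-codon change as synonymous, missense, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == "Stop":
        # Assumption of this sketch: the reference codon is not a stop codon.
        raise ValueError("reference codon must encode an amino acid")
    if alt_aa == ref_aa:
        return "synonymous"  # same amino acid, protein unchanged
    if alt_aa == "Stop":
        return "nonsense"    # premature stop, truncated protein
    return "missense"        # a different amino acid is substituted


def split_codons(mrna: str) -> list[str]:
    """Split an mRNA into nonoverlapping codons, dropping leftover bases."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


print(classify_codon_change("AAG", "AAA"))  # synonymous (Lys -> Lys)
print(classify_codon_change("AAG", "AAC"))  # missense   (Lys -> Asn)
print(classify_codon_change("AAG", "UAG"))  # nonsense   (Lys -> Stop)

mrna = "AUGAAGGGUUAG"                       # toy mRNA, hypothetical sequence
print(split_codons(mrna))                   # ['AUG', 'AAG', 'GGU', 'UAG']
# Deleting 1 base changes every downstream codon:
print(split_codons(mrna[:3] + mrna[4:]))    # ['AUG', 'AGG', 'GUU']
# Deleting 3 bases removes 1 codon but keeps the rest in register:
print(split_codons(mrna[:3] + mrna[6:]))    # ['AUG', 'GGU', 'UAG']
```

In the 1-base deletion, the original stop codon is no longer read in frame, which is why the position of the eventual stop signal, and therefore the length of the protein, becomes unpredictable.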
Frameshift variants can result in the frame of an mRNA being placed out of register so that the ribosome is no longer reading the appropriate codons. For example, because codons are read as groups of 3 bases, deletion (or insertion) of 1 base would result in each of the subsequent codons being misread by the ribosome. The same would occur with deletion (or insertion) of 2 bases. This usually, but not always, results in a premature stop signal occurring soon after the site of the variant, causing a dysfunctional truncated protein to be made. (Thus, functionally, frameshift variants and nonsense variants are similar.) The deletion (or insertion) of multiples of 3 bases would have different effects and would therefore not constitute frameshift variants. In this case, ≥1 amino acids would be missing from (or extra amino acids would be present in) the final protein, but because the subsequent codons would still be in the correct frame, the remainder of the protein would be normally translated and therefore would be intact. The missing (or extra) amino acids may or may not affect the activity of the protein, depending on where they fall in the protein.

Noncoding DNA variants occur either within a gene (promoter, introns) or outside a gene. Although noncoding DNA variants do not affect codons, they can nevertheless affect the final protein products of genes. Variants within introns can affect the splicing of nearby exons, thereby affecting which protein isoforms are produced. Variants within promoters can directly affect gene transcription, resulting in higher or lower levels of mRNAs being produced, which in turn results in higher or lower levels of protein being produced. Even when far away from genes, variants can affect their transcription.
Noncoding DNA elements called transcriptional enhancers and transcriptional repressors can affect the expression of genes from large distances, as many as thousands of bases (kilobases) or even millions of bases (megabases), through 3-dimensional interactions between different regions of a chromosome, that is, folding of a chromosome resulting in 2 remote sites being brought into proximity, with transcription factors bridging between the 2 sites. Variants in these transcriptional elements can thereby modulate gene expression.

Finally, various classes of RNAs exist that are transcribed from noncoding DNA and therefore do not code for proteins but can nevertheless affect the functions of other genes. MicroRNAs (miRNAs) are small noncoding RNAs ≈22 nucleotides in size that match complementary sequences within mRNA molecules. By forming base pairs with an mRNA sequence, an miRNA can regulate the amount of protein produced by the mRNA. This can occur by blocking of translation of the mRNA, which directly reduces the yield of protein, or by inducing the degradation of the mRNA, which indirectly reduces the yield of protein. In some cases, an miRNA may enhance transcription of a gene or translation of an mRNA, thereby increasing the level of the protein product. Many miRNAs are contained in the introns of coding genes, with the others lying in regions between genes.

Long noncoding RNAs (lncRNAs) are transcripts longer than 200 nucleotides. They can play a number of different roles in regulating gene expression and protein production. Some lncRNAs can bind to and modulate the activity of specific transcription factors, thereby affecting the transcription of certain genes. Other lncRNAs regulate the basic enzymes involved in the transcription of all genes, thereby causing global changes in the cell, or act to silence genes in large portions of or even entire chromosomes.
Yet other lncRNAs are involved in the regulation of translation of mRNAs, often via the formation of base pairs with a complementary sequence in an mRNA, similar to the mechanism by which miRNAs act, or in the regulation of mRNA splicing. Thus, for both miRNAs and lncRNAs, noncoding DNA variants that fall within the sequences encoding these RNAs can potentially have important functional consequences.

Genotyping and Sequencing to Determine the Identity of DNA Variants

In most cases, each person has 2 copies of each DNA sequence, called alleles, because of the pairing of chromosomes; the exceptions are DNA sequences on the X or Y chromosome in men, who have only 1 of each chromosome. A person’s genotype at the site of a polymorphism is the identity of the DNA sequence for each of the 2 alleles on the paired chromosomes. For an SNP, a genotype is typically designated as 2 letters corresponding to the identities of the bases at the SNP position (eg, AA versus AG versus GG). For a variable-number tandem repeat or CNV, a genotype is typically designated as 2 numbers corresponding to the copy numbers of the 2 alleles. A haplotype is a combination of SNPs at multiple locations on a chromosome, often within kilobases of each other, that are usually transmitted as a group from parents to offspring.

There are 2 methods to determine the genotypes of a polymorphism. First, there are assays that allow the direct genotyping of a polymorphism. Although a description of the technical details of these assays is beyond the scope of this document, the assays are relatively inexpensive (compared with sequencing, as described below) and can be combined into a high-throughput format, usually a genotyping array or chip, that can ascertain the genotypes of up to millions of polymorphisms in a person’s genomic DNA sample in a single experiment. This is the technique used by commercial DNA testing services.
Such services extract genomic DNA from the cells in a person’s saliva sample and then apply the DNA to a genotyping chip to determine the genotypes of a large number of SNPs and CNVs distributed across the genome. One disadvantage of this methodology is that it can ascertain the genotypes of only predetermined polymorphisms. It cannot interrogate any other DNA bases in the genome and, importantly, cannot discover new polymorphisms.

The second method to determine the genotypes of polymorphisms entails DNA sequencing. DNA sequencing techniques date back to the 1970s, when it could take days to determine the identity of the bases in a sequence of a few dozen DNA bases. In the 1990s, improved DNA sequencing techniques were developed that allowed an international consortium to sequence the entirety of the human genome, the Human Genome Project, in ≈12 years at a cost of US $3 billion [1]. The 2000s saw the invention of next-generation sequencing techniques, which enormously decreased the time and costs required to sequence increasingly large stretches of DNA. In 2009, the first reports of whole-exome sequencing of DNA samples from patients were published [2,3]. The exome comprises the entirety of the coding portions of the genome, that is, all of the exons of the ≈20 000 genes, which together constitute ≈1% of the genome. Shortly thereafter, whole-genome sequencing of DNA samples from patients was reported [4]. As expected, whole-exome sequencing remains cheaper than whole-genome sequencing, but further advances in next-generation sequencing technology have made it possible to sequence a patient’s genome in a single day for a few thousand US dollars.

Because known polymorphisms affect only a small proportion of the DNA bases in the genome, it remains more expensive to sequence the entire genome than to genotype polymorphisms; thus, direct genotyping assays remain in common use.
However, a significant advantage of whole-exome and whole-genome sequencing is the ability to discover new DNA variants, especially rare DNA variants that are unique to particular individuals or families. As sequencing technologies become even cheaper, it can be expected that whole-genome sequencing will eventually supplant direct genotyping.

Monogenic Cardiovascular and Stroke Disorders

Rare DNA Variants and Monogenic (Mendelian) Disorders

Classic genetics focused largely on monogenic, or mendelian, diseases, that is, those that follow Mendel’s laws of inheritance. In these diseases, a DNA variant or variants in a single gene are responsible for causing disease. Perforce, these variants must have large effects on gene function because they are able to singlehandedly induce disease. Typically, these variants are quite rare in a given population because they are unique to a patient or a family and thus are called mutations. The reason for the rarity of these mutations is natural selection: If the mutations result in disorders that decrease health and reproductive fitness, they will eventually be eliminated from a population. In exceptional cases, mutations may cause both beneficial and detrimental consequences, resulting in opposing forces of positive selection and negative selection that may cause the mutations to be preserved at nonrare frequencies in a population. For example, the HbS mutation in the HBB gene (which produces the β subunit of hemoglobin) causes
