Application of SNP technologies in medicine: lessons learned and future challenges.

Abstract

Over the past few years, single nucleotide polymorphisms (SNPs) have been proposed as the next generation of markers for the identification of loci associated with complex diseases and for pharmacogenetic applications (Lander and Schork 1994; Lander 1996; Risch and Merikangas 1996; Kruglyak 1997; Schafer and Hawkins 1998). SNPs are frequently present in the genome with a density of at least one common (>20% allele frequency) SNP per kilobase pair (Lai et al. 1998; Sachidanandam et al. 2001). They are mostly biallelic ( 1.6 million SNPs in the public databases (Sachidanandam et al. 2001). In this article, I will attempt to summarize what we know about SNPs and identify some of the challenges that await us in the application of SNPs in research and medicine. The first questions most people would ask are: how many SNPs are there in the human genome, and have we identified most of them? The frequently cited rate of 1 SNP/kb suggests that there are 3 million common SNPs in the human genome. However, recent data indicate that the number of SNPs in the human genome is potentially much greater than 3 million. The first indication came from the comparison of the Celera SNP database with the public data. Celera Genomics claimed to have over 3.5 million putative SNPs in its database. However, only 400,000 of those SNPs were redundant when compared to the publicly available 1.6 million. The second line of evidence came from our own experiments. We have isolated >1000 SNPs in a 20-megabase region by re-sequencing eight individuals (not the same DNA source as the TSC SNPs). The overlap between our SNPs (∼1,000) and the TSC SNPs in this region is ∼5% (instead of the expected 50% if the total number of common SNPs is around 3 million). These results suggest that there are potentially 10 million or more common SNPs in the human population.
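The overlap argument above can be illustrated with a short calculation. This is a minimal sketch, not the author's method: it assumes the public and Celera databases are independent random samples from the same pool of common SNPs (an idealization that real discovery efforts do not satisfy exactly), and it swaps in the Lincoln-Petersen capture-recapture estimator, which the article does not name. The function names are hypothetical.

```python
def expected_overlap_fraction(known: float, total: float) -> float:
    """Fraction of newly found SNPs expected to be in a database of `known`
    SNPs already, if `total` SNPs exist and discovery is random."""
    return known / total


def lincoln_petersen(n1: float, n2: float, shared: float) -> float:
    """Capture-recapture estimate of the pool size from two samples of
    size n1 and n2 that share `shared` items (assumes independence)."""
    if shared <= 0:
        raise ValueError("shared must be positive")
    return n1 * n2 / shared


# Figures quoted in the abstract
public_snps = 1.6e6   # SNPs in the public databases
celera_snps = 3.5e6   # putative SNPs claimed by Celera
redundant = 0.4e6     # Celera SNPs also present in the public set

# With only 3 million common SNPs, about half of new SNPs should be known.
print(f"{expected_overlap_fraction(public_snps, 3e6):.2f}")  # 0.53

# The observed overlap points to a much larger pool.
print(f"{lincoln_petersen(public_snps, celera_snps, redundant):,.0f}")  # 14,000,000
```

Under these assumptions the Celera comparison alone gives roughly 14 million, which is in line with the "10 million or more" stated in the text.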
A theoretical modeling experiment has also predicted that there are more than 10 million SNPs in the genome (Kruglyak and Nickerson 2001). There are two important implications for the use of SNPs as a genetic tool if there are indeed over 10 million SNPs in the human genome. The first implication is that the SNP(s) you are looking for might not have been discovered yet. The second implication is the need to select a representative set of SNPs out of the 1.6 million to cover the genome. The first problem is a difficult one, since it is impossible to know whether the SNP(s) of interest is present in the current databases. There are two potential solutions. The first solution is to design experiments that combine SNP discovery and genotyping (Brenner et al. 2000). However, this approach has not been demonstrated for a whole-genome SNP scan and could be costly even if it is technically feasible. The second solution, which is suitable for both implications mentioned above, is the development of a comprehensive whole-genome SNP marker set that has a high likelihood of detecting the SNP(s) of interest by linkage disequilibrium or association (see section below on marker set development) (Jorde 2000). So how do we design a marker set that covers the genome as completely as possible? There are many suggestions and computer models that use linkage disequilibrium (LD) as a guide and strike a balance between number of markers and information content (Kruglyak 1999; Jorde 2000). A number of recent studies have indicated that an average spacing of 30 kb provides a good balance (i.e., 100,000 SNPs for the whole genome) (Collins 1999; Huttley et al. 1999; Goddard et al. 2000; Jorde 2000). In addi

E-MAIL ehl21107@GlaxoWellcome.com; FAX (919) 315-0113. Article and publication are at www.genome.org/cgi/doi/10.1101/gr.192301. Insight/Outlook
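The marker counts quoted in the abstract follow from simple arithmetic on genome length and marker spacing. A minimal sketch, assuming a genome length of about 3 billion base pairs (the value implied by "1 SNP/kb suggests 3 million" in the text); the constant and function name are hypothetical.

```python
GENOME_SIZE_BP = 3_000_000_000  # assumed approximate human genome length


def markers_for_spacing(spacing_bp: int, genome_bp: int = GENOME_SIZE_BP) -> int:
    """Number of evenly spaced markers needed at a given average spacing."""
    if spacing_bp <= 0:
        raise ValueError("spacing_bp must be positive")
    return genome_bp // spacing_bp


print(markers_for_spacing(30_000))  # 100000 markers at 30 kb spacing
print(markers_for_spacing(1_000))   # 3000000 SNPs at 1 SNP/kb
```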

Similar Papers
  • Research Article
  • Cite Count: 58
  • 10.1158/1055-9965.681.13.5
SNPs, Haplotypes, and Cancer: Applications in Molecular Epidemiology
  • May 1, 2004
  • Cancer Epidemiology, Biomarkers & Prevention
  • Timothy R Rebbeck + 6 more

  • Research Article
  • 10.1158/1538-7445.am2017-1284
Abstract 1284: Identification of potential cancer regulatory germline single-nucleotide polymorphisms in the non-coding genome
  • Jul 1, 2017
  • Cancer Research
  • Diptee A Kulkarni + 2 more

Tumorigenesis in sporadic cancers is mainly driven by somatic genetic alterations such as driver mutations in protein coding genes or chromosomal changes comprising deletions, amplifications or translocations resulting in loss of tumor suppressor proteins, gain of oncogenic proteins or expression of aberrant fusion proteins, respectively. Some cancers lack such somatic changes, but are addicted to expression of certain genes for their sustained proliferation and survival. There is evidence of such oncogenic addiction to LIM domain only 1 (LMO1) expression in neuroblastoma (NB). Genome-wide association studies (GWAS) have identified robust associations between germline single-nucleotide polymorphisms (SNPs) within LMO1 and NB susceptibility with the causal SNP being rs2168101. Investigation of the mechanism of NB dependency on LMO1 showed that LMO1 expression in NB cells is regulated by rs2168101, which resides within a highly conserved tissue-specific super enhancer in LMO1 intron 1 and drives LMO1 expression through GATA3 transcription factor binding. This makes LMO1-dependent NB a unique example of sporadic cancer driven by germline genetic variation. Numerous GWAS have identified significant associations between germline SNPs in the non-coding genome and cancer risk or outcomes. To identify additional examples of regulatory SNPs as cancer drivers, we merged published genome-wide significant associations from cancer GWAS with genome regulatory data from ENCODE (Encyclopedia of DNA Elements; Nature. 2012 Sep 6; 489 (7414): 57-74) and searched for clusters of cancer associated SNPs that resided within gene regulatory elements. Gene regulatory elements were defined as those marked by active epigenetic features and chromatin accessibility in cancer cell lines. Of the ~1,600 unique, genome-wide significant SNPs from cancer GWAS with regulatory evidence, we identified 46 clusters of 3 or more putative regulatory SNPs near 28 genes.
These clusters were particularly enriched within ovarian cancer associated loci. Mechanistic studies such as reporter assays and genome editing in relevant cell types are being considered to identify the causal SNPs from these clusters regulating gene expression and driving tumorigenesis, which in turn may lead us to new cancer targets. Citation Format: Diptee A. Kulkarni, Kijoung Song, Karl Guo. Identification of potential cancer regulatory germline single-nucleotide polymorphisms in the non-coding genome [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 1284. doi:10.1158/1538-7445.AM2017-1284

  • Front Matter
  • Cite Count: 12
  • 10.1016/j.jaci.2009.12.976
Genetics and biology of asthma 2010: La' ci darem la mano…
  • Feb 1, 2010
  • Journal of Allergy and Clinical Immunology
  • Donata Vercelli

  • Research Article
  • Cite Count: 53
  • 10.1038/sj.ejhg.5201987
Singleton SNPs in the human genome and implications for genome-wide association studies
  • Jan 16, 2008
  • European Journal of Human Genetics
  • Xiayi Ke + 2 more

The human genome is estimated to contain one single nucleotide polymorphism (SNP) every 300 base pairs. The presence of LD between SNP markers can be used to save genotyping cost via appropriate SNP tagging strategies, whereas absence or low level of LD between markers generally increase genotyping cost. It is quite common that a large proportion of tagging SNPs in a tagging scheme often turn out to be singleton SNPs, that is, SNPs that only tag themselves rather than contribute power to the rest of a region. If genotyping cost is a major concern, which often is the case at the present time for genome-wide association studies, these singleton tagging SNPs would be the primary targets to be removed from genotyping. It is important, however, to understand the characteristics of such SNPs and estimate the impact of removing them in a study. Using the HapMap genotype data and genome wide expression data, we assessed the distribution and functional implications of singleton SNPs in the human genome. Our results demonstrated that SNPs of potentially higher functional importance (eg, nonsynonymous SNPs, SNPs in splicing sites and SNPs in 5' and 3' UTR) are associated with a higher tendency to be singleton SNPs than SNPs in intronic and intergenic regions. We further assessed whether singleton SNPs can be tagged using haplotypes of tagSNPs in the three genome wide chips, that is, GeneChip 500k of Affymetrix, HumanHap300 and HumanHap550 of Illumina, and discussed the general implications on genetic association studies.
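The notion of a singleton SNP described above can be made concrete with a small example. This is a minimal sketch, not the procedure used in the paper: it assumes a precomputed pairwise r² matrix and an r² threshold of 0.8 for "tagging" (the abstract states neither), and treats a SNP as a singleton when no other SNP reaches that threshold with it. The function name and example values are hypothetical.

```python
from typing import List, Sequence


def singleton_snps(r2: Sequence[Sequence[float]], threshold: float = 0.8) -> List[int]:
    """Return indices of SNPs that no other SNP tags at r^2 >= threshold.

    Such SNPs can only tag themselves, so a tagging scheme must genotype
    each of them directly."""
    n = len(r2)
    return [
        i for i in range(n)
        if all(r2[i][j] < threshold for j in range(n) if j != i)
    ]


# Illustrative (made-up) pairwise r^2 values for four SNPs:
# SNPs 0 and 1 are in strong LD; SNPs 2 and 3 are not tagged by anything.
r2_example = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.3],
    [0.0, 0.1, 0.3, 1.0],
]
print(singleton_snps(r2_example))  # [2, 3]
```

Dropping SNPs 2 and 3 from the panel would save two assays but leave both untested, which is the trade-off the abstract describes.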

  • Research Article
  • Cite Count: 24
  • 10.1016/j.omtn.2020.01.012
Allele-Selective Knockdown of MYH7 Using Antisense Oligonucleotides
  • Jan 21, 2020
  • Molecular Therapy - Nucleic Acids
  • Brian R Anderson + 20 more

  • Addendum
  • Cite Count: 10
  • 10.1007/s11033-014-3799-9
Erratum to: Single nucleotide polymorphism analysis of Korean native chickens using next generation sequencing data.
  • Jan 15, 2015
  • Molecular Biology Reports
  • Dong-Won Seo + 11 more

There are five native chicken lines in Korea, which are mainly classified by plumage colors (black, white, red, yellow, gray). These five lines are very important genetic resources in the Korean poultry industry. Based on a next generation sequencing technology, whole genome sequence and reference assemblies were performed using Gallus_gallus_4.0 (NCBI) with whole genome sequences from these lines to identify common and novel single nucleotide polymorphisms (SNPs). We obtained 36,660,731,136 ± 1,257,159,120 bp of raw sequence and average 26.6-fold of 25-29 billion reference assembly sequences representing 97.288 % coverage. Also, 4,006,068 ± 97,534 SNPs were observed from 29 autosomes and the Z chromosome and, of these, 752,309 SNPs are the common SNPs across lines. Among the identified SNPs, the number of novel- and known-location assigned SNPs was 1,047,951 ± 14,956 and 2,948,648 ± 81,414, respectively. The number of unassigned known SNPs was 1,181 ± 150 and unassigned novel SNPs was 8,238 ± 1,019. Synonymous SNPs, non-synonymous SNPs, and SNPs having character changes were 26,266 ± 1,456, 11,467 ± 604, 8,180 ± 458, respectively. Overall, 443,048 ± 26,389 SNPs in each bird were identified by comparing with dbSNP in NCBI. The presently obtained genome sequence and SNP information in Korean native chickens have wide applications for further genome studies such as genetic diversity studies to detect causative mutations for economic and disease related traits.

  • Research Article
  • Cite Count: 10
  • 10.1007/s11033-014-3790-5
Single nucleotide polymorphism analysis of Korean native chickens using next generation sequencing data.
  • Oct 11, 2014
  • Molecular biology reports
  • Dong-Won Seo + 11 more

There are five native chicken lines in Korea, which are mainly classified by plumage colors (black, white, red, yellow, gray). These five lines are very important genetic resources in the Korean poultry industry. Based on a next generation sequencing technology, whole genome sequence and reference assemblies were performed using Gallus_gallus_4.0 (NCBI) with whole genome sequences from these lines to identify common and novel single nucleotide polymorphisms (SNPs). We obtained 36,660,731,136 ± 1,257,159,120 bp of raw sequence and average 26.6-fold of 25-29 billion reference assembly sequences representing 97.288 % coverage. Also, 4,006,068 ± 97,534 SNPs were observed from 29 autosomes and the Z chromosome and, of these, 752,309 SNPs are the common SNPs across lines. Among the identified SNPs, the number of novel- and known-location assigned SNPs was 1,047,951 ± 14,956 and 2,948,648 ± 81,414, respectively. The number of unassigned known SNPs was 1,181 ± 150 and unassigned novel SNPs was 8,238 ± 1,019. Synonymous SNPs, non-synonymous SNPs, and SNPs having character changes were 26,266 ± 1,456, 11,467 ± 604, 8,180 ± 458, respectively. Overall, 443,048 ± 26,389 SNPs in each bird were identified by comparing with dbSNP in NCBI. The presently obtained genome sequence and SNP information in Korean native chickens have wide applications for further genome studies such as genetic diversity studies to detect causative mutations for economic and disease related traits.

  • Research Article
  • Cite Count: 13
  • 10.1016/j.ijfoodmicro.2017.10.019
Application of whole genome sequence data in analyzing the molecular epidemiology of Shiga toxin-producing Escherichia coli O157:H7/H-
  • Oct 17, 2017
  • International Journal of Food Microbiology
  • Eiji Yokoyama + 3 more

  • Research Article
  • Cite Count: 57
  • 10.1161/circgenetics.108.816751
Genome-Wide Association Studies for Atherosclerotic Vascular Disease and Its Risk Factors
  • Feb 1, 2009
  • Circulation: Cardiovascular Genetics
  • Keyue Ding + 1 more

Received August 26, 2008; accepted December 10, 2008. Atherosclerotic vascular disease is a major health care burden, being the leading cause of morbidity and death worldwide.1 A better understanding of the genetic basis of atherosclerotic vascular disease is urgently needed to provide new insights into the underlying pathophysiological mechanisms and facilitate development of novel diagnostic and therapeutic modalities. The advent of genome-wide association (GWA) studies (see supplementary Table 1 for glossary) is an important step in this direction, having led to the identification of susceptibility alleles for many of the common “complex” diseases. This is in contrast to genetic linkage studies, which had limited success in identifying genes for complex diseases or quantitative trait loci and candidate gene-based association studies, the results of which have been mostly irreproducible. GWA studies became possible with the completion of the Human Genome Project,2 the discovery of millions of single-nucleotide polymorphisms (SNPs) in the human genome, and the International HapMap Project3 that characterized the patterns of linkage disequilibrium (LD) in the human genome, as well as the availability of high-throughput genotyping platforms and decreased costs of genotyping. In contrast to candidate gene studies in which genes are selected on the basis of known or suspected disease mechanisms, GWA studies permit a relatively comprehensive scan of the genome in an agnostic fashion, and thus have the potential to identify novel disease susceptibility or quantitative trait loci. Although there are at least 7 million common SNPs (minor allele frequency >5%) in the human genome,4 neighboring SNPs are often strongly correlated with each other (ie, in LD). LD is measured by the r² statistic, which indicates the correlation of alleles at 2 sites, and ranges from 0 (no correlation) to 1 (perfect correlation). GWA studies take advantage of …
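The r² statistic mentioned above has a standard closed form for two biallelic sites: r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), where D = p_AB − p_A p_B. A minimal sketch, assuming phased haplotype frequencies are already known; the function name and the example frequencies are hypothetical and not taken from the article.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation between two biallelic sites.

    p_ab: frequency of haplotypes carrying allele A at site 1 and B at site 2
    p_a:  frequency of allele A at site 1
    p_b:  frequency of allele B at site 2
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


# Alleles always travel together: perfect correlation (r^2 close to 1).
print(round(ld_r2(p_ab=0.3, p_a=0.3, p_b=0.3), 6))

# Haplotype frequency equals the product of allele frequencies: r^2 = 0.
print(round(ld_r2(p_ab=0.15, p_a=0.3, p_b=0.5), 6))
```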

  • Research Article
  • 10.3760/cma.j.issn.0254-6450.2011.11.024
Comparison of minor allele frequency and haplotype frequencies for single nucleotide polymorphisms in receptor tyrosine kinase-like orphan receptor 2 gene using HapMap data from Han Chinese in Beijing (CHB) and Japanese in Tokyo (JPT)
  • Nov 1, 2011
  • Chinese journal of epidemiology
  • Hong Wang + 1 more

Single nucleotide polymorphisms (SNPs) in the receptor tyrosine kinase-like orphan receptor 2 (ROR2) gene were analyzed and compared between Han Chinese in Beijing (CHB) and Japanese in Tokyo (JPT) using the HapMap data, to provide a basis for SNP determination. ROR2 gene related etiologic studies were conducted in the above mentioned two populations. Monotonic and un-monotonic SNPs of the ROR2 gene were distinguished by the Haploview program. Minor allele frequency (MAF), haplotype blocks and haplotype frequencies were analyzed in eligible SNPs and tag SNPs respectively, with genotyping call rate > 80%, MAF > 1%, H-W equilibrium (P > 0.01) and no gender difference (P > 0.05). Tag SNPs were determined under the criteria of r² ≥ 0.8 or logarithm of the odds (LOD) score ≥ 3 for pairwise eligible SNPs in CHB and JPT. Common tag SNPs for CHB and JPT were either reported directly by the Haploview program or identified from those that were highly related to tag SNPs reported by Haploview, using SPSS 13.0 software. A total of 404 common SNPs were provided for both CHB and JPT samples by HapMap, where 101 common monotonic SNPs between CHB and JPT had the common minor alleles. The common SNPs between CHB and JPT were 257. In the 257 common eligible SNPs, 224 (87.2%) had common minor alleles. Among the 18 and 27 haplotype blocks identified in 257 common eligible SNPs between CHB and JPT, all except 2 independent haplotype blocks identified only in JPT overlapped partly or completely between the two populations. A number of 50 common tag SNPs between CHB and JPT were determined and the proportions in CHB and JPT were 64.9% and 70.4% respectively. Analysis of HapMap data provided an opportunity to avoid monotonic SNPs that had been included in ROR2 gene related etiologic studies.
SNPs in ROR2 gene had common features in alleles, MAF, haplotype blocks and haplotype frequencies between CHB and JPT populations, which were consistent with the geographic and ethnic origins of the two populations.
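The eligibility criteria quoted in the abstract (call rate > 80%, MAF > 1%, Hardy-Weinberg P > 0.01) can be sketched as a per-SNP filter. This is an illustration under stated assumptions, not the Haploview implementation: it uses a 1-degree-of-freedom chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (Haploview may use a different test), omits the gender-difference check, and all function names are hypothetical.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1.0 - p)


def hwe_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0.0:
        return 1.0  # monomorphic site: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2.0))


def is_eligible(n_aa: int, n_ab: int, n_bb: int, n_samples: int) -> bool:
    """Apply the call-rate, MAF and HWE thresholds stated in the abstract."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    return (
        called / n_samples > 0.80
        and minor_allele_frequency(n_aa, n_ab, n_bb) > 0.01
        and hwe_p_value(n_aa, n_ab, n_bb) > 0.01
    )


print(is_eligible(25, 50, 25, n_samples=100))  # True: in HWE, MAF 0.5
print(is_eligible(50, 0, 50, n_samples=100))   # False: no heterozygotes
print(is_eligible(100, 0, 0, n_samples=100))   # False: MAF is 0
```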

  • Abstract
  • 10.1136/heartjnl-2014-306118.227
YIA4 Genetic Risk Markers for Atrial Fibrillation Influence Allelic Expression of Nearby Candidate Genes
  • May 31, 2014
  • Heart
  • Ruairidh Martin + 3 more

Genome-wide association studies (GWAS) have identified genetic variants in nine chromosomal regions that are associated with atrial fibrillation (AF). The mechanisms underlying these associations are unknown. To investigate these mechanisms, we...

  • Research Article
  • Cite Count: 28
  • 10.1097/01.hjh.0000226185.06063.80
The β2-adrenoceptor gene and hypertension: is it the promoter or the coding region or neither?
  • Jun 1, 2006
  • Journal of Hypertension
  • Ines N Hahntow + 2 more

  • Research Article
  • Cite Count: 4
  • 10.1007/s10840-015-0086-1
Gene-guided therapy for catheter-ablation of atrial fibrillation: are we there yet?
  • Dec 11, 2015
  • Journal of Interventional Cardiac Electrophysiology
  • Henry Huang + 1 more

  • Dissertation
  • Cite Count: 1
  • 10.5353/th_b3831982
Data mining algorithms for genomic analysis
  • Jan 1, 2007
  • Sio-Iong Ao

With the results of many different genome-sequencing projects, hundreds of genomes from all branches of species have become available. Currently, one important task is to search for ways that can explain the organization and function of each genome. Data mining algorithms become very useful to extract the patterns from the data and to present it in such a way that can better our understanding of the structure, relation, and function of the subjects. In this work, data mining algorithms have been developed for solving some frontier problems in genomic analysis. It is estimated that there exist about ten million single-nucleotide polymorphisms (SNPs) in the human genome. The complete screening of all the SNPs in a genomic region becomes an expensive undertaking. The problem of selecting a subset of informative SNPs (tag SNPs) has been formulated as a hierarchical clustering problem with the development of a suitable similarity function for measuring the distances between the clusters. The proposed algorithm takes account of both functional and linkage disequilibrium information with the asymmetry thresholds for different SNPs, and does not have the difficulties of the block-detecting methods, which can result in different block boundaries. Experimental results supported that the algorithm is cost-effective for tag-SNP selection. More compact clusters can be produced with the algorithm to improve the efficiency of association studies. There are several different advantages of the linkage disequilibrium maps (LD maps) for genomic analysis. In this study, the construction of the LD mapping has been formulated as a non-parametric constrained unidimensional scaling problem, which is based on the LD information among the SNPs. This is different from the previous LD map, which is derived from the given Malecot model. 
Two procedures, one formulated as a least squares problem with nonnegativity constraints and the other using iterative algorithms, have been considered to solve this problem. The proposed maps can accommodate recombination events that have accumulated. Application of the proposed LD maps to the human genome is presented. The linkage disequilibrium patterns in the LD maps can provide genomic information such as the hot and cold recombination regions, and can facilitate the study of recent selective sweeps across the human genome. Microarrays have been the most widely used tool for assessing differences in mRNA abundance in biological samples. Previous studies have successfully employed principal components analysis-neural network as a classifier of gene types, with continuous inputs and discrete outputs. An algorithm has been developed for testing the predictability of gene expression time series with PCA and NN components on the basis of continuous numerical inputs and outputs. Comparisons of results support that our approach is a more realistic model for the gene network from a continuous perspective.

  • Research Article
  • Cite Count: 13
  • 10.1007/s00438-015-1165-9
Association genetics in Populus reveals the interactions between Pto-miR160a and its target Pto-ARF16.
  • Jan 5, 2016
  • Molecular Genetics and Genomics
  • Jiaxing Tian + 3 more

MicroRNAs (miRNAs) play important roles in the regulation of gene expression in various biological processes. However, the interactions between miRNAs and their targets are largely unknown in plants. As a powerful tool for identification of variation associated with traits, association genetics provides another strategy for exploration of interactions between miRNAs and their targets. Here, we conducted expression analysis and association mapping to evaluate the interaction between Pto-miR160a and its target Pto-ARF16 in Populus tomentosa. By examining the expression patterns of Pto-MIR160a and Pto-ARF16, we identified a significant, negative correlation between their expression levels, indicating that Pto-miR160a may affect the expression of Pto-ARF16. Among the single nucleotide polymorphisms (SNPs) identified in this study, one common SNP in the pre-miRNA region of Pto-miR160a altered its predicted secondary structure while another common SNP in the predicted miRNA target site changed the binding affinity of Pto-miR160a. Linkage disequilibrium (LD) analysis revealed low LD levels of Pto-MIR160a and Pto-ARF16, indicating that they are suitable for candidate gene-based association analysis. Single SNP-based association analysis identified 19 SNPs (false discovery rate Q<0.05) in Pto-MIR160a and Pto-ARF16 associated with three phenotypic traits. Epistasis analysis further identified 36 SNP-SNP interactions between SNPs in Pto-MIR160a and SNPs in Pto-ARF16, reflecting the possible genetic interaction of Pto-miR160a and Pto-ARF16. Taking these results together, our study identified SNPs in Pto-MIR160a and Pto-ARF16 associated with tree growth and wood properties, providing SNPs with potential applications in marker-assisted breeding and evidence for the genetic interaction of Pto-miR160a and Pto-ARF16.
