Innovative strategies for annotating the “relationSNP” between variants and molecular phenotypes

Jason E Miller, Yogasudha Veturi, Marylyn D Ritchie

doi:10.1186/s13040-019-0197-9

https://doi.org/10.1186/s13040-019-0197-9

Journal: BioData Mining
Publication Date: May 14, 2019
Citations: 10
License type: open-access

Affiliation: University of Pennsylvania

Abstract

Characterizing how variation at the level of individual nucleotides contributes to traits and diseases has been an area of growing interest since sequencing of the first human genome was completed. Understanding how a single nucleotide polymorphism (SNP) leads to a pathogenic phenotype on a genome-wide scale is a fruitful endeavor for anyone interested in developing diagnostic tests or therapeutics, or simply wanting to understand the etiology of a disease or trait. To this end, many datasets and algorithms have been developed as resources and tools to annotate SNPs. One of the most common practices is to annotate coding SNPs that affect the protein sequence. Synonymous variants are often grouped as one type of variant; however, many tools are in fact available to dissect their effects on gene expression. More recently, large consortia such as ENCODE and GTEx have made it possible to annotate non-coding regions. Although annotating variants is a common technique among human geneticists, the constant advances in the tools and biology surrounding SNPs require an updated summary of what is known and of the trajectory of the field. This review discusses the history behind SNP annotation, commonly used tools, and newer strategies for SNP annotation. Additionally, we comment on the caveats that distinguish approaches from one another, along with gaps in the current state of knowledge and potential future directions. We do not intend this to be a comprehensive review of any specific area of SNP annotation; rather, it is meant as a resource for those unfamiliar with the computational tools used to functionally characterize SNPs. In summary, this review illustrates how each SNP annotation method affects the way in which the genetic and molecular etiology of a disease is explored in silico.

Highlights

Scientific endeavors in human genetics, molecular biology, biochemistry, and bioinformatics have been progressively converging to describe more precisely how DNA variation explains differences in traits and diseases.
Single-base changes, called single nucleotide polymorphisms (SNPs), and changes in which DNA has been inserted or deleted, referred to as indels, have been popular forms of genetic variation to investigate. Another form of variation is copy number variants (CNVs), in which large portions of the genome are duplicated or deleted.
Although the Human Genome Project was funded by the National Institutes of Health (NIH) and the Department of Energy (DOE), it was informally a product of international collaborations [3].

Summary

Introduction

Scientific endeavors in human genetics, molecular biology, biochemistry, and bioinformatics have been progressively converging to describe more precisely how DNA variation explains differences in traits and diseases. Machine learning (ML) is a broad term for algorithms that learn from a training dataset in order to improve a mathematical model's prediction accuracy on a test dataset. Often, these methods use sequence conservation, amino acid physicochemical properties, gene regulatory annotations, allele frequency among sub-populations, and even the output of other tools. Combined Annotation-Dependent Depletion (CADD) annotations were derived from a support vector machine (SVM), a commonly used ML algorithm, to generate scores for 8.6 billion possible single nucleotide variants (SNVs) in the human reference genome based on 63 annotations that described conservation, gene regulatory information, and population frequencies [40].
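The train/test idea described above can be illustrated with a small sketch. This is not CADD's pipeline: it is a toy linear SVM fitted by plain subgradient descent on the hinge loss, using three made-up annotation features (conservation, regulatory score, allele frequency) drawn from synthetic distributions. All function names, feature values, and hyperparameters here are assumptions chosen for illustration only.

```python
import random


def train_linear_svm(X, y, epochs=200, lr=0.01, lam=0.01):
    """Fit weights w and bias b by subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:
                # Margin violated: step along the hinge-loss subgradient
                # plus the regularization term.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Margin satisfied: only the regularization term applies.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 (deleterious-like) or -1 (benign-like)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def clip01(v):
    return min(1.0, max(0.0, v))


def make_toy_variants(n, rng):
    """Synthetic rows of (conservation, regulatory score, allele frequency).
    The distributions are invented; real annotations are far noisier."""
    X, y = [], []
    for _ in range(n):
        label = rng.choice([1, -1])
        if label == 1:  # conserved, rare
            row = [rng.gauss(0.8, 0.1), rng.gauss(0.6, 0.15), rng.gauss(0.05, 0.03)]
        else:           # less conserved, common
            row = [rng.gauss(0.3, 0.1), rng.gauss(0.4, 0.15), rng.gauss(0.30, 0.1)]
        X.append([clip01(v) for v in row])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_toy_variants(200, rng)  # model learns from these
X_test, y_test = make_toy_variants(100, rng)    # held out for evaluation

w, b = train_linear_svm(X_train, y_train)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The point of the sketch is the separation of roles: the model sees only the training rows while fitting, and accuracy is reported on rows it has not seen. A tool working at genome scale would use many more annotations and a far larger training set than this toy example.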
