Abstract

Background: Single Nucleotide Polymorphisms (SNPs) are the most abundant form of genomic variation and can cause phenotypic differences between individuals, including diseases. Bases are subject to various levels of selection pressure, reflected in their inter-species conservation.

Results: We propose a method that does not depend on transcription information to score each coding base in the human genome, reflecting the disease probability associated with its mutation. Twelve factors likely to be associated with disease alleles were chosen as the input for a support vector machine prediction algorithm. The analysis yielded 83% sensitivity and 84% specificity in segregating disease-like alleles, as found in the Human Gene Mutation Database, from non-disease-like alleles, as found in the Database of Single Nucleotide Polymorphisms. This algorithm was subsequently applied to each base within all known human genes, exhaustively confirming that inter-species conservation is the strongest factor for disease association. For each gene, the length-normalized average disease potential score was calculated. Of the 30 genes with the highest scores, 21 are directly associated with a disease. In contrast, of the 30 genes with the lowest scores, only one is associated with a disease in the published literature. The results strongly suggest that the highest-scoring genes are enriched for those that might contribute to disease if mutated.

Conclusion: This method provides valuable information to researchers for identifying sensitive positions in genes that have a high disease probability, enabling them to optimize experimental designs and interpret data emerging from genetic and epidemiological studies.
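The two quantities reported in the abstract, sensitivity/specificity and the length-normalized per-gene score, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes that "length normalized average" means the sum of per-base scores divided by the number of coding bases, and the function names and the toy gene names `GENE_A` and `GENE_B` are hypothetical.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for boolean labels and predictions.

    True means "disease-like" (e.g. an HGMD allele); False means
    "non-disease-like" (e.g. a dbSNP allele).
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # disease, called disease
    fn = sum(1 for y, p in pairs if y and not p)      # disease, missed
    tn = sum(1 for y, p in pairs if not y and not p)  # benign, called benign
    fp = sum(1 for y, p in pairs if not y and p)      # benign, called disease
    return tp / (tp + fn), tn / (tn + fp)


def gene_score(base_scores):
    """Assumed definition: sum of per-base scores divided by coding length."""
    return sum(base_scores) / len(base_scores)


def rank_genes(per_gene_scores):
    """Sort gene names from highest to lowest length-normalized score."""
    return sorted(
        per_gene_scores,
        key=lambda gene: gene_score(per_gene_scores[gene]),
        reverse=True,
    )


# Toy data: hypothetical per-base disease-potential scores for two genes.
toy = {"GENE_A": [0.9, 0.7], "GENE_B": [0.1, 0.2, 0.3]}
ranking = rank_genes(toy)
```

Normalizing by length keeps long genes from ranking highly simply because they contain more bases, which is presumably why the abstract reports an average rather than a total.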

Highlights

  • Single Nucleotide Polymorphisms (SNPs) are the most abundant form of genomic variation and can cause phenotypic differences between individuals, including diseases

  • Inter-species conservation as a measure of disease susceptibility: using the process detailed in Figure 1, we have exhaustively confirmed that known disease mutations in the Human Gene Mutation Database (HGMD)

  • We have developed a comparative genomic analysis method for genome-wide identification of genome positions with a greater likelihood of being important to gene function. Mutations occurring at these sites have a higher probability of representing disease alleles

Summary

Introduction

Single Nucleotide Polymorphisms (SNPs) are the most abundant form of genomic variation and can cause phenotypic differences between individuals, including diseases. SNPs are single base pair positions in the genome at which different sequence alternatives (alleles) exist. They occur approximately once every 1,000 bases and are unevenly distributed across the human genome, principally in non-coding regions, presumably due to higher selection pressure in coding regions [3]. While characterizing all SNPs through disease association studies is economically and practically unrealistic, computational methods that rank SNPs by their potential impact would help researchers select and focus on those base positions predicted to be strongly associated with disease. To this end, a variety of approaches with different philosophies have been proposed. Furthermore, most methods, including SIFT, predict the effects of non-synonymous substitutions, whereas recent studies have shown that synonymous substitutions, which do not change the encoded amino acid sequence, may nonetheless cause a measurable phenotypic change and sometimes disease [19].

