Abstract

BackgroundRecent development of high-resolution single nucleotide polymorphism (SNP) arrays allows detailed assessment of genome-wide human genome variations. There is increasing recognition of the importance of SNPs for medicine and developmental biology. However, SNP data set typically has a large number of SNPs (e.g., 400 thousand SNPs in genome-wide Parkinson disease data set) and a few hundred of samples. Conventional classification methods may not be effective when applied to such genome-wide SNP data. ResultsIn this paper, we use shrunken dissimilarity measure to analyze and select relevant SNPs for classification problems. Examples of HapMap data and Parkinson disease (PD) data are given to demonstrate the effectiveness of the proposed method, and illustrate it has a potential to become a useful analysis tool for SNP data sets. We use Parkinson disease data as an example, and perform a whole genome analysis. For the 367440 SNPs with less than 1% missing percentage from all 22 chromosomes, we can select 357 SNPs from this data set. For the unique genes that those SNPs are located in, a gene-gene similarity value is computed using GOSemSim and gene pairs that has a similarity value being greater than a threshold are selected to construct several groups of genes. For the SNPs that involved in these groups of genes, a statistical software PLINK is employed to compute the pair-wise SNP-SNP interactions, and SNPs with significance of P < 0.01 are chosen to identify SNPs networks based on their P values. Here SNPs networks are constructed based on Gene Ontology knowledge, and therefore each SNP network plays a role in the biological process. An analysis shows that such networks have relationships directly or indirectly to Parkinson disease.ConclusionsExperimental results show that our approach is suitable to handle genetic variations, and provide useful knowledge in a genome-wide SNP study.

Highlights

  • Recent development of high-resolution single nucleotide polymorphism (SNP) arrays allows detailed assessment of genome-wide human genome variations

  • For the SNPs that involved in these groups of genes, we did a statistical analysis between these SNPs and all the other SNPs selected by our method using PLINK [24], which is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analysis in a computationally efficient manner

  • Accuracy decreases as the amount of shrinkage increases since less SNPs are used in the prediction

Read more

Summary

Introduction

Recent development of high-resolution single nucleotide polymorphism (SNP) arrays allows detailed assessment of genome-wide human genome variations. SNP data set typically has a large number of SNPs (e.g., 400 thousand SNPs in genome-wide Parkinson disease data set) and a few hundred of samples. Conventional classification methods may not be effective when applied to such genome-wide SNP data. SNPs are believed to result in differences between individuals, such as susceptibility to diseases [2]. They are abundant in human genome [3,4], which are considered as invaluable markers and potential powerful tools for both of genetic researches and applications in practice [5,6,7,8], for instance, disease gene. Thousands of SNPs in different regions of chromosomes are used to describe characteristics of patient/normal samples.

Objectives
Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call