Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies.

Charlotte Wang, Chuhsing Kate Hsiao, Wen-Hsin Kao

doi:10.1371/journal.pone.0135918

Open Access

https://doi.org/10.1371/journal.pone.0135918

Journal: PLOS ONE	Publication Date: Aug 24, 2015
Citations: 33	License type: CC BY 4.0

Affiliation: National Taiwan University

Abstract

The availability of high-throughput genomic data has led to several challenges in recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity of the statistical analyses. Tackling these problems with a marker-set study such as SNP-set analysis can be an efficient solution. To construct SNP-sets, we first propose a clustering algorithm that employs Hamming distance to measure the similarity between strings of SNP genotypes and evaluates whether the given SNPs or SNP-sets should be clustered. A dendrogram can then be constructed from this distance measure, and the number of clusters can be determined. With the resulting SNP-sets, we next develop an association test, HDAT, to examine susceptibility to the disease of interest. Based on Hamming distance, the proposed test assesses whether the similarity between a diseased and a normal individual differs from the similarity between two individuals of the same disease status. The proposed methodology needs only genotype information. No inference of haplotypes is required, and the SNPs under consideration need not be located in nearby regions. The proposed clustering algorithm and association test are illustrated with applications and simulation studies. Compared with existing methods, the clustering algorithm is faster and better at identifying sets containing SNPs that exert a similar effect. In addition, the simulation studies demonstrated that the proposed test works well for SNP-sets containing a large proportion of neutral SNPs. Furthermore, applying the clustering algorithm before testing a large set of data helps narrow down the genetic regions that contain susceptibility markers.
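The comparison behind the association test (between-status similarity versus within-status similarity) can be illustrated with a small permutation test. This is a minimal sketch under stated assumptions, not the authors' HDAT statistic: the function names, the difference-of-means statistic, the permutation scheme, and the toy genotypes are all hypothetical and chosen only to make the idea concrete.

```python
import random
from itertools import combinations


def hamming(u, v):
    """Number of SNPs at which two individuals' genotypes differ."""
    return sum(a != b for a, b in zip(u, v))


def between_minus_within(dist, status):
    """Mean distance of case-control pairs minus mean distance of same-status pairs."""
    between, within = [], []
    for (i, j), d in dist.items():
        (within if status[i] == status[j] else between).append(d)
    if not between or not within:
        raise ValueError("need both mixed-status and same-status pairs")
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_test(genotypes, status, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the statistic above (illustrative only)."""
    pairs = combinations(range(len(genotypes)), 2)
    dist = {(i, j): hamming(genotypes[i], genotypes[j]) for i, j in pairs}
    observed = between_minus_within(dist, status)
    rng = random.Random(seed)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between genotype and status
        if abs(between_minus_within(dist, labels)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical SNP-set of four SNPs: rows are individuals, values are 0/1/2.
genotypes = [
    [2, 2, 1, 0], [2, 1, 1, 0], [2, 2, 1, 1], [2, 2, 2, 0],  # cases
    [0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 0],  # controls
]
status = [1, 1, 1, 1, 0, 0, 0, 0]

observed, p_value = permutation_test(genotypes, status)
print(observed)  # 1.25
print(p_value)
```

A positive statistic means that case-control pairs are, on average, further apart than same-status pairs. Only genotype vectors enter the calculation, which matches the point that no haplotype inference is needed.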

Highlights

With the rapid advancements made in biotechnology, the volume and types of biological data collected have grown at an accelerated rate.
The procedure we develop is based on the rationale that the more individuals carry the same genotype at two given SNPs, the more similar these two SNPs should be considered; the Hamming distance captures exactly this by assigning such a pair a smaller value.
We investigated whether the performance of any of the association tests (HDAT, U statistic, and SKAT) can be improved by testing on the Hamming distance clusters.
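The rationale in the second highlight can be made concrete with a short worked example. This is a minimal sketch assuming genotypes are coded 0, 1, 2 (minor-allele counts); the function name and the toy data are hypothetical.

```python
from typing import Sequence


def hamming_distance(snp_a: Sequence[int], snp_b: Sequence[int]) -> int:
    """Count the individuals whose genotypes differ between two SNPs."""
    if len(snp_a) != len(snp_b):
        raise ValueError("both SNPs must be genotyped on the same individuals")
    return sum(a != b for a, b in zip(snp_a, snp_b))


# Hypothetical genotypes for six individuals (one entry per individual).
snp1 = [0, 1, 2, 0, 1, 0]
snp2 = [0, 1, 2, 0, 0, 0]  # differs from snp1 in one individual
snp3 = [2, 0, 0, 1, 2, 1]  # differs from snp1 in every individual

print(hamming_distance(snp1, snp2))  # 1
print(hamming_distance(snp1, snp3))  # 6
```

Here snp1 and snp2 agree in five of the six individuals, so they receive a small distance and would be treated as similar, whereas snp1 and snp3 share no genotypes and receive the maximum distance.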

Summary

Objectives

The aim of our study is to develop a methodology that uses the Hamming distance metric to measure the distance between two sets of vectors of discrete observations, first to perform clustering and then to use the resulting clusters in association studies.
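One way to picture the clustering step is a plain agglomerative procedure over SNPs. The sketch below is an illustration under assumptions rather than the paper's algorithm: the distance between two SNP-sets is taken as the average pairwise Hamming distance (average linkage), and merging stops at a user-chosen threshold `max_distance`. Both choices, along with the function names and SNP labels, are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of individuals whose genotypes differ between two SNPs."""
    return sum(a != b for a, b in zip(u, v))


def set_distance(set_a: List[str], set_b: List[str],
                 genos: Dict[str, Sequence[int]]) -> float:
    """Average pairwise Hamming distance between two SNP-sets (assumed linkage)."""
    total = sum(hamming(genos[a], genos[b]) for a in set_a for b in set_b)
    return total / (len(set_a) * len(set_b))


def cluster_snps(genos: Dict[str, Sequence[int]],
                 max_distance: float) -> List[List[str]]:
    """Merge the closest SNP-sets until the closest pair exceeds max_distance."""
    clusters = [[name] for name in genos]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: set_distance(clusters[p[0]], clusters[p[1]], genos),
        )
        if set_distance(clusters[i], clusters[j], genos) > max_distance:
            break  # remaining sets are too dissimilar to merge
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical genotypes (0/1/2) of four SNPs across six individuals.
genos = {
    "snpA": [0, 1, 2, 0, 1, 0],
    "snpB": [0, 1, 2, 0, 0, 0],
    "snpC": [2, 0, 0, 1, 2, 1],
    "snpD": [2, 0, 0, 1, 2, 2],
}

snp_sets = cluster_snps(genos, max_distance=2)
print(snp_sets)  # [['snpA', 'snpB'], ['snpC', 'snpD']]
```

Recording the distance at each merge would give the heights needed to draw a dendrogram, from which the number of clusters can be chosen. The resulting SNP-sets can then be passed one at a time to an association test.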

