A novel similarity-measure for the analysis of genetic data in complex phenotypes

Vincenzo Lagani, Alberto Montesanto, Giuseppina Rose, Victor Moreno, Fausta Di Cianni, Domenico Conforti, Giuseppe Passarino, Stefano Landi

doi:10.1186/1471-2105-10-s6-s24

Open Access

Journal: BMC Bioinformatics
Publication Date: Jun 1, 2009
Citations: 31
License type: cc-by

Affiliation: University of Calabria, University of Pisa

Abstract

Background

Recent technological advances in DNA sequencing and genotyping have led to the accumulation of a remarkable quantity of data on genetic polymorphisms. However, the development of new statistical and computational tools for effective processing of these data has not been equally fast. In particular, the machine learning literature contains relatively few papers focused on the development and application of data mining methods for the analysis of genetic variability. Moreover, these papers apply to genetic data procedures that were developed for different kinds of analysis and do not take into account the peculiarities of population genetics. The aim of our study was to define a new similarity measure, specifically conceived for measuring the similarity between the genetic profiles of two groups of subjects (i.e., cases and controls), taking into account that genetic profiles are usually distributed in a population group according to the Hardy-Weinberg equilibrium.

Results

We set up a new kernel function consisting of a similarity measure between groups of subjects genotyped for numerous genetic loci. This measure weighs different genetic profiles according to the estimates of gene frequencies at Hardy-Weinberg equilibrium in the population. We named this function the "Hardy-Weinberg kernel". The effectiveness of the Hardy-Weinberg kernel was compared to the performance of the well-established linear kernel. We found that the Hardy-Weinberg kernel significantly outperformed the linear kernel in a number of experiments where we used either simulated data or real data.

Conclusion

The "Hardy-Weinberg kernel" reported here represents one of the first attempts at incorporating genetic knowledge into the definition of a kernel function designed for the analysis of genetic data. We show that the best performance of the "Hardy-Weinberg kernel" is observed when rare genotypes have different frequencies in cases and controls. The ability to capture the effect of rare genotypes on phenotypic traits might be a very important and useful feature, as most current statistical tools lose most of their statistical power when rare genotypes are involved in the susceptibility to the trait under study.
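
The abstract does not give the formula of the Hardy-Weinberg kernel, so the following is only a minimal sketch of the quantity it builds on: the genotype frequencies expected at Hardy-Weinberg equilibrium for a biallelic locus (p², 2pq, q²). The function names, the genotype labels "AA", "Aa", "aa" and the example counts are assumptions made for illustration; this is not the authors' kernel.

```python
def estimate_allele_frequency(n_AA, n_Aa, n_aa):
    """Estimate the frequency p of allele A from observed genotype counts.

    Each AA subject carries two copies of A and each Aa subject carries one.
    """
    total = n_AA + n_Aa + n_aa
    if total == 0:
        raise ValueError("at least one genotyped subject is required")
    return (2 * n_AA + n_Aa) / (2 * total)


def hardy_weinberg_frequencies(p):
    """Return the expected genotype frequencies at Hardy-Weinberg equilibrium."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# Hypothetical sample of 100 subjects (illustrative counts, not from the paper).
p_hat = estimate_allele_frequency(n_AA=49, n_Aa=42, n_aa=9)
expected = hardy_weinberg_frequencies(p_hat)
```

With these counts the estimated allele frequency is 0.7, and the rare homozygote "aa" has an expected frequency of about 0.09. Genotypes this rare are the ones the abstract singles out as hard for standard tools to detect.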

Highlights

Recent technological advances in DNA sequencing and genotyping have led to the accumulation of a remarkable quantity of data on genetic polymorphisms
The "Hardy-Weinberg kernel" reported here represents one of the first attempts at incorporating genetic knowledge into the definition of a kernel function designed for the analysis of genetic data
The similarity measure computed by the linear kernel is simply the number of Single Nucleotide Polymorphisms (SNPs) that present the same genotype in both genetic profiles X1 and X2
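
The last highlight can be illustrated with a short sketch. It assumes that each genotype is coded as 0, 1 or 2 and that every SNP is one-hot encoded before taking the dot product; the encoding and the names `one_hot_genotypes` and `linear_kernel` are hypothetical choices made here, not taken from the paper.

```python
import numpy as np


def one_hot_genotypes(profile):
    """Encode genotypes coded 0, 1, 2 as three indicator values per SNP."""
    profile = np.asarray(profile, dtype=int)
    # Row g of the 3x3 identity matrix is the indicator vector of genotype g.
    return np.eye(3, dtype=int)[profile].ravel()


def linear_kernel(x1, x2):
    """Dot product of the one-hot encodings of two genetic profiles.

    Under this encoding each SNP contributes 1 when both profiles carry
    the same genotype and 0 otherwise, so the result is a match count.
    """
    return int(np.dot(one_hot_genotypes(x1), one_hot_genotypes(x2)))


# Two hypothetical profiles genotyped at five SNPs.
x1 = [0, 1, 2, 1, 0]
x2 = [0, 2, 2, 1, 1]
matches = linear_kernel(x1, x2)  # SNPs 1, 3 and 4 agree
```

Every matching SNP counts equally here, regardless of how common or rare the shared genotype is in the population.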

Summary

Introduction

Recent technological advances in DNA sequencing and genotyping have led to the accumulation of a remarkable quantity of data on genetic polymorphisms. The availability of ultra-high-volume genotyping platforms at a manageable cost has permitted genome-wide association studies, in which genetic profiles observed in groups of affected subjects (cases) are compared to those of groups of healthy subjects (controls) in order to identify multiple low-penetrance variants involved in complex phenotypes [1,2,3,4,5]. The development of new statistical and computer-based tools for the effective processing of the large amount of data arising from these studies has not evolved as fast (for a review see [12]).

Similar Papers

Coagulase gene polymorphism of Staphylococcus aureus isolates from dairy cattle in different geographical areas.
C Su ... O Skardova
Epidemiology and infection | VOL. 122
01 Apr 1999

Fast, Accurate and Robust Recognition Based On Local Normalized Linear Summation Kernel
Kazuhiro Hotta
01 Dec 2007

Local normalized linear summation kernel for fast and robust recognition
Kazuhiro Hotta
Pattern Recognition | VOL. 43
12 Sep 2009

Methylenetetrahydrofolate reductase (MTHFR) C677T polymorphism is associated with osteoporotic vertebral fractures, but is a weak predictor of BMD.
Morten M Villadsen ... Liselotte Stenkjær
Osteoporosis International | VOL. 16
06 Aug 2004
