Data-driven encoding for quantitative genetic trait prediction.

Dan He, Zhanyong Wang, Laxmi Parida

doi:10.1186/1471-2105-16-s1-s10

Abstract

Motivation: Given a set of biallelic molecular markers, such as SNPs, with genotype values on a collection of plant, animal or human samples, the goal of quantitative genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Quantitative genetic trait prediction is usually represented as a linear regression model, which requires quantitative encodings for the genotypes: the three distinct genotype values, corresponding to one heterozygous and two homozygous genotypes, are usually coded as integers and manipulated algebraically in the model. Further, epistasis between multiple markers is modeled as multiplication between the markers; it is unclear whether the regression model remains effective under this treatment. In this work we investigate the effects of encodings on the quantitative genetic trait prediction problem.

Results: We first show that different encodings lead to different prediction accuracies in many test cases. We then propose a data-driven encoding strategy, where we encode the genotypes according to their distribution in the phenotypes and allow each marker to have a different encoding. Our experiments show that this encoding strategy improves the performance of the genetic trait prediction method, and that it is more helpful for oligogenic traits, whose values rely on a relatively small set of markers. To the best of our knowledge, this is the first paper that discusses the effects of encodings on the genetic trait prediction problem.
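One way to read the data-driven strategy described in the abstract is as a per-marker lookup table built from the training phenotypes. The sketch below is a minimal illustration under a loud assumption: each genotype class at each marker is encoded as the mean trait value of the training samples carrying that class. The abstract does not state the exact statistic the authors use, and the names `fit_data_driven_encoding` and `apply_encoding` are hypothetical.

```python
import numpy as np

def fit_data_driven_encoding(G, y):
    """Build a per-marker table: genotype class -> mean training phenotype.

    ASSUMPTION: the "distribution in the phenotypes" is summarized by the
    class mean; the paper may use a different statistic.

    G: (n_samples, n_markers) integer array with genotype classes 0, 1, 2.
    y: (n_samples,) float array of trait values.
    Returns an (n_markers, 3) float table. Classes never observed at a
    marker fall back to the overall phenotype mean.
    """
    n_markers = G.shape[1]
    table = np.full((n_markers, 3), y.mean(), dtype=float)
    for g in range(3):
        mask = (G == g)                          # samples carrying class g
        counts = mask.sum(axis=0)                # per-marker class counts
        sums = (mask * y[:, None]).sum(axis=0)   # per-marker phenotype sums
        seen = counts > 0
        table[seen, g] = sums[seen] / counts[seen]
    return table

def apply_encoding(G, table):
    """Replace every genotype class by its marker-specific code."""
    marker_idx = np.arange(G.shape[1])
    return table[marker_idx, G]                  # shape (n_samples, n_markers)

# Toy data: 4 samples, 2 markers (values are illustrative only).
G_train = np.array([[0, 2], [1, 2], [2, 0], [0, 1]])
y_train = np.array([1.0, 2.0, 3.0, 1.0])

table = fit_data_driven_encoding(G_train, y_train)
X_train = apply_encoding(G_train, table)
print(table)
print(X_train)
```

Because the table is derived from trait values, it should be fit on training samples only and then applied unchanged to test genotypes; each marker gets its own row, which matches the statement that markers may have different encodings.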

Highlights

Whole genome prediction of complex phenotypic traits using high-density genotyping arrays has attracted a lot of attention, as it is relevant to the fields of plant and animal breeding and genetic epidemiology [1,2,3,4,5,6,7,8].
Given a set of n training samples, each with m ≫ n genotype values and a trait value, and a set of n′ test samples, each with the same set of genotype values but without a trait value, the task is to train a predictive model from the training samples to predict the trait value, or phenotype, of each test sample based on its genotype values.
Effects of different encodings: we first illustrate that different encodings lead to different prediction performance for the epistasis model defined in Formula 2. A data-driven encoding has the benefit that we do not need to worry about the selection of the encodings.
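To make the encoding dependence of the epistasis term concrete, the following sketch tabulates the product of two marker codes for all nine genotype pairs under two integer encodings. Formula 2 is not reproduced in this excerpt, so the sketch assumes only what the abstract states, namely that epistasis is modeled as multiplication between markers. The two encodings {0, 1, 2} and {-1, 0, 1}, the convention that class 1 is the heterozygote, and the helper name `interaction_table` are illustrative assumptions.

```python
import numpy as np

def interaction_table(codes):
    """Product term for every pair of genotype classes under one encoding.

    codes: length-3 sequence mapping genotype classes 0, 1, 2 to numbers.
    Entry [a, b] is the interaction feature when marker i has class a
    and marker j has class b.
    """
    codes = np.asarray(codes, dtype=float)
    return np.outer(codes, codes)

# Two hypothetical integer encodings of the same three genotype classes.
t_012 = interaction_table([0, 1, 2])
t_m101 = interaction_table([-1, 0, 1])

print(t_012)
print(t_m101)
```

Under {0, 1, 2} every pair involving class 0 has a zero interaction feature, whereas under {-1, 0, 1} every pair involving class 1 does, and the two matching homozygous pairs share the same value. The product feature therefore groups genotype pairs differently depending on the encoding, which is one reason the choice can matter once interactions enter the model.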

Summary

Introduction

Whole genome prediction of complex phenotypic traits using high-density genotyping arrays has attracted a lot of attention, as it is relevant to the fields of plant and animal breeding and genetic epidemiology [1,2,3,4,5,6,7,8]. The authors present four approaches: least-squares estimation, BayesA, BayesB, and rrBLUP (Ridge-Regression BLUP), based on a linear model of the marker effects on the trait being studied. The latter three methods are still competitive with the state-of-the-art techniques, and have . The problem is usually represented as the following linear regression model: m
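The regression formula itself is cut off in this excerpt, so the sketch below does not reproduce it. It shows a generic ridge regression on encoded genotypes as a rough stand-in for the ridge-regression family that rrBLUP belongs to. Assumptions: the penalty `lam` is a fixed, hypothetical hyperparameter (rrBLUP estimates the shrinkage from variance components instead), the intercept is left unpenalized by centering, and the function names are illustrative. Since m ≫ n, the dual form is used so that only an n × n system is solved.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Ridge regression with an unpenalized intercept, solved in dual form.

    X: (n_samples, n_markers) encoded genotype matrix.
    y: (n_samples,) trait values.
    lam: fixed penalty (ASSUMPTION: not estimated as in rrBLUP).
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    n = Xc.shape[0]
    # Dual form: beta = Xc^T (Xc Xc^T + lam I)^(-1) yc, an n x n solve.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), yc)
    beta = Xc.T @ alpha
    intercept = y_mean - x_mean @ beta
    return intercept, beta

def ridge_predict(X, intercept, beta):
    """Predicted trait values for encoded genotypes X."""
    return intercept + X @ beta

# Synthetic toy data with more markers than samples (illustrative only).
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 3, size=(5, 20)).astype(float)
y_toy = rng.normal(size=5)

b0, beta_hat = ridge_fit(X_toy, y_toy, lam=2.0)
y_hat = ridge_predict(X_toy, b0, beta_hat)
print(beta_hat.shape, y_hat.shape)
```

The dual solve gives the same coefficients as the usual m × m normal-equations form, which makes it the cheaper choice when markers greatly outnumber samples.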



Journal: BMC Bioinformatics	Publication Date: Feb 18, 2015
Citations: 25	License type: cc-by


