Abstract
Given the genotype values of a set of biallelic molecular markers, such as Single Nucleotide Polymorphisms (SNPs), on a collection of plant, animal, or human samples, quantitative genetic traits of these samples, such as weight, height, or fruit size, can be predicted effectively. Quantitative genetic trait prediction has received great attention because it helps breeding companies develop more effective breeding strategies. Although many methods have been proposed for the prediction task, relatively little attention has been paid to the effect of how genotype values are encoded. Quantitative genetic trait prediction is usually formulated as a linear regression model. In the regression model, genotypes need to be encoded numerically according to their types: one heterozygous type and two homozygous types. A traditional encoding encodes the two homozygous types as 0 and 2, respectively, and the heterozygous type as 1. In this work, we evaluated five existing genetic encoding models as well as two recently proposed encoding methods that treat genetic trait prediction as a multiple regression problem on categorical data. We also discussed the scenario of epistasis, where multiple markers can interact with each other. We evaluated the performance of five statistically-intuitive encoding strategies and eight biologically-oriented encoding strategies, as well as extensions of the two recently proposed encoding methods. We showed that, overall, the two recent encoding methods achieve better prediction accuracy in both the single-marker scenario and the epistasis scenario.
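To make the contrast between encodings concrete, the following is a minimal sketch, not the implementation evaluated in this work. It encodes the same genotype calls in two ways: the traditional 0/1/2 encoding, and a simple categorical encoding with one indicator column per genotype class, which is only one plausible reading of "regression on categorical data". The genotype labels (`AA`, `AB`, `BB`), the trait values, the function names, and the ridge penalty are all assumptions made for illustration. The ridge penalty is used only so that the fit is well defined when there are more columns than samples.

```python
import numpy as np

# Hypothetical genotype calls for 4 samples x 3 biallelic markers.
# "AA" and "BB" stand for the two homozygous types, "AB" for the heterozygous type.
genotypes = [
    ["AA", "AB", "BB"],
    ["AB", "AB", "AA"],
    ["BB", "AA", "AB"],
    ["AA", "BB", "BB"],
]

ADDITIVE = {"AA": 0, "AB": 1, "BB": 2}
CATEGORIES = ["AA", "AB", "BB"]


def encode_additive(calls):
    """Traditional encoding: homozygous types -> 0 and 2, heterozygous type -> 1."""
    return np.array([[ADDITIVE[g] for g in row] for row in calls], dtype=float)


def encode_categorical(calls):
    """Illustrative categorical encoding: one indicator column per genotype class per marker."""
    rows = []
    for row in calls:
        encoded = []
        for g in row:
            encoded.extend(1.0 if g == c else 0.0 for c in CATEGORIES)
        rows.append(encoded)
    return np.array(rows)


def fit_ridge(X, y, lam=1.0):
    """Ridge-penalized least squares (an assumption; the estimator is not specified above)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # centering handles the intercept
    yc = y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return beta, intercept


# Made-up trait values, for illustration only.
trait = np.array([1.2, 2.0, 2.9, 1.1])

X_add = encode_additive(genotypes)     # shape (4, 3): one column per marker
X_cat = encode_categorical(genotypes)  # shape (4, 9): three columns per marker

beta_add, b0_add = fit_ridge(X_add, trait)
beta_cat, b0_cat = fit_ridge(X_cat, trait)
```

Under the 0/1/2 encoding, each marker contributes a single coefficient, so the heterozygous effect is forced to lie exactly halfway between the two homozygous effects. The indicator encoding gives each genotype class its own coefficient, which removes that constraint at the cost of more parameters.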