Abstract

Molecular marker and gene expression data were considered both alone and jointly as additive predictors of two pathogen-activity phenotypes in real recombinant inbred lines of soybean. To predict unobserved phenotypes, we used Bayesian hierarchical regression modeling, in which the number of candidate predictors in the model was controlled by the different selection strategies tested. Our initial findings were submitted to DREAM5 (the 5th Dialogue on Reverse Engineering Assessment and Methods challenge) and were judged the best in sub-challenge B3, in which both functional genomic and genetic data were used to predict the phenotypes. In this work we improve on that earlier submission by considering various predictor selection strategies and by using cross-validation to measure the accuracy of in-data and out-data predictions. The results from the various model choices indicate that, for these data, using both data types (functional genomic and genetic) simultaneously improves out-data prediction accuracy. Adequate goodness-of-fit can easily be achieved with more complex models for both phenotypes, since the number of potential predictors is large and the sample size is not. We also studied gene-set enrichment (for the continuous phenotype) in the biological process in question, as well as chromosomal enrichment of the gene set. The methodological contribution of this paper is the exploration of variable selection techniques to alleviate the problem of over-fitting. Different strategies based on the nature of the covariates were explored, and all methods were implemented under the Bayesian hierarchical modeling framework with indicator-based covariate selection. All models based on a careful variable selection procedure were found to produce significant results in a permutation test.
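The distinction between in-data and out-data accuracy, and the permutation test mentioned in the abstract, can be illustrated with a small sketch. It does not reproduce the authors' Bayesian hierarchical model: ridge regression is swapped in as a simple stand-in predictor, the data are synthetic, and every name (`ridge_fit`, `cv_correlations`, `permutation_p_value`, the penalty `lam`, the fold count) is a hypothetical choice made for illustration only.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Dual form: solves an n-by-n system, which is cheaper when predictors outnumber samples.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y)), yc)
    beta = Xc.T @ alpha
    return y_mean - x_mean @ beta, beta

def cv_correlations(X, y, lam=1.0, n_folds=5, seed=0):
    """Return (in-data, out-data) correlation between observed and predicted phenotype."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    out_pred = np.empty(len(y))
    in_corrs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        b0, beta = ridge_fit(X[train_idx], y[train_idx], lam)
        # In-data: predict the same lines the model was fitted on.
        in_pred = b0 + X[train_idx] @ beta
        in_corrs.append(np.corrcoef(y[train_idx], in_pred)[0, 1])
        # Out-data: predict the held-out lines only.
        out_pred[test_idx] = b0 + X[test_idx] @ beta
    return float(np.mean(in_corrs)), float(np.corrcoef(y, out_pred)[0, 1])

def permutation_p_value(X, y, n_perm=200, seed=1, **cv_kwargs):
    """Share of phenotype permutations whose out-data correlation reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = cv_correlations(X, y, **cv_kwargs)[1]
    hits = sum(
        cv_correlations(X, rng.permutation(y), **cv_kwargs)[1] >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

# Synthetic stand-in data (NOT the soybean data): few lines, many predictors.
rng = np.random.default_rng(42)
n, n_markers, n_genes = 60, 150, 150
markers = rng.integers(0, 2, size=(n, n_markers)).astype(float)  # binary marker genotypes
expression = rng.normal(size=(n, n_genes))                       # expression levels
y = (markers[:, :3].sum(axis=1) + expression[:, :3].sum(axis=1)
     + rng.normal(scale=0.5, size=n))

both = np.hstack([markers, expression])
results = {}
for name, X in [("markers", markers), ("expression", expression), ("both", both)]:
    results[name] = cv_correlations(X, y)
    print(f"{name:>10}: in-data r = {results[name][0]:.2f}, out-data r = {results[name][1]:.2f}")

p_value = permutation_p_value(both, y)
print(f"permutation p-value (both data types): {p_value:.3f}")
```

With far more predictors than lines, the in-data correlation is expected to sit close to 1 regardless of which predictors carry signal, which is why the out-data correlation and the permutation test are the more informative checks against over-fitting.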

Highlights

  • The development of efficient statistical methods that accurately predict unobserved phenotypes from an individual's genomic profile is a goal in many research fields, including human, animal and plant genetics [1,2,3].

  • Lee et al. (2008) [2] argued that methods for predicting unobserved phenotypes and methods for predicting genomic breeding values share the same goal and can be substituted for one another. Such prediction methods consider a single type of genomic data at a time, even though prediction accuracy may be improved by considering multiple data types simultaneously [1].

  • Prediction based on single feature polymorphism (SFP) data only: the correlations between observed and predicted phenotype values in Figure 4 indicate that in-data prediction for both phenotypes improves as the number of SFPs in the model increases.



Introduction

The development of efficient statistical methods that accurately predict unobserved phenotypes from an individual's genomic profile is a goal in many research fields, including human, animal and plant genetics [1,2,3]. There has been recent interest in applying Bayesian variable selection [5] and frequentist regularization methods [6] to perform parameter estimation and variable selection simultaneously in phenotype-genotype and phenotype-expression association analyses. These methods have performed well in selecting important subsets of trait-associated loci for estimating genomic breeding values in animals and plants [7,8,9,10]. Lee et al. (2008) [2] argued that methods for predicting unobserved phenotypes and methods for predicting genomic breeding values share the same goal and can be substituted for one another. Such prediction methods consider a single type of genomic data (molecular marker, gene expression or protein expression) at a time, even though prediction accuracy may be improved by considering multiple data types simultaneously [1].
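As a rough illustration of what simultaneous estimation and variable selection over two data types can look like, a generic indicator-based regression of the kind used in the Bayesian variable selection literature may be written as follows. This is a sketch under assumed notation, not necessarily the exact model of this paper: the symbols $x_{ij}$ (marker $j$ in line $i$), $z_{ik}$ (expression of gene $k$ in line $i$), the inclusion indicators $\gamma_j$ and $\delta_k$, and the prior inclusion probabilities $\pi_x$ and $\pi_z$ are all introduced here for illustration.

```latex
\begin{align*}
y_i &= \beta_0
      + \sum_{j=1}^{p} \gamma_j \, \beta_j \, x_{ij}
      + \sum_{k=1}^{q} \delta_k \, \alpha_k \, z_{ik}
      + e_i,
      \qquad e_i \sim N(0, \sigma^2), \\
\gamma_j &\sim \mathrm{Bernoulli}(\pi_x), \qquad
\delta_k \sim \mathrm{Bernoulli}(\pi_z).
\end{align*}
```

A predictor contributes to the phenotype $y_i$ only when its indicator equals 1, so the posterior distribution of the indicators carries the variable selection, while the posterior of the effects $\beta_j$ and $\alpha_k$ carries the parameter estimation. Setting all $\delta_k = 0$ or all $\gamma_j = 0$ recovers the single-data-type case.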
