Abstract

Background: Many QTL studies have two common features: (1) marker information is often missing, and (2) among the many markers involved in the biological process, only a few are causal. In statistics, the second issue falls under the headings “sparsity” and “causal inference”. The goal of this work is to develop a two-step statistical methodology for QTL mapping for markers with binary genotypes. The first step introduces a novel imputation method for missing genotypes. The outcomes of the proposed imputation method are probabilities, which serve as weights in the second step, a weighted lasso. This sparse phenotype inference is employed to select a set of predictive markers for the trait of interest.

Results: Simulation studies validate the proposed methodology under a wide range of realistic settings. Furthermore, the methodology outperforms alternative imputation and variable selection methods in such studies. The methodology was applied to an Arabidopsis experiment containing 69 markers for 165 recombinant inbred lines of an F8 generation. The results confirm previously identified regions; however, several new markers are also found. On the basis of the inferred ROC behavior, these markers show good potential for being real, especially for the germination trait Gmax.

Conclusions: Our imputation method shows higher accuracy in terms of sensitivity and specificity than an alternative imputation method. The proposed weighted lasso also outperforms the commonly practiced multiple regression, as well as the traditional lasso and the adaptive lasso with three weighting schemes. This means that, under realistic missing data settings, this methodology can be used for QTL identification.

Highlights

  • Many quantitative trait loci (QTL) studies have two common features: (1) often there is missing marker information, (2) among many markers involved in the biological process only a few are causal

  • The missing genotypes are replaced with predicted values that are based on the observed genotypes at neighboring markers, as in the multiple QTL mapping (MQM) algorithm [8,9]

  • The third study focuses on comparison of sparse variable selection techniques, namely our weighted lasso, the traditional lasso [10] and adaptive lasso [18]

Summary

Introduction

Many QTL studies have two common features: (1) marker information is often missing, and (2) among the many markers involved in the biological process, only a few are causal. Most methods consider repeated single QTL models, but it is understood that modeling multiple QTLs simultaneously, as we consider in this paper, is superior to single QTL models [2]. Often both the phenotype and genotype data are incomplete. In the context of QTL mapping, existing genotype imputation methods use phenotype data and multiple generation information to obtain a conditional probability of a missing genotype [6]. These methods are design-specific and lack generalizability [6,7]. The missing genotypes are replaced with predicted values that are based on the observed genotypes at neighboring markers, as in the multiple QTL mapping (MQM) algorithm [8,9].
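
To make the idea of predicting a missing binary genotype from neighboring markers concrete, here is a minimal sketch of a generic flanking-marker calculation. It is not the imputation method proposed in the paper, nor the MQM implementation. The function names, the constant recombination fraction `r` between adjacent markers, the equal prior on the two genotypes, and the no-interference assumption are all hypothetical choices made for illustration.

```python
import numpy as np


def combined_r(r, d):
    """Recombination fraction across d adjacent intervals, each with
    fraction r, assuming no interference (illustrative assumption)."""
    return 0.5 * (1.0 - (1.0 - 2.0 * r) ** d)


def prob_one(row, j, r=0.1):
    """P(genotype at marker j equals 1), given the nearest observed
    marker on each side. `row` holds 0/1 genotypes with NaN for missing."""
    row = np.asarray(row, dtype=float)
    like = np.array([0.5, 0.5])  # equal prior for genotypes 0 and 1 (assumed)
    for step in (-1, 1):
        k = j + step
        # Walk outward until an observed marker or the end of the row.
        while 0 <= k < len(row) and np.isnan(row[k]):
            k += step
        if 0 <= k < len(row):
            rd = combined_r(r, abs(k - j))
            for g in (0, 1):
                # A matching neighbor means no recombination in between.
                like[g] *= (1.0 - rd) if row[k] == g else rd
    return like[1] / like.sum()


def impute(row, r=0.1):
    """Replace each missing entry with its probability of being 1.
    Only originally observed markers are used as neighbors."""
    row = np.asarray(row, dtype=float)
    out = row.copy()
    for j in np.flatnonzero(np.isnan(row)):
        out[j] = prob_one(row, j, r)
    return out
```

For example, `impute([1, np.nan, 1])` returns a value close to 1 for the middle marker, while `impute([1, np.nan, 0])` returns 0.5 because the two flanking markers disagree and are equally distant. Probabilities of this kind, rather than hard 0/1 calls, are what a downstream weighting scheme could consume.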
