Abstract

The coefficient of determination (CoD) has significant applications in genomics, for example, in the inference of gene regulatory networks. We study several CoD estimators, based upon the resubstitution, leave-one-out, cross-validation, and bootstrap error estimators. We present an exact formulation of performance metrics for the resubstitution and leave-one-out CoD estimators, assuming the discrete histogram rule. Numerical experiments are carried out using a parametric Zipf model, where we compute exact performance metrics of resubstitution and leave-one-out CoD estimators using the previously derived equations, for varying actual CoD, sample size, and bin size. These results are compared to approximate performance metrics of 10-repeated 2-fold cross-validation and 0.632 bootstrap CoD estimators, computed via Monte Carlo sampling. The numerical results lead to a perhaps surprising conclusion: for the Zipf model considered here, and for moderate and large values of the actual CoD, the resubstitution CoD estimator is the least biased and least variable among all CoD estimators, especially when the number of predictors is small. We also observed that the leave-one-out and cross-validation CoD estimators tend to perform the worst, whereas the performance of the bootstrap CoD estimator is intermediate, despite its high computational complexity.

Highlights

  • The coefficient of determination (CoD) has significant applications in genomics, for example, in the inference of gene regulatory networks

  • Numerical experiments are carried out using a parametric Zipf model, where we compute the exact performance of resubstitution and leave-one-out CoD estimators using the previously derived equations, for varying actual CoD, sample size, and bin size

  • The leave-one-out and cross-validation CoD estimators tend to perform the worst, whereas the performance of the bootstrap CoD estimator is intermediate, despite its high computational complexity. This indicates that, provided one has evidence of moderate to tight regulation between the genes and the number of predictors is not too large, one should use the CoD estimator based on resubstitution


Summary

Introduction

The coefficient of determination (CoD) has significant applications in genomics, for example, in the inference of gene regulatory networks. Numerical experiments are carried out using a parametric Zipf model, where we compute exact performance metrics of resubstitution and leave-one-out CoD estimators using the previously derived equations, for varying actual CoD, sample size, and bin size. These results are compared to approximate performance metrics of randomized CoD estimators (10-repeated 2-fold cross-validation and 0.632 bootstrap), computed via Monte Carlo sampling.
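To make the two nonrandomized estimators concrete, here is a minimal Python sketch. It assumes the usual definition CoD = (ε0 − ε)/ε0, where ε0 is the error of the best predictor that uses no features and ε is the error of the predictor that uses them, and it plugs the same error estimator into both terms. The function names, the tie-breaking toward label 0, and the convention of returning 0 when the estimated ε0 is 0 are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def bin_counts(xs, ys):
    """Return {bin: [count of label 0, count of label 1]} for the sample."""
    counts = defaultdict(lambda: [0, 0])
    for x, y in zip(xs, ys):
        counts[x][y] += 1
    return counts


def resub_error(xs, ys):
    """Resubstitution error of the discrete histogram rule.

    The rule predicts the majority label in each bin; testing on the
    training sample itself, each bin contributes its minority count.
    """
    counts = bin_counts(xs, ys)
    return sum(min(u, v) for u, v in counts.values()) / len(ys)


def loo_error(xs, ys):
    """Leave-one-out error of the discrete histogram rule."""
    counts = bin_counts(xs, ys)
    errors = 0
    for x, y in zip(xs, ys):
        u, v = counts[x]          # local copies; counts is not modified
        if y == 0:                # remove the held-out point from its bin
            u -= 1
        else:
            v -= 1
        # Assumption: ties (including emptied bins) are broken toward label 0.
        predicted = 1 if v > u else 0
        errors += int(predicted != y)
    return errors / len(ys)


def cod_estimate(error_estimator, xs, ys):
    """CoD estimate (eps0_hat - eps_hat) / eps0_hat.

    Assumption: the no-predictor error eps0 is estimated with the same
    estimator by treating the whole sample as a single bin.
    """
    eps0_hat = error_estimator([0] * len(ys), ys)
    eps_hat = error_estimator(xs, ys)
    if eps0_hat == 0:
        return 0.0                # assumed convention for the degenerate case
    return (eps0_hat - eps_hat) / eps0_hat


# Toy sample: xs are bin indices (discretized predictors), ys are binary targets.
xs = [0, 0, 0, 1, 1, 1, 1, 2]
ys = [0, 0, 1, 1, 1, 1, 1, 0]
print("resubstitution CoD:", cod_estimate(resub_error, xs, ys))
print("leave-one-out CoD:", cod_estimate(loo_error, xs, ys))
```

Passing the error estimator as an argument keeps the CoD formula in one place, so a cross-validation or bootstrap error estimator with the same signature could be dropped in for comparison.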

Discrete Prediction
CoD Estimation
Performance Metrics of CoD Estimators
Exact Moments of Nonrandomized CoD Estimators
Numerical Experiments
Conclusion