Abstract
A cross-validation method based on m replications of two-fold cross validation is called m × 2 cross validation. It is used to estimate the generalization error and to compare the performance of algorithms in machine learning. However, the variance of the estimator of the generalization error in m × 2 cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in m × 2 cross validation. This fluctuation results in a large variance in the m × 2 cross-validated estimator. The influence of the random partitions on the variance becomes more serious as m increases. Thus, in this study, a set of partitions with a restricted number of overlapping samples between any two training (test) sets is defined as a block-regularized partition set. The corresponding cross validation is called block-regularized m × 2 cross validation (m × 2 BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the m × 2 BCV estimator of the generalization error is smaller than the variance of the m × 2 cross-validated estimator and reaches its minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical method for constructing m × 2 BCV from a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of the estimator of the generalization error.
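The role of the overlap counts can be illustrated with a minimal Python sketch. It is not the authors' construction: the function names are hypothetical, and the Sylvester Hadamard matrix is assumed here as one standard source of a two-level orthogonal array. The sketch builds m half/half partitions of n samples, either at random or from the array, and counts the samples shared by the first-fold training sets of every pair of partitions.

```python
import numpy as np


def sylvester_hadamard(order):
    """Return a +/-1 Hadamard matrix whose size is a power of two >= order."""
    h = np.array([[1]])
    while h.shape[0] < order:
        h = np.kron(h, np.array([[1, 1], [1, -1]]))
    return h


def random_partitions(n, m, rng):
    """m independent random half/half splits as boolean masks of shape (m, n)."""
    masks = np.zeros((m, n), dtype=bool)
    for i in range(m):
        masks[i, rng.permutation(n)[: n // 2]] = True
    return masks


def block_regularized_partitions(n, m):
    """m half/half splits driven by the columns of a two-level orthogonal array.

    Assumption of this sketch: samples are cut into equal consecutive blocks,
    and each non-constant Hadamard column sends a block to fold 1 (+1) or
    fold 2 (-1).
    """
    blocks = 2
    while blocks < m + 1:
        blocks *= 2
    if n % blocks != 0:
        raise ValueError("n must be divisible by the number of blocks")
    h = sylvester_hadamard(blocks)
    block_of_sample = np.repeat(np.arange(blocks), n // blocks)
    # Column 0 is constant (all +1), so it cannot define a split; skip it.
    return h[block_of_sample, 1 : m + 1].T == 1


def pairwise_overlaps(masks):
    """Number of samples shared by the first folds of each pair of partitions."""
    as_int = masks.astype(int)
    counts = as_int @ as_int.T
    return counts[np.triu_indices(len(masks), k=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 64, 7
    print("random :", pairwise_overlaps(random_partitions(n, m, rng)))
    print("regular:", pairwise_overlaps(block_regularized_partitions(n, m)))
```

Under these assumptions, every pair of array-based partitions shares exactly n/4 samples, because any two distinct non-constant columns of the array are balanced and orthogonal. For independent random splits, n/4 is only the expected overlap, so individual pairs scatter around it.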