Abstract

Building prediction models is one of the most important tasks in the analysis of high-dimensional data. A fitted prediction model should be validated before it is used in the future, so such an analysis has to use the whole data set for both training and validation. With a hold-out method, the fitted prediction model is more efficient when the training set is larger, but the validation power is lower because the validation set is smaller. To balance the efficiency of the fitted model against the power of its validation, a 50-50 split of the whole data set is widely used as a hold-out method. In prediction and validation, the information in the whole data set should be used as efficiently as possible. As one such effort, cross-validation (CV) methods have become very popular. In a CV method, a large portion of the data set is used to train the model and the remaining small portion is used for validation, and this procedure is repeated until every part of the data set has been used for validation. Because each data point is used for both training and validation, increasing the training portion raises training efficiency but lowers validation power through increased over-fitting, i.e., more frequent use of each data point for training. As another effort toward efficient use of the whole data, we propose using the whole data set for both training and validation, which we call the 1-fold CV method. Fitting the prediction model to the whole data set maximizes training efficiency, but reusing the whole data set for validation is expected to make its validation power very low. The validation power of the CV methods is estimated by permutation methods. Through extensive simulation and real data studies, we conclude that the newly proposed 1-fold CV method uses the available data set very efficiently.
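
The abstract does not specify the prediction model, the validation statistic, or the permutation scheme, so the Python sketch below shows only one possible reading of the three schemes it compares (50-50 hold-out, k-fold CV, and 1-fold CV). It assumes a one-predictor least-squares model as a stand-in for a high-dimensional model, uses the correlation between predicted and observed responses as the validation statistic, and estimates validation power as the rejection rate of a permutation test that shuffles the response and reruns the entire train/validate scheme on simulated data. All function names (`fit_line`, `cv_statistic`, `permutation_pvalue`, `estimate_power`) and the simulation settings are hypothetical.

```python
# Hypothetical sketch: compare validation power of hold-out, k-fold CV and
# "1-fold CV" with a permutation test. Model, statistic and settings are
# illustrative assumptions, not the paper's method.
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in for the prediction model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def corr(us, vs):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mu, mv = statistics.fmean(us), statistics.fmean(vs)
    suu = sum((u - mu) ** 2 for u in us)
    svv = sum((v - mv) ** 2 for v in vs)
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    return suv / (suu * svv) ** 0.5 if suu > 0 and svv > 0 else 0.0


def holdout_statistic(xs, ys, rng):
    """50-50 hold-out: train on one random half, validate on the other half."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    train, test = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return corr([a + b * xs[i] for i in test], [ys[i] for i in test])


def cv_statistic(xs, ys, k, rng):
    """Correlation between CV predictions and observed responses.

    k >= 2: k-fold CV, each point is predicted by a model not trained on it.
    k == 1: one reading of "1-fold CV", train on all data, validate on all data.
    """
    n = len(xs)
    idx = list(range(n))
    rng.shuffle(idx)
    preds = [0.0] * n
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold) if k > 1 else set()  # k == 1: nothing is held out of training
        train = [i for i in range(n) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a + b * xs[i]
    return corr(preds, ys)


def permutation_pvalue(xs, ys, stat, n_perm, rng):
    """Permute y, rerun the same train/validate scheme, compare to the observed value."""
    observed = stat(xs, ys, rng)
    hits = 0
    for _ in range(n_perm):
        perm = ys[:]
        rng.shuffle(perm)
        if stat(xs, perm, rng) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def estimate_power(stat, n=40, slope=0.5, n_sim=50, n_perm=99, alpha=0.05, seed=1):
    """Fraction of simulated data sets on which the permutation test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [slope * x + rng.gauss(0, 1) for x in xs]
        if permutation_pvalue(xs, ys, stat, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim


SCHEMES = {
    "hold-out 50-50": holdout_statistic,
    "5-fold CV": lambda xs, ys, rng: cv_statistic(xs, ys, 5, rng),
    "1-fold CV": lambda xs, ys, rng: cv_statistic(xs, ys, 1, rng),
}

if __name__ == "__main__":
    for name, stat in SCHEMES.items():
        print(f"{name}: estimated power {estimate_power(stat):.2f}")
```

In this sketch, the permuted data sets pass through exactly the same train/validate procedure as the observed data. As a result, the optimism that 1-fold CV gains from validating on its own training data also appears in the null distribution, which is what allows a permutation test to calibrate it. In this one-predictor toy setting, the 1-fold CV statistic reduces to |corr(x, y)|. With many predictors and a flexible model, it would behave differently.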

