Abstract

In regression analysis, it is often of interest to obtain prediction intervals for the response variable. However, constructing such intervals is not straightforward when the number of explanatory variables exceeds the number of observations, since the least squares method cannot be used in that case. This paper discusses the problem of constructing prediction intervals in high-dimensional regression models, in which the number of explanatory variables is greater than the number of observations. A quantile approach is proposed for constructing such intervals and is evaluated by means of simulation. In this approach, pairs of quantiles corresponding to a given probability are specified and then evaluated to find the shortest interval. Because the number of explanatory variables was large, several variable selection techniques were employed: best subset regression, LASSO (Least Absolute Shrinkage and Selection Operator) regression, and model averaging. The simulation data were generated under two scenarios. The first was designed for models with symmetric error distributions, and the second for models with non-symmetric error distributions. The simulation results showed that, with symmetric error distributions, all of the regression methods above produced similar prediction intervals, except LASSO regression. With non-symmetric error distributions, however, there was evidence that model averaging provided the best prediction intervals compared with best subset and LASSO regression, although its intervals were wide. This indicates that model averaging can be used to predict the response variable in high-dimensional regression analyses even when the data are non-symmetrically distributed.
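The abstract does not spell out how the quantile pairs are searched, so the following is only a minimal sketch of one plausible reading. It assumes a sample of prediction errors (for example, residuals from an already fitted model) is available, scans lower-tail probabilities p on a grid, pairs each with p plus the target coverage, and keeps the shortest resulting interval. The function names, the grid search, and the simulated errors are all hypothetical illustrations, not the authors' procedure.

```python
import random


def empirical_quantile(sorted_values, p):
    """Linearly interpolated empirical quantile of pre-sorted data, 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def shortest_quantile_interval(errors, coverage=0.95, grid_size=101):
    """Scan quantile pairs (p, p + coverage) and return the shortest interval.

    Returns (low, high, p_low), where p_low is the lower-tail probability
    of the selected pair. The grid search is an assumption of this sketch.
    """
    if not 0 < coverage < 1:
        raise ValueError("coverage must lie strictly between 0 and 1")
    sorted_errors = sorted(errors)
    alpha = 1.0 - coverage
    best = None
    for step in range(grid_size):
        p_low = alpha * step / (grid_size - 1)
        p_high = min(p_low + coverage, 1.0)  # guard against rounding above 1
        low = empirical_quantile(sorted_errors, p_low)
        high = empirical_quantile(sorted_errors, p_high)
        if best is None or high - low < best[1] - best[0]:
            best = (low, high, p_low)
    return best


# Hypothetical right-skewed errors standing in for a non-symmetric scenario.
random.seed(0)
skewed_errors = [random.expovariate(1.0) - 1.0 for _ in range(2000)]
low, high, p_low = shortest_quantile_interval(skewed_errors, coverage=0.90)

point_prediction = 10.0  # hypothetical fitted value for a new observation
print("interval:", point_prediction + low, point_prediction + high)
print("lower-tail probability used:", p_low)
```

For symmetric errors the selected pair tends to sit near the equal-tailed choice, whereas for skewed errors the search shifts toward the denser tail, which is why the shortest interval can be narrower than the equal-tailed one at the same nominal coverage.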
