Abstract

As a result of novel data collection technologies, it is now common to encounter data in which the number of explanatory variables collected is large, while the number of variables that actually contribute to the model remains small. Thus, a method that can identify those variables with impact on the model without inferring other noneffective ones will make analysis much more efficient. Many methods are proposed to resolve the model selection problems under such circumstances, however, it is still unknown how large a sample size is sufficient to identify those “effective” variables. In this paper, we apply sequential sampling method so that the effective variables can be identified efficiently, and the sampling is stopped as soon as the “effective” variables are identified and their corresponding regression coefficients are estimated with satisfactory accuracy, which is new to sequential estimation. Both fixed and adaptive designs are considered. The asymptotic properties of estimates of the number of effective variables and their coefficients are established, and the proposed sequential estimation procedure is shown to be asymptotically optimal. Simulation studies are conducted to illustrate the performance of the proposed estimation method, and a diabetes data set is used as an example.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call