Abstract

In multiple regression, when covariates are numerous, it is often reasonable to assume that only a small number of them carry predictive information. In some medical applications, for instance, it is believed that only a few genes out of thousands are responsible for cancer. In that case, the aim is not only to obtain a good fit, but also to select the relevant covariates (genes). We propose to perform model selection with additive models in high dimensions (in both sample size and number of covariates). Our approach is computationally efficient thanks to fast wavelet transforms, it does not rely on cross-validation, and it solves a convex optimization problem for a prescribed penalty parameter, called the quantile universal threshold. We also propose a second rule, based on Stein unbiased risk estimation, that is geared toward prediction. We use Monte Carlo simulations and real data to compare various methods in terms of false discovery rate (FDR), true positive rate (TPR) and mean squared error. Our approach is the only one among them to handle high dimensions, and it has a good FDR–TPR trade-off.
