Abstract

Generalized linear models (GLMs) are indispensable modeling tools in regression analysis. They provide a parsimonious and effective way to model the linear relationship between the mean response and a set of predictors on a link scale. Generalized single-index models (GSIMs) and generalized additive models (GAMs) are two popular extensions of GLMs that can account for potential nonlinear relationships. Specifically, GSIMs apply an unknown univariate function to the linear predictor, whereas GAMs replace the linear predictor with a sum of functional predictors. Both types of models remain parsimonious yet offer considerable flexibility for nonparametric regression modeling, leading to a wide range of applications in fields such as econometrics, survival analysis, epidemiology, and ecology. This thesis begins with a general introduction to GLMs, GSIMs, and GAMs, and highlights the close connections between these models. This is followed by a discussion of three separate topics on GSIMs and GAMs: the proposal of a profile likelihood ratio test (PLRT) statistic for parameter inference in standard GSIMs, and the establishment of unified doubly-nonparametric frameworks for extended GSIMs and for extended GAMs. These novel methodologies complement the standard GSIMs and GAMs and provide flexible modeling routines for analyzing real data examples.

Firstly, we focus on parameter inference for GSIMs, which has been less well investigated than model fitting. To this end, a PLRT approach is proposed that is simple to implement, invariant to parametrization, and less biased than the standard Wald tests in finite samples. The test statistic is constructed as twice the difference between the maximum profile log-likelihoods achieved under the alternative and null hypotheses, respectively.
This bypasses the explicit estimation of the covariance matrix of the index coefficients, which is usually biased because it relies on noisy plug-in estimators. Moreover, the PLRT statistic is shown to asymptotically follow a standard chi-squared distribution. This circumvents the bootstrapping procedure that is essential for obtaining the quantiles of the null distribution in the recently proposed generalized likelihood ratio test (GLRT). Additionally, the proposed PLRT is shown to be over two orders of magnitude faster to carry out than the GLRT approach in our numerical studies. Note that the GLRT approach is designed only for the special case of additive errors with constant variance, whereas the proposed PLRT is based on the GLM framework. Thus, the PLRT approach is appropriate for handling data with non-constant variance, as is typical of count, binomial, and time-to-event responses. These advantages of the proposed PLRT method are then demonstrated in various simulations and several real data examples.

Secondly, we establish doubly-nonparametric frameworks for both GSIMs and GAMs by allowing the response distribution to be fully unknown. By rewriting the error distribution in an exponential tilt form, the extended models assume that the data still originate from some exponential family, but without any need to specify that family a priori. In comparison to the commonly used quasi-likelihood-based methods, the proposed doubly-nonparametric models remain in a full probability setting that can provide further insight into the data-generating mechanism. Thus, the proposed doubly-nonparametric models are particularly useful for model selection and diagnosis, predictive inference, and nonparametric bootstrap resampling. By avoiding pre-specification of the error distribution or of the first two or higher moments of the data, the doubly-nonparametric methods reduce the potential bias and inaccurate inference induced by model misspecification, and offer considerable flexibility and robustness for modeling.
In addition, the seemingly impossible task of estimating the mean function and response distribution over their respective infinite-dimensional spaces becomes feasible by employing an empirical likelihood approach coupled with penalized regression splines. The resulting estimators of the mean model and error distribution are then shown to be root-n consistent and jointly asymptotically normal under some regularity conditions. Beyond extending the standard GSIMs and GAMs in a doubly-nonparametric manner, we develop a profile empirical likelihood ratio test for parameter inference in the doubly-nonparametric GSIMs, and construct pointwise confidence bands for each smooth function and the overall mean curve in the doubly-nonparametric GAMs from a frequentist perspective. In each new framework, extensive simulations and analyses of several real data examples demonstrate the satisfactory performance of the proposed doubly-nonparametric method under both correctly specified and misspecified model settings.
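
As a rough illustration of the two constructions summarized in this abstract, the PLRT statistic and the exponential tilt representation might be written as follows. All notation here (the index coefficients \beta, the unknown function f, the profile log-likelihood \ell_p, the parameter spaces \Theta and \Theta_0, the reference distribution F, and the tilts \theta_i with normalizers b_i) is assumed for this sketch and is not taken from the thesis itself. In particular, the degrees of freedom q are assumed to equal the number of constraints imposed by the null hypothesis.

```latex
% Sketch only: all symbols are assumed notation, not the thesis's own.
% Profile log-likelihood: for each fixed beta, maximize over the unknown f.
\[
  \ell_p(\beta) = \sup_{f} \, \ell(\beta, f)
\]
% PLRT: twice the gap between the unrestricted and the null-restricted maxima.
% Theta is the full parameter space; Theta_0 is the subset satisfying H_0.
\[
  \mathrm{PLRT}
  = 2 \Bigl\{ \sup_{\beta \in \Theta} \ell_p(\beta)
            - \sup_{\beta \in \Theta_0} \ell_p(\beta) \Bigr\}
  \stackrel{d}{\longrightarrow} \chi^2_q
  \quad \mbox{under } H_0 .
\]
% Exponential tilt: each response distribution F_i is a tilt of one
% unspecified reference distribution F; b_i normalizes F_i to integrate to 1.
\[
  dF_i(y) = \exp\{ b_i + \theta_i y \} \, dF(y),
  \qquad
  b_i = -\log \int \exp(\theta_i y) \, dF(y),
\]
% The tilt theta_i is tied to the mean mu_i through the mean constraint.
\[
  \mu_i = \int y \, \exp\{ b_i + \theta_i y \} \, dF(y).
\]
```

Under this representation, fixing F to a known distribution recovers a classical exponential family, whereas leaving F unspecified corresponds to the doubly-nonparametric setting described above.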

