Abstract

Semiparametric generalized varying coefficient partially linear models with longitudinal data arise in contemporary biology, medicine, and life science. In this paper, we consider a variable selection procedure that combines basis function approximations and quadratic inference functions with the SCAD penalty. The proposed procedure simultaneously selects significant variables in the parametric and nonparametric components. With appropriate selection of the tuning parameters, we establish the consistency, sparsity, and asymptotic normality of the resulting estimators. The finite sample performance of the proposed method is evaluated through extensive simulation studies and a real data analysis.

Highlights

  • Identifying the significant variables is a central task in any regression analysis

  • Many shrinkage methods have been developed to make variable selection computationally efficient, e.g., the nonnegative garrote [1], the LASSO [2], bridge regression [3], the smoothly clipped absolute deviation (SCAD) [4], and the one-step sparse estimator [5]

  • We extend the quadratic inference function (QIF)-based group SCAD variable selection procedure to the generalized partially linear varying coefficient model (GPLVCM) with longitudinal data, and B-spline methods are adopted to approximate the nonparametric components of the model


Summary

Introduction

Identifying the significant variables is a central task in any regression analysis. Wang et al. [13] proposed a group SCAD procedure for variable selection in varying coefficient models (VCM) with longitudinal data. Tian et al. [15] proposed a quadratic inference function (QIF)-based SCAD penalty for variable selection in varying coefficient partially linear models (VCPLM) with longitudinal data. We extend the QIF-based group SCAD variable selection procedure to the generalized partially linear varying coefficient model (GPLVCM) with longitudinal data, and B-spline methods are adopted to approximate the nonparametric components of the model. With suitably chosen tuning parameters, the proposed variable selection procedure is consistent, and the estimators of the regression coefficients have the oracle property, i.e., the estimators of the nonparametric components achieve the optimal convergence rate, and the estimators of the parametric components have the same asymptotic distribution as those based on the correct submodel.
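To make the penalty concrete, the following is a minimal sketch of the standard SCAD penalty of [4] and of a group version applied to a block of spline coefficients. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the constant a = 3.7 is the conventional default rather than a value taken from this paper, and the group penalty uses the plain Euclidean norm of the coefficient block, whereas the paper may use a different (weighted) norm.

```python
import math


def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty evaluated at |theta|.

    lam is the tuning parameter; a = 3.7 is the conventional default
    (an assumption here, not a value taken from the paper).
    """
    if lam < 0 or a <= 2:
        raise ValueError("require lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Linear, LASSO-like region near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients receive no extra shrinkage.
    return (a + 1) * lam * lam / 2


def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta|."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


def group_scad_penalty(coef_block, lam, a=3.7):
    """SCAD applied to the norm of a whole coefficient block.

    Assumption: the plain Euclidean norm is used, so the block (e.g. the
    B-spline coefficients of one varying coefficient function) is
    penalized, and can be dropped, as a single unit.
    """
    norm = math.sqrt(sum(c * c for c in coef_block))
    return scad_penalty(norm, lam, a)
```

Because the penalty is flat beyond a·λ, large coefficients are left essentially unshrunk while small ones are pushed to zero, which is the behavior behind the sparsity and oracle properties described above.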

Methodology
Asymptotic Properties
Simulation Studies
Application to Infectious Disease Data
Conclusion and Discussion
Findings
