Large, nationally representative health and behavioral surveys conducted under multistage stratified sampling schemes increasingly collect high-dimensional data whose correlation is structured along some domain (e.g., wearable sensor data measured continuously and correlated over time, or imaging data with spatiotemporal correlation), with the goal of associating these data with health outcomes. Such analyses require novel methodological work at the intersection of survey statistics and functional data analysis. Here, we address this gap in the literature by proposing an estimation and inferential framework for generalizable scalar-on-function regression models for data collected under a complex survey design. We propose to: (1) estimate functional regression coefficients using weighted score equations; and (2) perform inference using novel functional balanced repeated replication and survey-weighted bootstrap methods for multistage survey designs. This is the first frequentist study to discuss the estimation of scalar-on-function regression models in the context of complex survey studies and to assess, via a comprehensive simulation study, the validity of various inferential techniques based on resampling methods. We apply our methods to predict mortality from diurnal activity profiles measured by wearable accelerometers in the National Health and Nutrition Examination Survey 2003-2006 data. The proposed computationally efficient methods are implemented in the R software package surveySoFR.