We propose investigating optimal subsampling for functional regression with massive datasets based on the mode value, which is referred to as functional quasi-mode regression, to reduce data volume and alleviate computational burden. Using data-adaptive weights derived from regression residuals, the suggested regression offers enhanced robustness against nonnormal errors compared to traditional least squares or maximum likelihood estimation methods. To estimate the model, we employ B-spline basis functions to approximate the functional coefficient and include a penalty term in the objective function for enforcing smoothness in the resulting estimator. We adopt a computationally efficient mode-expectation-maximization algorithm, augmented by a Gaussian kernel, for numerical estimation. Under mild regularity conditions, we derive the asymptotic distributions of both full data and subsample quasi-mode estimators. The optimal subsampling probabilities by minimizing the asymptotic variance-covariance matrix under A- and L-optimality criteria are identified. These optimal probabilities rely on the full data estimate, prompting the development of a two-step algorithm to approximate the optimal subsampling procedure. The resultant algorithm is processing-efficient and can significantly reduce computational time compared to the full data approach. We also establish the asymptotic normality of the quasi-mode estimator obtained through this two-step algorithm. To assess finite sample performance, we conduct Monte Carlo simulations and analyze air quality data, showcasing the effectiveness of the developed estimator. Supplemental materials for this article are available online.
Read full abstract