With the exponential growth of data across diverse fields, applying conventional statistical methods directly to large-scale datasets has become computationally infeasible. To overcome this challenge, subsampling algorithms are widely used to perform statistical analyses on smaller, more manageable subsets of the data. The effectiveness of these methods depends on their ability to identify and select the data points that most improve estimation efficiency under a given optimality criterion. While much of the existing research has focused on subsampling techniques for independent data, there is considerable potential for developing methods tailored to dependent data, particularly in time-dependent contexts. In this study, we extend subsampling techniques to irregularly spaced time series data modeled by irregularly spaced autoregressive models. We present frameworks for several subsampling approaches: optimal subsampling under A-optimality, information-based optimal subdata selection, and sequential thinning on streaming data. These methods use the A-optimality or D-optimality criterion to assess the usefulness of each data point and prioritize the inclusion of the most informative ones. We then assess the performance of these subsampling methods in numerical simulations, providing insights into their suitability and effectiveness for handling long, irregularly spaced time series. The numerical results show that the estimation efficiency of our algorithms can be ten times as high as that of the uniform sampling estimator. The algorithms also significantly reduce computational time and can be up to forty times faster than the full-data estimator.