Abstract

We consider sufficient dimension reduction for heterogeneous massive data. We show that, even in the presence of heterogeneity and nonlinear dependence, the minimizers of convex loss functions of linear regression fall into the central subspace at the population level. We propose a distributed algorithm for sufficient dimension reduction in which the convex loss functions are approximated by surrogate quadratic losses. This allows us to perform dimension reduction in a unified least squares framework and facilitates the transmission of gradients in our distributed algorithm. The minimizers of these surrogate quadratic losses attain a nearly oracle rate after a finite number of iterations. We conduct simulations and an application to demonstrate the effectiveness of our proposed distributed algorithm for heterogeneous massive data.
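The idea of replacing a convex loss with a surrogate quadratic loss, so that each round reduces to a least squares step driven by transmitted gradients, can be illustrated with a small sketch. This is not the algorithm of the paper, whose details are not given in the abstract. It assumes a Huber loss, a pooled Gram matrix as the curvature of the surrogate, and a simulated single-index model with machine-specific noise levels. All function names and parameter values are hypothetical.

```python
import numpy as np


def huber_psi(r, delta=1.345):
    """Derivative of the Huber loss: identity near zero, clipped at +/- delta."""
    return np.clip(r, -delta, delta)


def distributed_surrogate_fit(blocks, n_iter=50, delta=1.345):
    """Fit a linear model under Huber loss by repeated least squares steps.

    blocks: list of (X_k, y_k) pairs, one pair per machine (assumed layout).
    Returns (intercept, slope vector).
    """
    # Each machine augments its design with an intercept column.
    designs = [np.column_stack([np.ones(len(y)), X]) for X, y in blocks]
    responses = [y for _, y in blocks]
    n_total = sum(len(y) for y in responses)

    # One-off communication: pooled Gram matrix, used as the curvature of the
    # quadratic surrogate (valid here because the Huber second derivative is <= 1).
    gram = sum(Z.T @ Z for Z in designs) / n_total

    theta = np.zeros(gram.shape[0])
    for _ in range(n_iter):
        # Each machine sends only a (p + 1)-vector: its local gradient term.
        grad = -sum(
            Z.T @ huber_psi(y - Z @ theta, delta)
            for Z, y in zip(designs, responses)
        ) / n_total
        # Minimizing the quadratic surrogate is a least squares (Newton-type) step.
        theta = theta - np.linalg.solve(gram, grad)
    return theta[0], theta[1:]


def simulate_blocks(beta, n_machines=4, n_per_machine=1000, seed=0):
    """Simulate heterogeneous blocks: nonlinear link, noise scale varies by machine."""
    rng = np.random.default_rng(seed)
    blocks = []
    for k in range(n_machines):
        X = rng.standard_normal((n_per_machine, len(beta)))
        u = X @ beta
        noise_sd = 0.5 + 0.5 * k  # heterogeneity across machines (assumed form)
        y = u + 0.5 * u**3 + noise_sd * rng.standard_normal(n_per_machine)
        blocks.append((X, y))
    return blocks


if __name__ == "__main__":
    beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
    beta /= np.linalg.norm(beta)
    _, slope = distributed_surrogate_fit(simulate_blocks(beta))
    # Only the direction of the slope is meaningful for dimension reduction.
    print("estimated direction:", slope / np.linalg.norm(slope))
    print("true direction:     ", beta)
```

In this sketch the response depends on the predictors only through a single index and the predictors are Gaussian, so the fitted slope is expected to align with the true direction up to scale even though the link is nonlinear and the noise level differs across machines. After the initial Gram matrix exchange, each iteration communicates only gradient vectors.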