Sparse reduced-rank regression is an important tool for uncovering large-scale response-predictor association networks, as exemplified by modern applications such as diffusion networks and recommendation systems. However, the association networks recovered by existing methods are either sensitive to outliers or not scalable to big data settings. In this paper, we propose a new statistical learning method called robust parallel pursuit (ROP) for joint estimation and outlier detection in large-scale response-predictor association network analysis. The proposed method is scalable in that it transforms the original large-scale network learning problem into a set of sparse unit-rank estimations via factor analysis, thus facilitating an effective parallel pursuit algorithm. Furthermore, we provide comprehensive theoretical guarantees, including consistency in parameter estimation, rank selection, and outlier detection, and we develop an inference procedure to quantify the uncertainty about the existence of outliers. Extensive simulation studies and two real-data analyses demonstrate the effectiveness and scalability of the suggested approach.

History: Accepted by Ram Ramesh, Area Editor/Data Science & Machine Learning.

Funding: This work was supported by the National Key R&D Program of China [Grant 2022YFA1008000], Natural Science Foundation of China [Grants 72071187, 72091212, 71731010, and 71921001], China Postdoctoral Science Foundation [Grant 2023M733402], and Fundamental Research Funds for the Central Universities [Grants WK3470000017, WK2040000027, and WK2040000079].

Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2022.0181) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2022.0181). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.