Differentially Private Sliced Inverse Regression in the Federated Paradigm

Abstract

Sliced inverse regression (SIR), which includes linear discriminant analysis (LDA) as a special case, is a popular and powerful dimension reduction tool. In this article, we extend SIR to address the challenges of decentralized data, prioritizing privacy and communication efficiency. Our approach, termed federated sliced inverse regression (FSIR), facilitates distributed computing of the sufficient dimension reduction subspace among multiple clients, sharing only local estimates to protect sensitive datasets from exposure. To guard against potential adversary attacks, FSIR employs diverse perturbation strategies, including a novel vectorized Gaussian mechanism that guarantees (ε, δ)-differential privacy at a low cost of statistical accuracy. Additionally, FSIR achieves a tight composition of various privacy mechanisms by adopting a hypothesis testing perspective on differential privacy. It also incorporates a collaborative feature screening procedure, enabling effective handling of high-dimensional client data with varying feature sets. Theoretical properties of FSIR are established for both low-dimensional and high-dimensional settings, supported by extensive numerical experiments and real data analysis. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
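To make the (ε, δ)-differential privacy guarantee concrete, the following is a minimal sketch of the classical Gaussian mechanism applied to a local estimate before it is shared. It is not the vectorized Gaussian mechanism proposed in the article, whose details the abstract does not give. The function names `gaussian_sigma` and `privatize` are hypothetical, and the sketch assumes the local estimate has a known L2 sensitivity and that 0 < ε < 1, the range in which the classical noise calibration is valid.

```python
import math
import random


def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism.

    Uses sigma = sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon,
    which is the standard calibration for 0 < epsilon < 1.
    """
    if not (0.0 < epsilon < 1.0):
        raise ValueError("classical calibration assumes 0 < epsilon < 1")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * l2_sensitivity / epsilon


def privatize(vector, l2_sensitivity, epsilon, delta, rng=None):
    """Add independent Gaussian noise to each coordinate of a local estimate.

    `vector` is a flat list of floats (hypothetical stand-in for a client's
    local statistic); `l2_sensitivity` must be supplied by the caller.
    """
    rng = rng or random.Random()
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return [v + rng.gauss(0.0, sigma) for v in vector]
```

A client would call, for example, `privatize(local_estimate, 1.0, 0.5, 1e-5)` and share only the noisy result. Smaller ε or δ gives a larger noise scale, which is the accuracy cost the abstract refers to.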

Similar Papers
  • Research Article
  • Cited by: 1
  • 10.1016/j.csda.2024.108041
Minimax rates of convergence for sliced inverse regression with differential privacy
  • Aug 22, 2024
  • Computational Statistics and Data Analysis
  • Wenbiao Zhao + 2 more

  • Research Article
  • Cited by: 17
  • 10.1080/02331888.2013.800067
Supervised invariant coordinate selection
  • May 30, 2013
  • Statistics
  • Eero Liski + 2 more

Dimension reduction plays an important role in high-dimensional data analysis. Principal component analysis, independent component analysis, and sliced inverse regression (SIR) are well known but very different analysis tools for dimension reduction. It appears that these three approaches can all be seen as a comparison of two different scatter matrices S1 and S2. The components for dimension reduction are then given by the eigenvectors of S1^{-1} S2. In SIR, the second scatter matrix is supervised and therefore the choice of the components is based on the dependence between the observed random vector and a response variable. Based on these notions, we extend the invariant coordinate selection (ICS), allowing the second scatter matrix S2 to be supervised; supervised ICS can then be used in supervised dimension reduction. It is remarkable that many supervised dimension reduction methods proposed in the literature such as the linear discriminant analysis, canonical correlation analysis, SIR, sliced average variance estimate, directional regression, and principal Hessian directions can be reformulated in this way. Several families of supervised scatter matrices are discussed, and their use in supervised dimension reduction is illustrated with a real data example and simulations.
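The two-scatter-matrix view can be sketched in a few lines of NumPy. The sketch assumes, as in standard invariant coordinate selection, that the components are the eigenvectors of S1^{-1} S2 with S1 symmetric positive definite and S2 symmetric. The function name `ics_components` is hypothetical, and no particular choice of scatter matrices from the paper is implied.

```python
import numpy as np


def ics_components(S1, S2):
    """Eigenvalues and eigenvectors of S1^{-1} S2, sorted by decreasing eigenvalue."""
    S1 = np.asarray(S1, dtype=float)
    S2 = np.asarray(S2, dtype=float)
    # Solve S1 @ A = S2 rather than forming the inverse of S1 explicitly.
    A = np.linalg.solve(S1, S2)
    eigvals, eigvecs = np.linalg.eig(A)
    # With S1 symmetric positive definite and S2 symmetric, the eigenvalues
    # are real; drop any zero imaginary parts left by the general solver.
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


# Toy usage with two hand-picked scatter matrices.
vals, vecs = ics_components(np.eye(2), np.diag([1.0, 3.0]))
print(vals)
```

In the supervised case described above, S1 would typically be the ordinary covariance matrix and S2 a scatter matrix that depends on the response, for instance the covariance of slice means used by SIR.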

  • Research Article
  • Cited by: 10
  • 10.1109/embc.2015.7318934
Supervised nonlinear dimension reduction of functional magnetic resonance imaging data using Sliced Inverse Regression.
  • Aug 1, 2015
  • Annual International Conference of the IEEE Engineering in Medicine and Biology Society
  • Yiheng Tu + 5 more

Dimension reduction is essential for identifying a small set of discriminative features that are predictive of behavior or cognition from high-dimensional functional magnetic resonance imaging (fMRI) data. However, conventional linear dimension reduction techniques cannot reduce the dimension effectively if the relationship between imaging data and behavioral parameters is nonlinear. In this paper, we propose a novel supervised dimension reduction technique, named PC-SIR (Principal Component - Sliced Inverse Regression), for analyzing high-dimensional fMRI data. The PC-SIR method is an important extension of the renowned SIR method, which can recover the effective dimension reduction (e.d.r.) directions even when the relationship between class labels and predictors is nonlinear, but is unable to handle high-dimensional data. By using PCA prior to SIR to orthogonalize and reduce the predictors, PC-SIR can overcome the limitation of SIR and thus can be used for fMRI data. Simulation showed that PC-SIR can result in a more accurate identification of brain activation as well as better prediction than support vector regression (SVR) and partial least squares regression (PLSR). Then, we applied PC-SIR to real fMRI data recorded in a pain stimulation experiment to identify pain-related brain regions and predict the pain perception. Results on 32 subjects showed that PC-SIR can lead to significantly higher prediction accuracy than SVR and PLSR. Therefore, PC-SIR could be a promising dimension reduction technique for multivariate pattern analysis of fMRI.

  • Conference Article
  • Cited by: 1
  • 10.1145/3107411.3108225
Integrative Sufficient Dimension Reduction Methods for Multi-Omics Data Analysis
  • Aug 20, 2017
  • Yashita Jain + 1 more

With the advent of high throughput genome-wide assays it has become possible to simultaneously measure multiple types of genomic data. Several projects like TCGA, ICGC, NCI-60 have generated comprehensive, multi-dimensional maps of the key genomic changes like miRNA, mRNA, proteomics etc. from cancer samples[2,4]. These genomic data can be used for classifying tumour types[5]. Integrative analysis of these data from multiple sources can potentially provide additional biological insights, but methods to do any such analysis are lacking. One of the widely used solutions to handle high dimension data is to remove redundant information in the integrated sample. Most of the expressed genes overlap and can be projected onto a lower dimension, and then be used to classify different tumor types, without the loss of any/much information. Sufficient dimension reduction (SDR) [1], a supervised dimension reduction approach, can be ideal to achieve such a goal. In this paper, we propose a novel integrative SDR method that can reduce dimensions of multiple data types simultaneously while sharing common latent structures to improve prediction and interpretation. In particular, we extend the sliced inverse regression (SIR) technique, a major SDR method, to integrate multiple omics data for simultaneous dimension reduction. SIR is a supervised dimension reduction method that assumes that the outcome variable Y depends on the predictor variable X through d unknown linear combinations of the predictor[3]. The predictor variable is replaced by its projection into a lower dimension subspace of the predictor space without the loss of information. The aim is to find the intersection of all the subspaces δ, called the central subspace (CS), of the predictor space satisfying the property Y ⫫ X | P_δ X. To integrate multiple types of data, we propose and implement a new integrative sufficient dimension reduction method extending SIR[3], called integrative SIR.
The main idea is that we take into account all the multi-omics data information simultaneously while finding a basis matrix for each data type with some shared latent structures. Finally, we get d dimension data which is much smaller than the original data dimension. The reduced dimension d was chosen by cross validation. To demonstrate the integrated analysis of multi-omics data, we applied and compared conventional SIR and integrative SIR to analyze mRNA, miRNA and proteomics expression profiles of a subset of cell lines from the NCI-60 panel. The data used is taken from [6]. The outcomes we have to classify are CNS, Leukemia and Melanoma tumor types. We pre-screened 400 variables from each data type with the criteria of high variance. To find classification error, we performed random forest classification after applying each method, with leave-one-out cross-validation. As a result, we found that integrative SIR leads to lower classification error than conventional SIR. To summarize, we proposed a new integrative SIR method, a supervised dimension reduction technique for integrative analysis of multi-omics data types. Unlike conventional SDR methods, the new approach can reduce the dimensions of multiple omics data simultaneously while sharing common latent structures across data types without losing any information in prediction. By efficiently capturing the common information, our numerical study shows that integrative SIR classifies tumor types more accurately than conventional SDR methods.

  • Research Article
  • Cited by: 8
  • 10.1080/07350015.2021.1910041
High-Dimensional Elliptical Sliced Inverse Regression in Non-Gaussian Distributions
  • May 2, 2021
  • Journal of Business & Economic Statistics
  • Xin Chen + 2 more

Sliced inverse regression (SIR) is the most widely used sufficient dimension reduction method due to its simplicity, generality and computational efficiency. However, when the distribution of covariates deviates from multivariate normal distribution, the estimation efficiency of SIR gets rather low, and the SIR estimator may be inconsistent and misleading, especially in the high-dimensional setting. In this article, we propose a robust alternative to SIR, called elliptical sliced inverse regression (ESIR), to analyze high-dimensional, elliptically distributed data. There are wide applications of elliptically distributed data, especially in finance and economics where the distribution of the data is often heavy-tailed. To tackle the heavy-tailed elliptically distributed covariates, we make novel use of the multivariate Kendall’s tau matrix in a framework of generalized eigenvalue problem in sufficient dimension reduction. Methodologically, we present a practical algorithm for our method. Theoretically, we investigate the asymptotic behavior of the ESIR estimator under the high-dimensional setting. Extensive simulation results show ESIR significantly improves the estimation efficiency in heavy-tailed scenarios, compared with other robust SIR methods. Analysis of the Istanbul stock exchange dataset also demonstrates the effectiveness of our proposed method. Moreover, ESIR can be easily extended to other sufficient dimension reduction methods and applied to nonelliptical heavy-tailed distributions.
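For readers unfamiliar with the multivariate Kendall's tau matrix, here is a minimal sketch of its usual sample version: the average, over all pairs of observations, of the outer product of the normalized pairwise difference. The abstract does not spell out the estimator used in ESIR, so this definition and the function name `multivariate_kendall_tau` are assumptions made for illustration only.

```python
import numpy as np


def multivariate_kendall_tau(X):
    """Sample multivariate Kendall's tau matrix of the rows of X.

    Averages (x_i - x_j)(x_i - x_j)^T / ||x_i - x_j||^2 over pairs i < j.
    Pairs of identical points are skipped to avoid division by zero.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    K = np.zeros((p, p))
    pairs = 0
    for i in range(n - 1):
        diffs = X[i + 1:] - X[i]                 # x_j - x_i for all j > i
        norms2 = np.sum(diffs ** 2, axis=1)
        keep = norms2 > 0
        unit = diffs[keep] / np.sqrt(norms2[keep])[:, None]
        K += unit.T @ unit                       # sum of outer products
        pairs += int(keep.sum())
    if pairs == 0:
        raise ValueError("need at least two distinct observations")
    return K / pairs
```

Because every pairwise difference is rescaled to unit length, a single extreme observation cannot dominate the estimate, which is why this matrix is attractive for heavy-tailed data. The result is symmetric with trace one.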

  • Research Article
  • 10.1093/jrsssb/qkaf038
A unified generalization of the inverse regression methods via column selection
  • Jul 2, 2025
  • Journal of the Royal Statistical Society Series B: Statistical Methodology
  • Yin Jin + 1 more

A bottleneck of sufficient dimension reduction (SDR) in the modern era is that, among numerous methods, only sliced inverse regression (SIR) is generally applicable in high-dimensional settings. The higher-order inverse regression methods, which form a major family of SDR methods superior to SIR at the population level, suffer from the dimensionality of their intermediate matrix-valued parameters which have excessive columns. In this paper, we propose to use a small subset of columns of the matrix-valued parameter for SDR estimation, which breaks the convention of using the ambient matrix in the higher-order inverse regression methods. With a quick column selection procedure, we then generalize these methods and their ensembles in high-dimensional sparse settings, in a uniform manner that resembles sparse SIR without additional assumptions. This is the first promising attempt in the literature to free the higher-order inverse regression methods from their dimensionality, thereby facilitating the application of SDR. Some numerical illustrations, including both simulation studies and a real data example, are provided at the end.

  • Research Article
  • 10.1016/j.csda.2024.108071
Online kernel sliced inverse regression
  • Oct 16, 2024
  • Computational Statistics and Data Analysis
  • Jianjun Xu + 2 more

  • Research Article
  • Cited by: 2266
  • 10.1080/01621459.1991.10475035
Sliced Inverse Regression for Dimension Reduction
  • Jun 1, 1991
  • Journal of the American Statistical Association
  • Ker-Chau Li

Modern advances in computing power have greatly widened scientists' scope in gathering and investigating information from many variables, information which might have been ignored in the past. Yet to effectively scan a large pool of variables is not an easy task, although our ability to interact with data has been much enhanced by recent innovations in dynamic graphics. In this article, we propose a novel data-analytic tool, sliced inverse regression (SIR), for reducing the dimension of the input variable x without going through any parametric or nonparametric model-fitting process. This method explores the simplicity of the inverse view of regression; that is, instead of regressing the univariate output variable y against the multivariate x, we regress x against y. Forward regression and inverse regression are connected by a theorem that motivates this method. The theoretical properties of SIR are investigated under a model of the form, y = f(β_1 x, …, β_K x, ε), where the β_k's are the unknown row vectors. This model looks like a nonlinear regression, except for the crucial difference that the functional form of f is completely unknown. For effectively reducing the dimension, we need only to estimate the space [effective dimension reduction (e.d.r.) space] generated by the β_k's. This makes our goal different from the usual one in regression analysis, the estimation of all the regression coefficients. In fact, the β_k's themselves are not identifiable without a specific structural form on f. Our main theorem shows that under a suitable condition, if the distribution of x has been standardized to have the zero mean and the identity covariance, the inverse regression curve, E(x | y), will fall into the e.d.r. space. Hence a principal component analysis on the covariance matrix for the estimated inverse regression curve can be conducted to locate its main orientation, yielding our estimates for e.d.r. directions.
Furthermore, we use a simple step function to estimate the inverse regression curve. No complicated smoothing is needed. SIR can be easily implemented on personal computers. By simulation, we demonstrate how SIR can effectively reduce the dimension of the input variable from, say, 10 to K = 2 for a data set with 400 observations. The spin-plot of y against the two projected variables obtained by SIR is found to mimic the spin-plot of y against the true directions very well. A chi-squared statistic is proposed to address the issue of whether or not a direction found by SIR is spurious.
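The procedure described in this abstract (standardize x, estimate E(x | y) with a step function over slices of y, then run a principal component analysis on the covariance of the slice means) can be sketched as follows. This is a minimal illustration, not the author's code: the function name `sir_directions`, the equal-count slicing, and the default parameter values are assumptions, and the sample covariance of x is assumed to be invertible (more observations than variables).

```python
import numpy as np


def sir_directions(X, y, n_slices=10, n_directions=2):
    """Estimate e.d.r. directions with a basic sliced inverse regression."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize x to zero mean and identity covariance.
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # Step-function estimate of the inverse regression curve:
    # sort by y, cut into slices of (nearly) equal size, average z per slice.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance matrix of the slice means.
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Principal component analysis on M: keep the leading eigenvectors.
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_directions]]

    # Map the directions back to the original x scale and normalize columns.
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)


# Toy usage: y depends on x only through its first coordinate.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
y = X[:, 0] ** 3 + 0.1 * rng.standard_normal(2000)
print(sir_directions(X, y, n_slices=10, n_directions=1).ravel())
```

In the toy usage the printed vector should be dominated by its first entry, up to sign, since only the first coordinate of x drives y. No smoothing is involved: the slice means are the only estimate of the inverse regression curve.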

  • Research Article
  • Cited by: 8
  • 10.1109/tcyb.2016.2526630
Sliced Inverse Regression With Adaptive Spectral Sparsity for Dimension Reduction.
  • Apr 5, 2016
  • IEEE Transactions on Cybernetics
  • Xiao-Lin Xu + 3 more

Dimension reduction is an important topic in pattern analysis and machine learning, and it has wide applications in feature representation and pattern classification. In the past two decades, sliced inverse regression (SIR) has attracted much research effort due to its effectiveness and efficacy in dimension reduction. However, two drawbacks limit further applications of SIR. First, the computational complexity of SIR is usually high for high-dimensional data. Second, sparsity of the projection subspace is not well mined for improving the feature selection and model interpretation abilities. This paper proposes to compute the SIR projection vectors in the spectral space, so that an approximate regression solution can be obtained faster. Moreover, the adaptive lasso is used to attain a sparse and globally optimal solution, which is important in variable selection. To complete the robust pattern classification task with corruptions, a correntropy-based and class-wise regression model is designed in this paper. It takes a smooth penalty instead of a sparsity constraint in the regression coefficients, and it can be conducted class-wise, thus it is more flexible in practice. Extensive experiments are conducted by using some real and benchmark data sets, e.g., high-dimensional facial images and gene microarray data, to evaluate the new algorithms. The new proposals attain competitive results and are compared with other state-of-the-art methods.

  • Research Article
  • Cited by: 122
  • 10.1198/106186008x345161
Kernel Sliced Inverse Regression with Applications to Classification
  • Sep 1, 2008
  • Journal of Computational and Graphical Statistics
  • Han-Ming Wu

Sliced inverse regression (SIR) was introduced by Li to find the effective dimension reduction directions for exploring the intrinsic structure of high-dimensional data. In this study, we propose a hybrid SIR method using a kernel machine which we call kernel SIR. The kernel mixtures result in the transformed data distribution being more Gaussian-like and symmetric, providing more suitable conditions for performing SIR analysis. The proposed method can be regarded as a nonlinear extension of the SIR algorithm. We provide a theoretical description of the kernel SIR algorithm within the framework of reproducing kernel Hilbert space (RKHS). We also illustrate that kernel SIR performs better than several standard methods for discriminative, visualization, and regression purposes. We show how the features found with kernel SIR can be used for classification of microarray data and several other classification problems and compare the results with those obtained with several existing dimension reduction techniques. The results show that kernel SIR is a powerful nonlinear feature extractor for classification problems.

  • Research Article
  • 10.1007/s13171-017-0102-x
The Effect of Data Contamination in Sliced Inverse Regression and Finite Sample Breakdown Point
  • May 31, 2017
  • Sankhya A
  • Ulrike Genschel

Dimension reduction procedures have received increasing consideration over the past decades. Despite this attention, the effect of data contamination or outlying data points in dimension reduction is, however, not well understood, and is compounded by the issue that outliers can be difficult to classify in the presence of many variables. This paper formally investigates the influence of data contamination for sliced inverse regression (SIR), which is a prototypical dimension reduction procedure that targets a lower-dimensional subspace of a set of regressors needed to explain a response variable. We establish a general theory for how estimated reduction subspaces can be distorted through both the number and direction of outlying data points. The results depend critically on the regressor covariance structure and the most harmful types of data contamination are shown to differ in cases where this covariance structure is known or unknown. For example, if the covariance structure is estimated, data contamination is proven to produce an estimated subspace that is automatically orthogonal to the directions of outlying data points, constituting a potentially serious loss of information. Our main results demonstrate the degree to which data contamination indeed causes incorrect dimension reduction, depending on the amount, magnitude, and direction of contamination. Further, by metricizing distances between dimension reduction subspaces, worst case results for data contamination can be formulated to define a finite sample breakdown point for SIR as a measure of global robustness. Our theoretical findings are illustrated through simulation.

  • Research Article
  • Cited by: 3
  • 10.3389/fnins.2016.00556
A Sliced Inverse Regression (SIR) Decoding the Forelimb Movement from Neuronal Spikes in the Rat Motor Cortex
  • Dec 9, 2016
  • Frontiers in Neuroscience
  • Shih-Hung Yang + 12 more

Several neural decoding algorithms have successfully converted brain signals into commands to control a computer cursor and prosthetic devices. A majority of decoding methods, such as population vector algorithms (PVA), optimal linear estimators (OLE), and neural networks (NN), are effective in predicting movement kinematics, including movement direction, speed and trajectory but usually require a large number of neurons to achieve desirable performance. This study proposed a novel decoding algorithm that remains effective even with signals obtained from a smaller number of neurons. We adopted sliced inverse regression (SIR) to predict forelimb movement from single-unit activities recorded in the rat primary motor (M1) cortex in a water-reward lever-pressing task. SIR performed weighted principal component analysis (PCA) to achieve effective dimension reduction for nonlinear regression. To demonstrate the decoding performance, SIR was compared to PVA, OLE, and NN. Furthermore, PCA and sequential feature selection (SFS), which are popular feature selection techniques, were implemented for comparison of feature selection effectiveness. Among SIR, PVA, OLE, PCA, SFS, and NN decoding methods, the trajectories predicted by SIR (with a root mean square error, RMSE, of 8.47 ± 1.32 mm) were closer to the actual trajectories compared with those predicted by PVA (30.41 ± 11.73 mm), OLE (20.17 ± 6.43 mm), PCA (19.13 ± 0.75 mm), SFS (22.75 ± 2.01 mm), and NN (16.75 ± 2.02 mm). The superiority of SIR was most obvious when the sample size of neurons was small. We concluded that SIR sorted the input data to obtain the effective transform matrices for movement prediction, making it a robust decoding method for conditions with sparse neuronal information.

  • Research Article
  • Cited by: 1
  • 10.1088/1742-6596/1664/1/012034
Robust variable selection in sliced inverse regression using Tukey’s biweight criterion and ball covariance
  • Nov 1, 2020
  • Journal of Physics: Conference Series
  • Ali Alkenani

The shrinkage sliced inverse regression (SSIR) is a variable selection method under the settings of sufficient dimension reduction (SDR) theory. The SSIR merges the ideas of Lasso shrinkage and sliced inverse regression (SIR) to obtain sparse and accurate solutions. However, the dependency of SSIR on the squared loss function and classical estimates for location and dispersion measures makes it very sensitive to outliers. In this paper, a robust variable selection method based on SSIR, which is called RSSIR, is proposed. The squared loss is replaced by Tukey’s biweight criterion. Also, the classical estimates of the mean and covariance matrix are replaced with the median and ball covariance, which are robust measures for location and dispersion, respectively. The proposed RSSIR is resistant to outliers in both the response and the covariates. In addition, a robust version of the residual information criterion (RIC) is proposed to select the regularisation parameter. According to the results of simulations and real data analysis, very reliable results are achieved through RSSIR. In the presence of outliers, the performance of RSSIR is significantly better than the performance of SSIR and other existing methods.

  • Research Article
  • Cited by: 2
  • 10.1198/106186008x285573
Sliced Coordinate Analysis for Effective Dimension Reduction and Nonlinear Extensions
  • Mar 1, 2008
  • Journal of Computational and Graphical Statistics
  • Zhihua Zhang + 3 more

Sliced inverse regression (SIR) is an important method for reducing the dimensionality of input variables. Its goal is to estimate the effective dimension reduction directions. In classification settings, SIR is closely related to Fisher discriminant analysis. Motivated by reproducing kernel theory, we propose a notion of nonlinear effective dimension reduction and develop a nonlinear extension of SIR called kernel SIR (KSIR). Both SIR and KSIR are based on principal component analysis. Alternatively, based on principal coordinate analysis, we propose the dual versions of SIR and KSIR, which we refer to as sliced coordinate analysis (SCA) and kernel sliced coordinate analysis (KSCA), respectively. In the classification setting, we also call them discriminant coordinate analysis and kernel discriminant coordinate analysis. The computational complexities of SIR and KSIR rely on the dimensionality of the input vector and the number of input vectors, respectively, while those of SCA and KSCA both rely on the number of slices in the output. Thus, SCA and KSCA are very efficient dimension reduction methods.

  • Research Article
  • 10.1186/s12859-024-05731-8
Multiple phenotype association tests based on sliced inverse regression
  • Apr 4, 2024
  • BMC Bioinformatics
  • Wenyuan Sun + 2 more

Background: Joint analysis of multiple phenotypes in studies of biological systems such as Genome-Wide Association Studies is critical to revealing the functional interactions between various traits and genetic variants, but growth of data in dimensionality has become a very challenging problem in the widespread use of joint analysis. To handle the excessiveness of variables, we consider the sliced inverse regression (SIR) method. Specifically, we propose a novel SIR-based association test that is robust and powerful in testing the association between multiple predictors and multiple outcomes. Results: We conduct simulation studies in both low- and high-dimensional settings with various numbers of Single-Nucleotide Polymorphisms and consider the correlation structure of traits. Simulation results show that the proposed method outperforms the existing methods. We also successfully apply our method to the genetic association study of ADNI dataset. Both the simulation studies and real data analysis show that the SIR-based association test is valid and achieves a higher efficiency compared with its competitors. Conclusion: Several scenarios with low- and high-dimensional responses and genotypes are considered in this paper. Our SIR-based method controls the estimated type I error at the pre-specified level α.
