Sufficient Dimension Reduction: An Information-Theoretic Viewpoint

Debashis Ghosh

doi:10.3390/e24020167

Abstract

There has been considerable interest in sufficient dimension reduction (SDR) methodologies, as well as their nonlinear extensions, in the statistics literature. The SDR methodology has previously been motivated by several considerations: (a) finding data-driven subspaces that capture the essential facets of regression relationships; (b) analyzing data in a ‘model-free’ manner. In this article, we develop an approach to interpreting SDR techniques using information theory. Such a framework leads to a more assumption-lean understanding of what SDR methods do and also allows for some connections to results in the information theory literature.

Highlights

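A field of statistics, termed sufficient dimension reduction (SDR), has sought to develop methodology for reducing the dimension of data while capturing the essential information in the regression relationship between covariates and a response variable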
We propose a new interpretation for sufficient dimension reduction based on conditional independence assumptions
We can move away from viewing the goal of SDR as estimating a parameter, namely the basis of the central subspace, and view it instead as a means for information compression that simultaneously preserves association with an outcome variable

Summary

Introduction

A key challenge is to determine appropriate transformations of the data that reduce its dimension while capturing the essential information in the regression relationship between a set of covariates and a response variable. To this end, a field of statistics termed sufficient dimension reduction (SDR) has sought to develop methodology with this goal in mind. By moving to mutual information, we can relax some of the distributional assumptions needed for sufficient dimension reduction in a manner different from that in [12,13,14,15,16]. This direction is a departure from the viewpoint that SDR serves as a means to estimate a target parameter, typically the span of the basis vectors of the central subspace.
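
The reading of SDR as compression that preserves association can be illustrated with a minimal sketch. It assumes a toy linear Gaussian model that is not taken from the article: X ~ N(0, I_p) and Y = beta'X + eps with eps ~ N(0, sigma2) independent of X. Under these assumptions the mutual information has a closed form, so one can check that projecting X onto the direction beta loses no information about Y, while another direction does. The function names, beta, and sigma2 below are hypothetical choices made only for this illustration.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def mi_full(beta, sigma2):
    """I(X; Y) in nats for the assumed model X ~ N(0, I), Y = beta'X + eps.

    Uses I(X; Y) = h(Y) - h(Y | X) = 0.5 * log(Var(Y) / sigma2).
    """
    var_y = dot(beta, beta) + sigma2
    return 0.5 * math.log(var_y / sigma2)


def mi_projection(b, beta, sigma2):
    """I(b'X; Y) in nats for a one-dimensional projection b'X.

    (b'X, Y) is bivariate Gaussian, so I = -0.5 * log(1 - rho^2),
    where rho is the correlation between b'X and Y.
    """
    norm_b = math.sqrt(dot(b, b))
    cov = dot(b, beta) / norm_b          # Cov(b'X / ||b||, Y)
    var_y = dot(beta, beta) + sigma2
    rho2 = cov ** 2 / var_y              # Var(b'X / ||b||) = 1
    return -0.5 * math.log1p(-rho2)


# Hypothetical parameter values for the toy model.
beta = [2.0, 1.0, 0.0]
sigma2 = 1.0

info_all = mi_full(beta, sigma2)                         # all three covariates
info_sdr = mi_projection(beta, beta, sigma2)             # projection onto beta
info_other = mi_projection([1.0, 0.0, 0.0], beta, sigma2)  # first coordinate only

print(f"I(X; Y)        = {info_all:.4f} nats")
print(f"I(beta'X; Y)   = {info_sdr:.4f} nats")
print(f"I(X_1; Y)      = {info_other:.4f} nats")
```

In this toy setting the one-dimensional summary beta'X carries the same mutual information with Y as the full covariate vector, which is the data processing inequality holding with equality, whereas the first coordinate alone carries strictly less.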

Data Structures and Review of Dimension Reduction Methods

Limitations of Sufficient Dimension Reduction

The Case of Gaussian Variables

Numerical Illustration

Discussion