Abstract

The goal of lossy data compression is to reduce the storage cost of a data set X while retaining as much information as possible about something (Y) that you care about. For example, what aspects of an image X contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping Z = g(X) that maximizes the mutual information I(Z, Y) while the entropy H(Z) is kept below some fixed threshold. We present a new method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable X (an image, say) drawn from a class Y ∈ {1, ..., n} can be distilled into a vector W = f(X) losslessly, so that I(W, Y) = I(X, Y); for example, for a binary classification task of cats and dogs, each image X is mapped into a single real number W retaining all information that helps distinguish cats from dogs. For the n = 2 case of binary classification, we then show how W can be further compressed into a discrete variable Z = g_β(W) ∈ {1, ..., m_β} by binning W into m_β bins, in such a way that varying the parameter β sweeps out the full Pareto frontier, solving a generalization of the discrete information bottleneck (DIB) problem. We argue that the most interesting points on this frontier are “corners” maximizing I(Z, Y) for a fixed number of bins M, which can conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm. We find that these Pareto frontiers are not concave, and that recently reported DIB phase transitions correspond to transitions between these corners, changing the number of clusters.
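The lossless distillation step can be illustrated with a minimal sketch (not the paper's implementation; the joint distribution below is made up for illustration): for binary classification, the conditional probability W = P(Y = 1 | X) is a sufficient statistic for Y, so mapping each x to this single number preserves all class information, I(W, Y) = I(X, Y), even when several x's collapse onto the same value of W.

```python
import numpy as np
from collections import defaultdict

def mutual_information(joint):
    """I(A, B) in bits from a joint probability table joint[a, b]."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])).sum())

# Hypothetical joint distribution p(X, Y) over 6 "images" and 2 classes,
# built so that pairs of x's share the same conditional p(Y|X).
p_xy = np.array([[0.10, 0.05],   # w = 1/3
                 [0.20, 0.10],   # w = 1/3
                 [0.06, 0.12],   # w = 2/3
                 [0.03, 0.06],   # w = 2/3
                 [0.10, 0.10],   # w = 1/2
                 [0.04, 0.04]])  # w = 1/2

# Distill each x into the scalar W = P(Y = 1 | X = x).
w = p_xy[:, 1] / p_xy.sum(axis=1)

# Build the joint p(W, Y) by merging x's that share the same w:
# six x values collapse into three w values, with no loss of class information.
groups = defaultdict(lambda: np.zeros(2))
for x in range(len(p_xy)):
    groups[round(w[x], 12)] += p_xy[x]
p_wy = np.array(list(groups.values()))

print(mutual_information(p_xy), mutual_information(p_wy))  # equal
```

Because every x within a group has the identical conditional distribution p(Y | X = x), merging them cannot discard any information about Y, which is exactly the sense in which the distillation is lossless.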

Highlights

  • A core challenge in science, and in life quite generally, is data distillation: Keeping only a manageably small fraction of our available data X while retaining as much information as possible about something (Y) that we care about

  • For the n = 2 case of binary classification, we show how W can be further compressed into a discrete variable Z = g_β(W) ∈ {1, ..., m_β} by binning W into m_β bins, in such a way that varying the parameter β sweeps out the full Pareto frontier, solving a generalization of the discrete information bottleneck (DIB) problem

  • We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm. We find that these Pareto frontiers are not concave, and that recently reported DIB phase transitions correspond to transitions between these corners, changing the number of clusters

Introduction

A core challenge in science, and in life quite generally, is data distillation: keeping only a manageably small fraction of our available data X while retaining as much information as possible about something (Y) that we care about. The tradeoff between H∗ ≡ H(Z) (bits stored) and I∗ ≡ I(Z, Y) (useful bits) is described by a Pareto frontier, defined as I∗(H∗) ≡ sup I(Z, Y) over all compressions Z of X satisfying H(Z) ≤ H∗. In the accompanying figure, the colored dots correspond to random likelihood binnings into various numbers of bins, as described, and the upper envelope of all attainable points defines the Pareto frontier. Its “corners”, which are marked by black dots and maximize I(Z, Y) for M bins (M = 1, 2, ...), are seen to lie close to the vertical dashed lines H(Z) = log M, corresponding to all bins having equal size. The core goal of this paper is to present a method for computing such Pareto frontiers
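The corners can be sketched in a toy model (the W values and their probabilities below are made up, and the exhaustive search stands in for the paper's algorithm): take a discrete stand-in for W = P(Y = 1 | X), enumerate contiguous binnings of its sorted values into M bins, keep the binning maximizing I(Z, Y) for each M, and check that each corner satisfies H(Z) ≤ log₂ M.

```python
import numpy as np
from itertools import combinations

def entropy(p):
    """Shannon entropy in bits of a 1-D probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mi(joint):
    """I(A, B) = H(A) + H(B) - H(A, B) for a joint table joint[a, b]."""
    return entropy(joint.sum(axis=0)) + entropy(joint.sum(axis=1)) - entropy(joint.ravel())

# Toy joint p(W, Y): 8 equiprobable values of W = P(Y = 1 | X), sorted ascending.
p_w = np.full(8, 1 / 8)
w = np.array([0.05, 0.15, 0.30, 0.40, 0.60, 0.70, 0.85, 0.95])
p_wy = np.stack([p_w * (1 - w), p_w * w], axis=1)  # columns: y = 0, y = 1

# For each bin count M, search contiguous binnings of the sorted w values
# for the one maximizing I(Z, Y): a "corner" of the Pareto frontier.
corners = []
for M in range(1, 5):
    best = None
    for cuts in combinations(range(1, 8), M - 1):
        bounds = [0, *cuts, 8]
        p_zy = np.array([p_wy[a:b].sum(axis=0) for a, b in zip(bounds, bounds[1:])])
        cand = (mi(p_zy), entropy(p_zy.sum(axis=1)))  # (I(Z,Y), H(Z))
        if best is None or cand > best:
            best = cand
    assert best[1] <= np.log2(M) + 1e-12  # corners lie at H(Z) <= log M
    corners.append((M, best[1], best[0]))

for M, H, I in corners:
    print(f"M={M}: H(Z)={H:.3f} bits, I(Z,Y)={I:.3f} bits")
```

Splitting a bin can never decrease I(Z, Y), so the corner values of I(Z, Y) grow monotonically with M, tracing out successively higher points of the frontier.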
