Abstract

In filtering, each output is produced by a certain number of different inputs. We explore the statistics of this degeneracy in an explicitly treatable filtering problem in which filtering performs the maximal compression of the relevant information contained in the inputs (arrays of zeros and ones). This problem serves as a reference model for the statistics of filtering and related sampling problems. The filter patterns in this problem conveniently allow a microscopic, combinatorial treatment, which lets us find the statistics of outputs, namely the exact distribution of output degeneracies, for arbitrary input sizes. We observe that the resulting degeneracy distribution of outputs decays as $e^{-c\log^\alpha \!d}$ with degeneracy $d$, where $c$ is a constant and the exponent $\alpha>1$, i.e., faster than a power law. Importantly, its form depends essentially on the size of the input dataset, appearing closer to a power-law dependence for small dataset sizes than for large ones. We demonstrate that for the sufficiently small input dataset sizes typical of empirical studies, this distribution can easily be perceived as a power law. We extend our results to filter patterns of various sizes and demonstrate that the shortest filter pattern provides the most informative representations of the inputs.

Highlights

  • Compression, filtering, and cryptography are related areas in signal and information processing [1,2,3,4]

  • For complete input datasets passed through our filter, we have obtained degeneracy distributions markedly distinct from power laws

  • These distributions decay as $N_{\mathrm{cum}}(d) \propto e^{-c\ln^\alpha \!d}$, $\alpha > 1$, much slower than exponentially, and in this sense they can still be called “critical.” We have observed that the entire form of these output distributions depends essentially on the input size $n$, which strongly differs, for example, from heavy-tailed degree distributions of complex networks having exponential cutoffs [31,32]

Summary

INTRODUCTION

Compression, filtering, and cryptography are related areas in signal and information processing [1,2,3,4]. A key observation of Refs. [9,10,11,12,13] is that maximally informative samples drawn from data exhibit statistics with broad distributions. Their entropy-optimization-based theory predicts power-law-like distributions of the degeneracy of maximally informative outputs (minimal sufficient representations). Here we explore a reference filtering problem that is straightforwardly treatable through purely combinatorial techniques. This filter extracts all positions of a given local pattern in the input; see Fig. 1. We study a family of such filter patterns and demonstrate that the smallest pattern, extracting single ones from the inputs, generates outputs with the highest entropy of the degeneracy distribution, called relevance; see Fig. 2 and Table I.
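The filtering map described above can be sketched in a few lines: an input bit array is sent to the tuple of positions at which a chosen local pattern occurs, and the degeneracy of an output is the number of distinct inputs that produce it. This is an illustrative sketch, not the paper's code; the function names and the brute-force enumeration over the complete dataset of all $2^n$ inputs are my own.

```python
from collections import Counter
from itertools import product

def filter_output(bits, pattern=(1, 1)):
    """Map an input bit sequence to the tuple of positions where
    `pattern` occurs (the filter's output)."""
    m = len(pattern)
    return tuple(i for i in range(len(bits) - m + 1)
                 if tuple(bits[i:i + m]) == pattern)

def degeneracy_spectrum(n, pattern=(1, 1)):
    """Pass the complete dataset of all 2^n binary inputs through the
    filter and histogram the output degeneracies: returns a Counter
    mapping degeneracy d to the number of outputs realized by exactly
    d different inputs."""
    outputs = Counter(filter_output(bits, pattern)
                      for bits in product((0, 1), repeat=n))
    return Counter(outputs.values())
```

For example, with the pattern (1, 1) and n = 3, the empty output (no occurrence of “11”) is produced by the five inputs 000, 001, 010, 100, 101, so it has degeneracy 5, while the remaining three outputs each have degeneracy 1. With the single-symbol pattern (1,), the output (the positions of all ones) determines the input uniquely, so every degeneracy equals 1.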

REFERENCE FILTERING PROBLEM
OUTPUTS AND THEIR DEGENERACIES FOR COMPLETE INPUT DATASET
CALCULATING THE EXACT DEGENERACY SPECTRUM
OUTPUT DEGENERACY DISTRIBUTION FOR COMPLETE INPUT DATASETS
MEAN-FIELD THEORY
DEGENERACY DISTRIBUTIONS OF OUTPUTS FOR RANDOMLY GENERATED INPUT DATASETS
DISCUSSION AND CONCLUSIONS
