Abstract

In filtering, each output is produced by a certain number of different inputs. We explore the statistics of this degeneracy in an explicitly treatable filtering problem in which filtering performs the maximal compression of the relevant information contained in the inputs (arrays of zeros and ones). This problem serves as a reference model for the statistics of filtering and related sampling problems. The filter patterns in this problem conveniently allow a microscopic, combinatorial treatment, which lets us find the statistics of outputs, namely the exact distribution of output degeneracies, for arbitrary input sizes. We observe that the resulting degeneracy distribution of outputs decays as $e^{-c\log^\alpha \!d}$ with degeneracy $d$, where $c$ is a constant and the exponent $\alpha>1$, i.e., faster than a power law. Importantly, its form depends essentially on the size of the input dataset, appearing closer to a power-law dependence for small dataset sizes than for large ones. We demonstrate that for the sufficiently small input dataset sizes typical of empirical studies, this distribution can easily be perceived as a power law. We extend our results to filter patterns of various sizes and demonstrate that the shortest filter pattern provides the most informative representations of the inputs.

Highlights

  • Compression, filtering, and cryptography are related areas in signal and information processing [1,2,3,4]

  • For complete input datasets passed through our filter, we have obtained degeneracy distributions markedly distinct from power laws

  • These distributions decay as $N_{\mathrm{cum}}(d) \propto e^{-c\ln^\alpha \!d}$, $\alpha > 1$, much slower than exponentially, and in this sense they can still be called “critical.” We have observed that the entire form of these output distributions depends essentially on the input size $n$, which strongly differs, for example, from heavy-tailed degree distributions of complex networks having exponential cutoffs [31,32]

Summary

INTRODUCTION

Compression, filtering, and cryptography are related areas in signal and information processing [1,2,3,4]. A key observation of Refs. [9,10,11,12,13] is that maximally informative samples drawn from data exhibit statistics with broad distributions. Their entropy-optimization-based theory predicts power-law-like distributions of the degeneracy of maximally informative outputs (minimal sufficient representations). Here we explore a reference filtering problem that is straightforwardly treatable through purely combinatorial techniques. This filter extracts all positions of a given local pattern in the input; see Fig. 1. We study a family of such filter patterns and demonstrate that the smallest pattern, extracting single ones from the inputs, generates outputs with the highest entropy of the degeneracy distribution, called relevance; see Fig. 2 and Table I.
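The filtering map described above can be sketched in a few lines: an input bit array is sent to the tuple of positions at which a chosen local pattern occurs, and the degeneracy of an output is the number of distinct inputs that produce it. This is an illustrative sketch, not the paper's code; the function names and the brute-force enumeration over the complete dataset of all $2^n$ inputs are my own.

```python
from collections import Counter
from itertools import product

def filter_output(bits, pattern=(1, 1)):
    """Map an input bit sequence to the tuple of positions where
    `pattern` occurs (the filter's output)."""
    m = len(pattern)
    return tuple(i for i in range(len(bits) - m + 1)
                 if tuple(bits[i:i + m]) == pattern)

def degeneracy_spectrum(n, pattern=(1, 1)):
    """Pass the complete dataset of all 2^n binary inputs through the
    filter and histogram the output degeneracies: returns a Counter
    mapping degeneracy d to the number of outputs realized by exactly
    d different inputs."""
    outputs = Counter(filter_output(bits, pattern)
                      for bits in product((0, 1), repeat=n))
    return Counter(outputs.values())
```

For example, with the pattern (1, 1) and n = 3, the empty output (no occurrence of “11”) is produced by the five inputs 000, 001, 010, 100, 101, so it has degeneracy 5, while the remaining three outputs each have degeneracy 1. With the single-symbol pattern (1,), the output (the positions of all ones) determines the input uniquely, so every degeneracy equals 1.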

REFERENCE FILTERING PROBLEM
OUTPUTS AND THEIR DEGENERACIES FOR COMPLETE INPUT DATASET
CALCULATING THE EXACT DEGENERACY SPECTRUM
OUTPUT DEGENERACY DISTRIBUTION FOR COMPLETE INPUT DATASETS
MEAN-FIELD THEORY
DEGENERACY DISTRIBUTIONS OF OUTPUTS FOR RANDOMLY GENERATED INPUT DATASETS
DISCUSSION AND CONCLUSIONS
