Abstract

Diverse discrete systems share common global properties that lack a unifying theoretical explanation. However, constraining the simplest measure of total information (Hartley–Shannon) in a statistical mechanics framework reveals a principle, the conservation of Hartley–Shannon information (CoHSI), that directly predicts both known and unsuspected common properties of discrete systems, as borne out in the diverse systems of computer software, proteins and music. Discrete systems fall into two categories distinguished by their structure: heterogeneous systems, in which there is a distinguishable order of assembly of the system’s components from an alphabet of unique tokens (e.g. proteins assembled from an alphabet of amino acids), and homogeneous systems, in which unique tokens are simply binned, counted and rank ordered. Heterogeneous systems are characterized by an implicit distribution of component lengths, with a sharp unimodal peak (containing the majority of components) and a power-law tail, whereas homogeneous systems reduce naturally to Zipf’s Law but with a drooping tail in the distribution. We also confirm predictions that very long components are inevitable for heterogeneous systems; that discrete systems can exhibit simultaneously both heterogeneous and homogeneous behaviour; and that in systems with more than one consistent token alphabet (e.g. digital music), the alphabets themselves show a power-law relationship.

Highlights

  • Discrete systems, i.e. systems that comprise pieces that can be consistently counted, are everywhere in the inanimate world, the biological world (e.g. DNA, proteins, species) and the world of human endeavour and creativity

  • We show that the single differential equation that we derive, which embodies the principle of conservation of Hartley–Shannon information or CoHSI, accurately predicts the global properties of discrete systems as diverse as proteins, computer software and digital music

  • A mechanism- and token-agnostic, scale-independent theory embracing statistical mechanics, in which the simplest possible measure of Hartley–Shannon information is embedded as a constraint (CoHSI), is capable of explaining this underlying similarity with only the standard assumptions of statistical mechanics: that all microstates are equally probable, and that components are reasonably well populated so that Stirling’s approximation is satisfactory
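
The "reasonably well populated" condition in the last highlight can be illustrated numerically. The sketch below compares the simplest form of Stirling's approximation, ln n! ≈ n ln n − n, with the exact value of ln n!. The function names are illustrative only, and the choice of this particular form of the approximation is an assumption, not something stated in the text above.

```python
import math

def ln_factorial_exact(n: int) -> float:
    # math.lgamma(n + 1) equals ln(n!) for non-negative integers n
    return math.lgamma(n + 1)

def ln_factorial_stirling(n: int) -> float:
    # Simplest form of Stirling's approximation: ln n! ~ n ln n - n
    return n * math.log(n) - n

def relative_error(n: int) -> float:
    # Requires n >= 2, because ln(0!) = ln(1!) = 0 would divide by zero
    exact = ln_factorial_exact(n)
    return abs(exact - ln_factorial_stirling(n)) / exact

if __name__ == "__main__":
    # The relative error shrinks as the population n grows
    for n in (5, 50, 500, 5000):
        print(n, f"{relative_error(n):.4f}")
```

For sparsely populated components (small n) the approximation is poor, which is why the assumption is stated explicitly.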


Summary

Introduction

Discrete systems, i.e. systems that comprise pieces that can be consistently counted, are everywhere in the inanimate world (e.g. matter itself), the biological world (e.g. DNA, proteins, species) and the world of human endeavour and creativity (e.g. computer software, written texts, digital music). A theory of discrete systems should satisfy, at a minimum, the following criteria: (i) it must explain and predict the observed properties of discrete systems that extend beyond simple power-law (Zipfian) relationships, (ii) it must be agnostic with respect to the types of pieces (tokens) of which discrete systems are composed, (iii) it must be agnostic with respect to mechanism, and (iv) it must be scale-independent. We show that the single differential equation that we derive, which embodies the principle of conservation of Hartley–Shannon information or CoHSI, accurately predicts the global properties of discrete systems (both heterogeneous and homogeneous) as diverse as proteins, computer software and digital music.
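
The heterogeneous and homogeneous views of a discrete system can be made concrete with a toy example. The sketch below treats a system as a list of components, each an ordered sequence of tokens. The heterogeneous view records each component's length t and its number a of unique tokens, along with Hartley's measure t ln a for a string of t tokens over an alphabet of size a. The homogeneous view pools all tokens, counts them and rank-orders the counts. The toy data, the function names and the use of each component's own unique tokens as its alphabet are assumptions made for illustration, not the paper's definitions.

```python
import math
from collections import Counter

def heterogeneous_view(components):
    """Per non-empty component: (length t, unique-token count a, t * ln a)."""
    rows = []
    for tokens in components:
        t = len(tokens)
        if t == 0:
            continue  # an empty component carries no tokens; skip it
        a = len(set(tokens))
        rows.append((t, a, t * math.log(a)))
    return rows

def homogeneous_view(components):
    """Pool all tokens, count them, and return the counts in rank order."""
    counts = Counter(tok for tokens in components for tok in tokens)
    return [c for _, c in counts.most_common()]

# Hypothetical toy system: three short "proteins" written as letter strings
toy = [list("MKVLA"), list("MKKA"), list("GG")]

if __name__ == "__main__":
    for t, a, info in heterogeneous_view(toy):
        print(f"t={t} a={a} t*ln(a)={info:.3f}")
    print(homogeneous_view(toy))
```

On real data the first view would be used to build the distribution of component lengths, and the second to build the rank-ordered frequency plot that is compared with Zipf's Law.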

Heterogeneous discrete systems
Hartley–Shannon information and statistical mechanics
Why conserve information?
Heterogeneous CoHSI equation
The homogeneous CoHSI distribution
Solution of the heterogeneous CoHSI equation
Solution of the homogeneous CoHSI equation
Testing the predictions of CoHSI
Justifying the presence of power-law tails
Simultaneous heterogeneous and homogeneous behaviour
Categorization and the uniqueness of alphabets
Scale-independence and the longest component
Conclusion
Consolidated arXiv work
Proteins
Computer software