Abstract

We demonstrate that gaps and distributional patterns embedded within real-valued measurements are inseparable parts of the biological and mechanistic information content of a system. Such patterns are discovered through a data-driven, possibly gapped histogram, which in turn leads to the geometry-based analysis of histogram (ANOHT). Constructing a possibly gapped histogram is a complex problem of statistical mechanics because the ensemble of candidate histograms is captured by a two-layer Ising model. The construction is also a distinctive problem of information theory from the perspective of data compression via uniformity. By defining a Hamiltonian (or energy) as the sum of the total coding lengths of boundaries and the total decoding errors within bins, the problem of computing the minimum-energy macroscopic states is, surprisingly, resolved by applying the hierarchical clustering algorithm. Thus, a possibly gapped histogram corresponds to a macro-state. The first phase of ANOHT is then developed for simultaneous comparison of multiple treatments, while the second phase is developed, based on classical empirical process theory, for a tree geometry that can check the authenticity of branches of the treatment tree. The well-known Iris data are used to illustrate our technical developments. In addition, a large baseball pitching dataset and a heavily right-censored divorce dataset are analysed to showcase the existential gaps and the utility of ANOHT.

Highlights

  • Without spatial and temporal coordinates, a sample of one-dimensional real-valued measurements is generally taken as a basic, simple data type and receives very limited research attention

  • We demonstrate its potential merits through our development of a new data analysis paradigm, called analysis of histogram (ANOHT)

  • It is odd but interesting that the bin located at 10 is nearly exclusively occupied by Dickey’s KN. This exclusiveness implies that the knuckleball-specific range of break-length is probably achieved jointly by the pitcher’s unusual pitching mechanics and the unusually low start-speed seen in the bins at the lower end of the histogram of start-speed in figure 5a. This is another mechanistic pattern that can be derived from ANOHT but could hardly be derived from other methodologies


Summary

Introduction

Without spatial and temporal coordinates, a sample of one-dimensional real-valued measurements is generally taken as a basic, simple data type and receives very limited research attention. Many patterns can be exhibited through the piecewise step-function structure of an empirical distribution. Two of them take the most basic forms: one is the ‘linear segment’ and the other is the ‘gap’. The remaining patterns can be approximated very well by properly combining these two. Ideally, an empirical distribution can be well approximated by various serial compositions of basic patterns. Each composition is a possibly gapped piecewise linear approximation, which is equivalent to a possibly gapped histogram.
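A minimal Python sketch can make the link between gaps, linear segments and a possibly gapped histogram concrete. It is not the authors' algorithm and it ignores the Hamiltonian entirely: it only cuts the sorted sample at its widest spacings, which is what single-linkage hierarchical clustering amounts to in one dimension. The function name `gapped_histogram`, the parameter `n_bins` and the sample values are hypothetical and chosen for illustration.

```python
def gapped_histogram(values, n_bins):
    """Illustrative stand-in, not the ANOHT construction.

    Cuts the sorted sample at its (n_bins - 1) widest spacings and
    returns one (left, right, count) tuple per bin. The empty space
    between consecutive bins plays the role of a 'gap'; inside a bin
    the empirical distribution is treated as one linear segment.
    """
    xs = sorted(values)
    if not 1 <= n_bins <= len(xs):
        raise ValueError("n_bins must be between 1 and the sample size")

    # Spacing between consecutive order statistics, tagged with its position.
    spacings = [(xs[i + 1] - xs[i], i) for i in range(len(xs) - 1)]

    # Positions after which to cut: the (n_bins - 1) widest spacings.
    widest = sorted(spacings, reverse=True)[: n_bins - 1]
    cuts = sorted(i for _, i in widest)

    bins = []
    start = 0
    for cut in cuts + [len(xs) - 1]:
        chunk = xs[start : cut + 1]
        bins.append((chunk[0], chunk[-1], len(chunk)))
        start = cut + 1
    return bins


def gaps_between(bins):
    """Return the (right edge, next left edge) pairs separating the bins."""
    return [(bins[i][1], bins[i + 1][0]) for i in range(len(bins) - 1)]


sample = [1.0, 1.1, 1.2, 5.0, 5.1, 9.0]
bins = gapped_histogram(sample, n_bins=3)
print(bins)                # [(1.0, 1.2, 3), (5.0, 5.1, 2), (9.0, 9.0, 1)]
print(gaps_between(bins))  # [(1.2, 5.0), (5.1, 9.0)]
```

In this sketch the number of bins is supplied by hand. In the approach summarized above, that choice would instead come from trading off the coding length of the boundaries against the decoding errors within bins.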

