This paper is the result of an attempt to clarify and improve some results in the theory of statistical information. The term information is used to denote different things in different contexts. First of all, there is Shannon's information, $-\sum p_i \log p_i$, defined for probability distributions on a finite sample space; this measures, in an esthetically satisfactory way, the entropy or amount of uncertainty in a distribution. Then there is Wiener's information, $\int f(x) \log f(x) dx$, defined for an absolutely continuous distribution with density $f$ on the line (or in $n$-space); it was introduced by Wiener, with an acknowledgment to von Neumann, as a "reasonable measure" of the amount of information, having the property of being "the negative of the quantity usually defined as entropy in similar situations" ([10], p. 76). Finally, there is "information of one probability distribution $P$ with respect to another $Q$," commonly known as Kullback-Leibler information. On a finite sample space, this has the form $\sum p_i \log (p_i/q_i) = - \sum p_i \log q_i - (- \sum p_i \log p_i)$, and thus has some relationship to entropy; note that the second term, which is the entropy of $\{p_i\}$, is the minimum of the first term, $-\sum p_i \log q_i$, over all distributions $\{q_i\}$. An interesting idea due to Gelfand, Kolmogorov and Yaglom [3] establishes a connection between the Kullback-Leibler information for a finite probability space and that for any space: If $P, Q$ are probability measures on a measurable space $(\Omega, \mathscr{F})$ with $P \ll Q$, then the supremum of $\sum_i \log \lbrack P(A_i)/Q(A_i)\rbrack P(A_i)$ over all finite measurable partitions $\{A_i, i = 1, \cdots, n\}$ of $\Omega$ is $\int_\Omega \log (dP/dQ)\,dP$. The only published proof of this result seems to be that due to Kallianpur [5], which uses martingale theory. In Section 1, we shall obtain a rather simple direct proof of this result (Theorem 1.1) and extend it to the case where $Q$ is any $\sigma$-finite measure (Theorem 1.2). Wiener's information is then seen to be the supremum of $\sum \log \lbrack P(A_i)/Q(A_i)\rbrack P(A_i)$ over countable partitions, with $Q =$ Lebesgue measure. Section 2 will be concerned with Kullback-Leibler information. We shall define conditional information relative to a sub-field, establish a relation between this conditional information and sufficiency of the sub-field (Theorem 2.2), and also show that this conditional information equals the difference between the information contained in the field and that in the sub-field (Theorem 2.3). These are extensions of results obtained by Kullback and Leibler in a somewhat limited context.
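As a concrete illustration of the finite-partition characterization, the following minimal numerical sketch (a hypothetical example chosen for this summary, not drawn from the paper; the densities, the partition scheme, and the helper `partition_sum` are assumptions) takes $P$ with density $2x$ and $Q =$ Lebesgue measure on $[0, 1]$, so that $\int \log (dP/dQ)\,dP = \log 2 - 1/2$, and evaluates the partition sum on successively refined uniform partitions.

```python
import math

# Toy check (illustrative assumption, not from the paper):
# P has density 2x on [0, 1], Q is Lebesgue measure on [0, 1],
# so dP/dQ = 2x and the integral \int log(dP/dQ) dP = log 2 - 1/2.
exact = math.log(2.0) - 0.5

def partition_sum(n):
    """Sum_i P(A_i) * log(P(A_i)/Q(A_i)) over the uniform partition
    of [0, 1] into n intervals A_i = [(i-1)/n, i/n]."""
    total = 0.0
    for i in range(1, n + 1):
        p = (2 * i - 1) / n**2   # P(A_i) = (i/n)^2 - ((i-1)/n)^2
        q = 1.0 / n              # Q(A_i) = length of A_i
        total += p * math.log(p / q)
    return total

# Each partition below refines the previous one, so the sums are
# nondecreasing and approach the integral from below, consistent with
# the supremum over finite partitions equaling \int log(dP/dQ) dP.
for n in (2, 8, 32, 128, 512):
    print(n, partition_sum(n))
print("exact integral:", exact)
```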