Abstract

Kullback–Leibler divergence is the standard measure of error when a true probability distribution p is approximated by a probability distribution q. Its efficient computation is essential in many tasks, for instance in approximate computation or as a measure of error when learning a probability distribution. For high-dimensional distributions, such as those associated with Bayesian networks, a direct computation can be infeasible. This paper considers the problem of efficiently computing the Kullback–Leibler divergence of two probability distributions, each of them coming from a different Bayesian network, possibly with different structures. The approach is based on an auxiliary deletion algorithm that computes the necessary marginal distributions, using a cache of operations with potentials in order to reuse past computations whenever they are needed. The algorithms are tested with Bayesian networks from the bnlearn repository. Computer code in Python is provided, built on pgmpy, a library for working with probabilistic graphical models.
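
For reference, a sketch of the quantity involved under the standard family-wise decomposition (the parent-set notation Pa_A(X_i) and Pa_B(X_i) for the parents of X_i in N_A and N_B is introduced here for illustration only):

\[
D(p_A \,\|\, p_B) \;=\; \sum_{x} p_A(x)\,\log\frac{p_A(x)}{p_B(x)}
\;=\; \sum_{i=1}^{n} \mathbb{E}_{p_A}\!\big[\log p_A(X_i \mid \mathrm{Pa}_A(X_i))\big]
\;-\; \sum_{i=1}^{n} \mathbb{E}_{p_A}\!\big[\log p_B(X_i \mid \mathrm{Pa}_B(X_i))\big].
\]

Each expectation depends only on the marginal of p_A over one variable and its parents in N_A or in N_B; this is why a procedure that computes those marginals efficiently, such as the deletion algorithm with a cache of operations on potentials described above, suffices to evaluate the divergence.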

Highlights

  • When experimentally testing Bayesian network learning algorithms, performance is in most cases evaluated by looking at structural differences between the graph of the original Bayesian network and that of the learned one [1], as when using the structural Hamming distance

  • The aim of this paper is to compute the Kullback–Leibler divergence between the joint probability distributions, p_A and p_B, of two different Bayesian networks N_A and N_B defined on the same set of variables X but possibly having different structures

  • Any other learning algorithm could have been used, since the goal is to have an alternative Bayesian network that will later be used to compute the Kullback–Leibler divergence with the methods described in Algorithms 1 and 2 (a rough brute-force illustration of the quantity being computed follows below)
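
A minimal brute-force illustration of this quantity, not the paper's Algorithms 1 and 2: the sketch below materialises the full joint distribution of each network with pgmpy, so it is only feasible for very small models, and the example networks, function names and the class name BayesianNetwork (called DiscreteBayesianNetwork in recent pgmpy releases) are assumptions made here for illustration.

import math
from functools import reduce

import numpy as np
from pgmpy.factors.discrete import TabularCPD
from pgmpy.models import BayesianNetwork


def joint_table(model):
    """Joint distribution of a discrete Bayesian network as a dict mapping a
    sorted tuple of (variable, state) pairs to its probability."""
    # The product of all CPDs of the network is its joint distribution.
    phi = reduce(lambda f, g: f * g, (cpd.to_factor() for cpd in model.get_cpds()))
    table = {}
    for idx in np.ndindex(*phi.cardinality):
        assignment = tuple(sorted(
            (var, phi.state_names[var][i]) for var, i in zip(phi.variables, idx)
        ))
        table[assignment] = float(phi.values[idx])
    return table


def kl_divergence(net_a, net_b):
    """KL(p_A || p_B) in nats by exhaustive enumeration; assumes p_B(x) > 0
    whenever p_A(x) > 0."""
    p_a, p_b = joint_table(net_a), joint_table(net_b)
    return sum(p * math.log(p / p_b[x]) for x, p in p_a.items() if p > 0)


# Two tiny networks on the same variables X, Y but with different structures.
net_a = BayesianNetwork([("X", "Y")])
net_a.add_cpds(
    TabularCPD("X", 2, [[0.6], [0.4]]),
    TabularCPD("Y", 2, [[0.7, 0.2], [0.3, 0.8]], evidence=["X"], evidence_card=[2]),
)

net_b = BayesianNetwork([("Y", "X")])
net_b.add_cpds(
    TabularCPD("Y", 2, [[0.5], [0.5]]),
    TabularCPD("X", 2, [[0.6, 0.6], [0.4, 0.4]], evidence=["Y"], evidence_card=[2]),
)

print(kl_divergence(net_a, net_b))

The efficient approach summarised in the abstract replaces the full joint with the family marginals obtained by variable deletion, reusing cached operations with potentials; a brute-force version like this one only serves as a reference on small networks.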


Summary

Introduction

When experimentally testing Bayesian network learning algorithms, performance is in most cases evaluated by looking at structural differences between the graph of the original Bayesian network and that of the learned one [1], as when using the structural Hamming distance. However, it can be useful to estimate a network that is less dense than the original one, but in which the parameters can be estimated more accurately. This is the case of the Naive Bayes classifier, which obtains very good results in classification problems despite the fact that its structure is not the correct one. In such situations, structural graphical differences are not a good measure of performance.
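
As a concrete point of comparison, one common DAG-based definition of the structural Hamming distance counts the edge additions, deletions and reversals needed to turn the learned graph into the true one. The sketch below is illustrative only: the function name and edge-set representation are assumptions, and published variants often compare CPDAGs rather than DAGs.

def structural_hamming_distance(true_edges, learned_edges):
    """Edge additions, deletions and reversals needed to turn the learned
    DAG into the true one; a reversed edge counts as a single operation."""
    t, l = set(true_edges), set(learned_edges)
    reversed_edges = {(u, v) for (u, v) in l - t if (v, u) in t}
    missing = {(u, v) for (u, v) in t - l if (v, u) not in l}  # must be added
    extra = {(u, v) for (u, v) in l - t if (v, u) not in t}    # must be deleted
    return len(missing) + len(extra) + len(reversed_edges)

# The learned graph reverses one edge and drops another: distance 2.
print(structural_hamming_distance([("A", "B"), ("B", "C")], [("B", "A")]))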
