Estimation of Distribution Algorithms (EDA) have been proposed as an extension of genetic algorithms. In this paper we explain the relationship of EDA to algorithms developed in statistics, artificial intelligence, and statistical physics. The major design issues are discussed within a general interdisciplinary framework. It is shown that maximum entropy approximations play a crucial role. All proposed algorithms try to minimize the Kullback-Leibler divergence KLD between the unknown distribution p(x) and a class q(x) of approximations. However, the Kullback-Leibler divergence is not symmetric. Approximations which suppose that the function to be optimized is additively decomposed (ADF) minimize KLD(q||p), the methods which learn the approximate model from data minimize KLD(p||q). This minimization is identical to maximizing the log-likelihood. In the paper three classes of algorithms are discussed. FDA uses the ADF to compute an approximate factorization of the unknown distribution. The factors are marginal distributions, whose values are computed from samples. The second class is represented by the Bethe-Kikuchi approach which has recently been rediscovered in statistical physics. Here the values of the marginals are computed from a difficult constrained minimization problem. The third class learns the factorization from the data. We analyze our learning algorithm LFDA in detail. It is shown that learning is faced with two problems: first, to detect the important dependencies between the variables, and second, to create an acyclic Bayesian network of bounded clique size.
Read full abstract