On Cost-Efficient Learning of Data Dependency

Hyeryung Jang, Hyungseok Song, Yung Yi

doi:10.1109/tnet.2022.3141128

Abstract

In this paper, we consider the problem of learning a tree graph structure that represents the statistical data dependency among nodes, given a set of data samples generated by those nodes. This structure provides the basis for performing probabilistic inference tasks. Inference in the data graph includes marginal inference and maximum a posteriori (MAP) estimation, and belief propagation (BP) is a commonly used algorithm for computing the marginal distributions of nodes via message-passing, which incurs a non-negligible communication cost. A trade-off between inference accuracy and message-passing cost is inevitable because the learned data dependency structure and the physical connectivity graph are often highly different. We formalize this trade-off as an optimization problem whose output is a data dependency graph that jointly accounts for learning accuracy and message-passing cost. We focus on two popular implementations of BP, ASYNC-BP and SYNC-BP, which have different message-passing mechanisms and cost structures. For ASYNC-BP, we propose a polynomial-time learning algorithm that is optimal, motivated by finding a maximum weight spanning tree of a complete graph. For SYNC-BP, we prove the NP-hardness of the problem and propose a greedy heuristic.
For both BP implementations, we use the large deviation principle to quantify how the error probability, i.e., the probability that the learned cost-efficient data graph differs from the ideal one, decays as the number of data samples grows. This provides a guideline on how many samples are needed to achieve a given trade-off. We validate our theoretical findings through extensive simulations, which confirm a good match with the analysis.
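
To make the maximum weight spanning tree idea concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes discrete samples, uses the plug-in mutual information between node pairs as the accuracy term (as in classical Chow-Liu tree learning), and subtracts a hypothetical per-edge penalty `gamma * cost[i][j]` to stand in for the message-passing cost. The function names, the `cost` matrix, and the `gamma` parameter are all illustrative assumptions; the paper's actual objective and cost structure may differ.

```python
import math
from collections import Counter
from itertools import combinations


def empirical_mutual_information(samples, i, j):
    """Plug-in mutual information (in nats) between columns i and j."""
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    count_i = Counter(s[i] for s in samples)
    count_j = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written in terms of counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def max_weight_spanning_tree(num_nodes, weights):
    """Kruskal's algorithm; weights maps an edge (i, j) to its weight."""
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Visit edges from heaviest to lightest, keeping those that join components
    for (i, j), _ in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


def learn_cost_aware_tree(samples, cost, gamma):
    """Assumed objective: mutual information minus gamma times edge cost."""
    num_nodes = len(samples[0])
    weights = {
        (i, j): empirical_mutual_information(samples, i, j) - gamma * cost[i][j]
        for i, j in combinations(range(num_nodes), 2)
    }
    return max_weight_spanning_tree(num_nodes, weights)
```

With `gamma = 0` this sketch reduces to plain Chow-Liu learning. Increasing `gamma` pushes the tree toward cheaper edges at the expense of statistical fit, which is one simple way to picture the accuracy versus cost trade-off described above.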

Full Text