In this article, we study the problem of cooperative inference, where a group of agents interacting over a network seeks to estimate a joint parameter that best explains a set of network-wide observations using local information only. Agents know neither the network topology nor the observations of other agents. We explore a variational interpretation of the Bayesian posterior and its relation to the stochastic mirror descent algorithm to prove that, under appropriate assumptions, the beliefs generated by the proposed algorithm concentrate around the true parameter exponentially fast. Part I of this two-part article series focuses on providing a variational approach to distributed Bayesian filtering. Moreover, we develop explicit and computationally efficient algorithms for observation models in the exponential family. In addition, we provide a novel nonasymptotic belief concentration analysis for distributed non-Bayesian learning on finite hypothesis sets; this new analysis method is the basis for the results presented in Part II. Part II provides the first nonasymptotic belief concentration rate analysis for distributed non-Bayesian learning over networks on compact hypothesis sets. In addition, we provide extensive numerical analysis for various distributed inference tasks on networks with observation models in the exponential family of distributions.
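To make the setting concrete, the following is a minimal illustrative sketch of a standard distributed non-Bayesian update on a finite hypothesis set: each agent geometrically pools its neighbors' beliefs through a doubly stochastic weight matrix and then performs a local Bayesian update with its own observation. This is representative of the class of algorithms analyzed here, not the paper's exact method; the network, weights, and Gaussian observation model are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_hyp, T = 3, 4, 300
true_hyp = 2  # index of the true parameter (illustrative)

# Each hypothesis h is the mean of the agents' Gaussian observations.
means = np.array([-1.0, 0.0, 1.0, 2.0])

# Doubly stochastic mixing matrix for a fully connected 3-agent network.
A = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

# Uniform initial beliefs over the hypothesis set.
beliefs = np.full((n_agents, n_hyp), 1.0 / n_hyp)

def log_lik(x):
    # Gaussian log-likelihood of observation x under each hypothesis mean.
    return -0.5 * (x - means) ** 2

for _ in range(T):
    obs = means[true_hyp] + rng.standard_normal(n_agents)
    # Step 1: geometric (log-linear) pooling of neighbors' beliefs.
    log_b = A @ np.log(beliefs)
    # Step 2: local Bayesian update with each agent's own observation.
    log_b += np.stack([log_lik(x) for x in obs])
    # Normalize in log space for numerical stability.
    log_b -= log_b.max(axis=1, keepdims=True)
    beliefs = np.exp(log_b)
    beliefs /= beliefs.sum(axis=1, keepdims=True)

# Each agent's most-believed hypothesis after T rounds.
print(beliefs.argmax(axis=1))
```

Under these assumptions the beliefs of all agents concentrate on the true hypothesis, mirroring the exponential concentration behavior established in the analysis.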