Ensembles of networks arise in many scientific fields, but there are relatively few statistical tools for inferring their generative processes, particularly in the presence of both dyadic dependence and cross-graph heterogeneity. To address this gap, we propose characterizing network ensembles via finite mixtures of exponential family random graph models (ERGMs), a class of parametric statistical models that has been successful in explicitly modeling the complex stochastic processes that govern the structure of edges in a network. Our proposed modeling framework can also be used for applications such as model-based clustering of ensembles of networks and density estimation for complex graph distributions. We develop a joint approach to estimate the number of mixture components and identify cluster-specific parameters simultaneously as well as to obtain an identified model under the Bayesian paradigm. Specifically, we develop a Metropolis-within-Gibbs algorithm to perform Bayesian inference, and estimate the number of mixture components using a strategy of deliberate overfitting with sparse priors that removes excess components during MCMC. As the true ERGM likelihood is generally intractable for model specifications with dyadic dependence terms, we consider two tractable approximations (pseudolikelihood and adjusted pseudolikelihood) to facilitate efficient statistical inference. We run simulation studies to compare the performance of these two approximations with respect to multiple metrics, showing conditions under which both are useful. We demonstrate the utility of the proposed approach using an ensemble of political co-voting networks among U.S. Senators and an ensemble of brain functional connectivity networks.