Large collections of coupled, heterogeneous agents can exhibit complex dynamical behavior that is difficult to simulate and analyze. However, if the collective dynamics lie on a low-dimensional manifold, then the original agent-based model may be approximated with a simplified surrogate model on and near the low-dimensional space where the dynamics live. Analytically identifying such simplified models can be challenging or impossible, but here we present a data-driven coarse-graining methodology for discovering them. We consider two types of reduced models: globally based models, which predict the dynamics using information from the whole ensemble, and locally based models, which predict the dynamics of an agent using information from only a subset of agents close to it in heterogeneity space (not physical space). For both approaches, we are able to learn laws governing the behavior of the reduced system on the low-dimensional manifold directly from time series of states from the agent-based system. These laws take the form of either a system of ordinary differential equations (ODEs), in the globally based case, or a partial differential equation (PDE), in the locally based case. For each technique, we employ a specialized artificial neural network integrator that has been templated on an Euler time stepper (i.e., a ResNet) to learn the laws of the reduced model. As part of our methodology, we utilize the proper orthogonal decomposition (POD) to identify the low-dimensional space of the dynamics. Our globally based technique uses the resulting POD basis to define a set of coordinates for the agent states in this space and then seeks to learn the time evolution of these coordinates as a system of ODEs.
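As a rough illustration of the globally based pipeline, the following minimal sketch computes a POD basis from a snapshot matrix via the singular value decomposition, projects the agent states onto it, and defines an Euler-templated (ResNet-style) one-step map on the resulting coordinates. Everything specific here is an assumption made for illustration: the synthetic snapshot data, the heterogeneity parameter `omega`, the 99.9% energy threshold, the network sizes, and all function names. The right-hand-side network is left untrained; fitting it by minimizing the one-step loss is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for agent time series (assumption): rows are agents,
# columns are snapshots, and the data are low rank by construction.
n_agents, n_snapshots, dt = 200, 500, 0.01
omega = np.linspace(-1.0, 1.0, n_agents)      # hypothetical heterogeneity parameter
t = dt * np.arange(n_snapshots)
X = np.outer(1.0 + 0.5 * omega, np.cos(t)) + np.outer(omega**2, np.sin(t))

# POD: left singular vectors of the mean-centered snapshot matrix.
X_mean = X.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(X - X_mean, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.999) + 1)   # smallest rank reaching 99.9% energy
Phi = U[:, :r]                                # POD basis, shape (n_agents, r)
Z = Phi.T @ (X - X_mean)                      # POD coordinates, shape (r, n_snapshots)


def init_mlp(sizes, rng):
    """Small random weights and zero biases for a fully connected network."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(m))
            for n, m in zip(sizes[:-1], sizes[1:])]


def rhs(z, params):
    """Network approximating dz/dt; z has shape (r, batch)."""
    h = z
    for W, b in params[:-1]:
        h = np.tanh(W @ h + b[:, None])
    W, b = params[-1]
    return W @ h + b[:, None]


def euler_resnet_step(z, params, dt):
    """Euler template: the network output is added to the input as a residual."""
    return z + dt * rhs(z, params)


def one_step_loss(Z, params, dt):
    """Mean squared error between predicted and observed next coordinates."""
    pred = euler_resnet_step(Z[:, :-1], params, dt)
    return np.mean((Z[:, 1:] - pred) ** 2)


params = init_mlp([r, 16, r], rng)
loss = one_step_loss(Z, params, dt)           # quantity a trainer would minimize
```

Once trained, the surrogate would be iterated with `euler_resnet_step` in POD coordinates, and agent states recovered as `X_mean + Phi @ z`.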
For the locally based technique, we propose a methodology for learning a partial differential equation representation of the agents; the PDE law depends on the state variables and on partial derivatives of the state variables with respect to the model heterogeneities. We require that the state variables be smooth with respect to the model heterogeneities, which permits us to cast the discrete agent-based problem as a continuous one in heterogeneity space. The agents in such a representation bear similarity to the discretization points used in typical finite element/volume methods. As an illustration of the efficacy of our techniques, we consider a simplified coupled neuron model for rhythmic oscillations in the pre-Bötzinger complex and demonstrate how our data-driven surrogate models are able to produce dynamics comparable to the dynamics of the full system. A nontrivial conclusion is that the dynamics can be equally well reproduced by an all-to-all coupled model and by a locally coupled model of the same agents.
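To make the locally based idea concrete, the sketch below treats the agents as grid points in a one-dimensional heterogeneity space, estimates derivatives of the state with respect to the heterogeneity by finite differences, and advances the state with an Euler step of a law of the form du/dt = F(u, u_ω, u_ωω). The single scalar heterogeneity `omega`, the choice of derivatives up to second order, the function names, and the placeholder law `heat_law` (standing in for a trained network) are all assumptions made for illustration, not the construction used in this work.

```python
import numpy as np


def heterogeneity_features(u, omega):
    """Stack u, du/domega, d2u/domega2 for agents sorted by omega.

    Derivatives use only neighboring agents in heterogeneity space,
    so each row depends on local information.
    """
    u_w = np.gradient(u, omega, edge_order=2)
    u_ww = np.gradient(u_w, omega, edge_order=2)
    return np.column_stack([u, u_w, u_ww])


def euler_pde_step(u, omega, F, dt):
    """One Euler step of du/dt = F(features); F maps (n_agents, 3) -> (n_agents,)."""
    return u + dt * F(heterogeneity_features(u, omega))


# Agents as discretization points in heterogeneity space (assumed 1-D).
omega = np.linspace(-1.0, 1.0, 200)
u0 = np.sin(omega)                      # smooth state profile across agents
feats = heterogeneity_features(u0, omega)

# Placeholder law (assumption): du/dt = d2u/domega2. A learned network
# taking the same feature rows would replace this callable.
heat_law = lambda f: f[:, 2]
u1 = euler_pde_step(u0, omega, heat_law, dt=1e-3)
```

The smoothness requirement matters here: the finite-difference estimates are only meaningful if the state varies smoothly from one agent to its neighbors in `omega`.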