Power buffers are power electronic converters with large capacitors that decouple volatile loads from a low-inertia distribution network in a DC microgrid. In this work, a set of distributed optimal control policies enables power buffers to reciprocally assist each other during abrupt load changes. While the majority of existing control paradigms are localized, enabling communication among buffers extends their effective range of assistance and helps them minimize a shared objective cooperatively. The weight surfaces of the control law are learned over a mesh of reference loads for each power buffer. The Hamilton-Jacobi-Bellman equation is solved by a continuous-time adaptive dynamic programming (ADP) approach with off-policy learning, which directly provides a feedback controller, in contrast to existing approaches that obtain open-loop policies via Pontryagin's minimum principle. This paper presents the first attempt at using ADP techniques for the control of power buffers that respects their original nonlinear dynamics, overcoming the limitations of previous approaches based on small-signal analysis. Compared to the current literature, the proposed approach provides trained controllers that are known a priori, avoiding player-by-player solutions or real-time optimization procedures that could degrade performance or become computationally intensive. Hardware-in-the-loop emulations of a low-voltage DC microgrid validate the proposed approach.