As a promising approach to managing distributed energy, the use of microgrids has attracted significant attention among those managing continuous connections to distribution networks. However, barriers to data sharing among microgrids, the uncertainty of distributed renewable sources and loads, and the nonlinearity of power flow optimization make traditional model-based optimization methods difficult to apply. In this paper, a data-driven coordinated active and reactive power optimization method is proposed for distribution networks with multiple microgrids. A multi-agent deep reinforcement learning (MADRL) method is used to protect the data privacy of each microgrid. Moreover, an attention mechanism, which focuses on crucial information, is introduced to overcome the slow convergence caused by the dimensionality explosion of the optimization variables. Two types of agents, controlling discrete-action and continuous-action devices, respectively, are formulated in the coordinated optimization, which reduces voltage violations and improves system operating efficiency. In addition, to improve the performance of the online agent model under varying operating conditions, transfer learning is embedded in the MADRL training process. The proposed method is verified on a modified IEEE 33-bus distribution network with nine microgrids.