This paper presents a novel peer-to-peer (P2P) power trading scheme for the power management of nanogrid clusters with renewable energy sources. Unlike power management of smart grids at the megawatt scale, power management of nanogrids at the kilowatt scale requires individual control of electric appliances. In addition, P2P trading involves complex elements such as a decentralized architecture, participant behaviors, and distributed energy resources. To handle these requirements, multi-objective optimization providing Pareto-optimal solutions is applied to the power management of nanogrid clusters capable of P2P trading. The P2P power trading aims to maximize profit while minimizing grid power consumption. However, because of the conventional rule for P2P trading and the trade-off inherent in Pareto-optimal solutions, the result of P2P trading is not optimal. To improve the performance of P2P trading in terms of total electricity cost, reinforcement learning (RL) algorithms based on deep neural networks are introduced. The RL agent of each nanogrid cluster learns the behavior of P2P trading and then carries out the trading. In the proposed RL technique, a graph convolutional network is employed to analyze graph-structured data, and a bidirectional long short-term memory network is used for data prediction, which together enhance the performance of P2P trading. Simulations of power management for nanogrid clusters show that the proposed RL technique achieves the lowest electricity cost among the RL algorithms considered in the benchmark test. Averaged over ten nanogrid clusters, the electricity cost is reduced by 36.7% compared with the electricity cost obtained from power management without RL algorithms.
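The trade-off between profit and grid power consumption can be illustrated with a minimal sketch of Pareto-dominance filtering. This is not the optimizer used in the paper: the function names, the `(profit, grid_power)` tuple layout, and the candidate values are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical candidate trading plan: (profit, grid_power).
# Profit is to be maximized, grid power consumption minimized.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b in both objectives
    and strictly better in at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up values, for illustration only.
candidates = [(5.0, 10.0), (4.0, 8.0), (3.0, 9.0), (6.0, 12.0), (5.0, 11.0)]
front = pareto_front(candidates)
print(front)
```

Every plan left on the front improves one objective only at the expense of the other, so selecting a single plan from it requires an additional rule. This is consistent with the observation that the Pareto-based result alone need not minimize total electricity cost.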