Abstract

Energy harvesting cognitive radio networks have been proposed to address the energy supply and spectrum scarcity concerns of wireless devices. To improve spectrum utilization, secondary users (SUs) access the licensed spectrum in underlay mode, which may cause severe interference to both primary users and other SUs. We focus on underlay energy harvesting cognitive radio networks with multiple pairs of SUs and formulate the long‐term secondary throughput maximization problem as a mixed‐integer non‐linear programming (MINLP) problem. Because traditional approaches can hardly solve this MINLP problem well, we propose a centralized deep deterministic policy gradient (C‐DDPG) approach that achieves satisfactory throughput performance. To reduce the computational complexity of C‐DDPG, we further propose a clustering‐based multi‐agent DDPG (CMA‐DDPG) approach that combines the advantages of centralized and distributed deep reinforcement learning. CMA‐DDPG uses a novel interference‐based clustering algorithm that groups the SUs causing severe mutual interference into the same cluster, so the state and action spaces are smaller than those in C‐DDPG. Numerical results validate the superiority of the proposed approaches in terms of throughput and outage probability, and confirm the clustering performance of the interference‐based clustering algorithm in terms of the outage probability of the secondary network.
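
The idea of placing SUs with severe mutual interference into one cluster can be illustrated with a minimal sketch. This is not the interference-based clustering algorithm of the paper, whose details are not given in the abstract; it is a simple stand-in that assumes a hypothetical matrix `gain[i][j]` of interference channel gains from the transmitter of SU pair `i` to the receiver of SU pair `j`, and a hypothetical `threshold` above which interference is treated as severe. SU pairs linked by above-threshold interference, directly or through other pairs, end up in the same cluster.

```python
from typing import Dict, List


def cluster_by_interference(gain: List[List[float]], threshold: float) -> List[List[int]]:
    """Group SU pairs whose mutual interference exceeds `threshold`.

    gain[i][j] is an assumed interference gain from the transmitter of
    SU pair i to the receiver of SU pair j (illustrative input, not taken
    from the paper). Returns clusters as lists of SU pair indices.
    """
    n = len(gain)
    parent = list(range(n))  # union-find forest, one node per SU pair

    def find(x: int) -> int:
        # Find the cluster representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            # Treat interference as severe if either direction exceeds the threshold.
            if max(gain[i][j], gain[j][i]) > threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Attach the larger root to the smaller for a stable ordering.
                    parent[max(ri, rj)] = min(ri, rj)

    clusters: Dict[int, List[int]] = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return [clusters[root] for root in sorted(clusters)]


if __name__ == "__main__":
    # Four SU pairs: pairs 0 and 1 interfere strongly, as do pairs 2 and 3.
    example_gain = [
        [0.00, 0.90, 0.01, 0.02],
        [0.80, 0.00, 0.03, 0.01],
        [0.02, 0.01, 0.00, 0.70],
        [0.01, 0.02, 0.60, 0.00],
    ]
    print(cluster_by_interference(example_gain, threshold=0.5))
```

Under this assumption, each cluster could be handled by one agent that observes and controls only its own members, which is why the per-agent state and action spaces shrink relative to a single centralized agent.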