Botnets have become one of the major intrusion threats to cybersecurity. Owing to their distributed structure, P2P botnets are highly concealed and resilient, which makes them difficult to dismantle completely. Existing methods based on traffic statistics and vulnerability tracing cannot effectively solve the problem of P2P botnet disintegration. Although P2P networks treat every node as a peer in principle, differences in processing power, resource distribution, and node bandwidth introduce a degree of heterogeneity. The resulting critical nodes bridge the underlying bot nodes and the upper control server. Traditional methods for ranking node importance rely mainly on classical graph-theoretic statistics such as degree, betweenness, clustering coefficient, eigenvector centrality, and PageRank. In this paper, botnet defense strategies are investigated from the perspective of complex network graph theory, and graph embedding is combined with deep reinforcement learning for combinatorial optimization to address the critical node identification problem in P2P botnets. A novel adaptive node removal model called PeerRemove is proposed. The model uses Structure2vec graph embedding to represent network structure information in a low-dimensional embedding space, and it is trained with n-step Q-learning to learn complex topological patterns and find the critical nodes whose removal effectively disintegrates the network. To evaluate the effectiveness of the proposed method, the Area Under the Curve (AUC) of the Largest Connected Component (LCC) size during node removal is used as the evaluation indicator, and six different types of real or synthetic P2P botnets are selected, namely Sality, ZeroAccess, NSIS, Mozi, Gnutella, and Peer sampling service.
Experiments are conducted on many real and model networks with thousands to tens of thousands of nodes, and our method is compared with five classical static or dynamic node attack methods: HAD, PageRank, CI, BPD, and HPRA. The experimental results show that the overall AUC of the PeerRemove method is lower than that of the baseline methods, meaning that it can minimize botnet resiliency at a small cost. The proposed method outperforms the existing node removal methods and shows good robustness and feasibility. To demonstrate the generality of the method, it is also tested on a dataset with a centralized topology, where good experimental results are obtained.
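To make the evaluation indicator concrete, the following minimal Python sketch computes the AUC of the normalized LCC size along a node removal sequence and contrasts a static ordering with an adaptive one. It is an illustration under stated assumptions, not the PeerRemove implementation: the function names are hypothetical, the AUC is assumed to be the mean of the LCC fraction after each removal (the abstract does not fix a normalization), and the degree-based orderings are generic stand-ins rather than any of the baselines named above.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def largest_component_size(adj, removed):
    """Size of the largest connected component among nodes not in `removed`."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over surviving nodes
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def lcc_auc(adj, removal_order):
    """Assumed metric: mean LCC fraction after each removal (lower is better)."""
    n = len(adj)
    removed = set()
    total = 0.0
    for node in removal_order:
        removed.add(node)
        total += largest_component_size(adj, removed) / n
    return total / n


def static_degree_order(adj):
    """Static attack: rank once by initial degree (ties broken by node id)."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))


def adaptive_degree_order(adj):
    """Adaptive attack: recompute degrees on the residual graph after each removal."""
    live = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while live:
        target = min(live, key=lambda v: (-len(live[v]), v))
        for nb in live.pop(target):
            live[nb].discard(target)  # keep neighbour sets restricted to live nodes
        order.append(target)
    return order


# Toy graph (hypothetical, not one of the paper's datasets): two small clusters.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (4, 5), (4, 6)]
adj = build_adjacency(edges)
static_auc = lcc_auc(adj, static_degree_order(adj))
adaptive_auc = lcc_auc(adj, adaptive_degree_order(adj))
```

On this toy graph the adaptive ordering reaches node 4 earlier than the static one, so its curve drops sooner and its AUC is lower. A learned policy such as the one described above would supply its own removal order to the same `lcc_auc` function.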