To address the low accuracy and limited generalization typical of long-tailed fault diagnosis, a reinforcement-learning general agent based on adaptive data distribution is proposed. The method first learns more discriminative, domain-invariant features by pre-training the deep Q-network on unlabeled positive samples; supervisory signals derived from the data's intrinsic structure sharpen class-boundary detection while maximizing intra-class feature similarity. Next, prioritizing experiences by state-action value and TD-error enables efficient use of rare but critical transitions, significantly improving sampling efficiency. Concurrently, an adaptive distribution strategy refines a hierarchical reward scheme by dynamically calibrating the reward function from real-time accuracy feedback. The deep Q-network, built on a ResNet backbone, integrates Efficient Channel Attention (ECA) and a Global Attention Mechanism (GAM) to strengthen decision-making robustness. Evaluated on a long-tailed shipboard-antenna dataset, the proposed method identifies fault patterns autonomously and shows clear advantages in efficiency, robustness, generalization, and interpretability.
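The TD-error-based experience prioritization mentioned above can be sketched as a standard prioritized replay buffer. This is a minimal illustrative sketch, not the paper's implementation: the class name, the `alpha` exponent, and the small `eps` floor are conventional choices (following the usual proportional-prioritization scheme), and the paper's additional weighting by state-action value is omitted here.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal TD-error-proportional replay buffer (illustrative sketch).

    Transitions with larger |TD-error| are sampled more often, so rare
    but informative experiences from tail classes are replayed frequently.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priority shapes sampling (0 = uniform)
        self.eps = eps          # small floor so every transition stays sampleable
        self.buffer = []        # stored transitions, e.g. (s, a, r, s_next)
        self.priorities = []    # one priority per stored transition

    def add(self, transition, td_error):
        # Proportional priority: p = (|delta| + eps)^alpha
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.buffer) >= self.capacity:
            # Drop the oldest transition when full (simple FIFO eviction)
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Sample indices with probability proportional to stored priorities
        probs = np.array(self.priorities, dtype=np.float64)
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx

    def update(self, indices, td_errors):
        # Refresh priorities after the learner recomputes TD-errors
        for i, err in zip(indices, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```

In use, the agent adds each transition with its initial TD-error, samples a batch for the Q-network update, and then writes the recomputed TD-errors back so that frequently revisited, already well-fit transitions lose priority over time.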