The electric solar wind sail (E-sail) is a new propellant-free propulsion concept. Because E-sail systems are under-actuated and highly nonlinear, designing their attitude controllers is challenging, and conventional control schemes may not be able to handle the problem. To address this, a reinforcement learning (RL)-based control scheme, which can explore and obtain optimal policies without training datasets, is proposed for the attitude control of a barbell E-sail system. The barbell E-sail comprises two end satellites linked to an insulated confluence point through long conductive tethers. The voltages of the two tethers can be modulated individually for attitude control. The attitude dynamics of the system are described using a nonsingular formulation. The control scheme has a two-stage design. In the first stage, an RL controller based on the Proximal Policy Optimization (PPO) algorithm is used to obtain an RL control strategy, which is emulated and updated by neural networks. In the second stage, the updated control strategy maps the system state to the control output in real time, so that attitude feedback control is achieved with low computational cost, low energy consumption, and fast convergence. Simulation results demonstrate that the proposed RL-based control scheme can effectively steer the E-sail to the desired attitude by regulating the voltage difference between the tethers. Comparisons with a nonlinear model predictive control (NMPC) scheme also indicate that the developed control scheme can significantly reduce the computation time while maintaining control accuracy.
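
For readers unfamiliar with PPO, the clipped surrogate objective from the standard PPO formulation can be sketched as follows. This is an illustrative sketch only: the abstract does not state which PPO variant, clipping parameter, or network architecture is used, so the function name `clipped_surrogate` and the default `clip_eps=0.2` are assumptions, not the authors' implementation.

```python
def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    ratios:     probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates, one per sample
    clip_eps:   clipping half-width (assumed value; not given in the abstract)
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must be non-empty and equal in length")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps] to limit the policy update size.
        r_clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, r_clipped * a)
    return total / len(ratios)
```

In a training loop of the kind described for the first stage, this quantity would be maximized with respect to the policy network parameters. The second stage then only requires a forward pass of the trained policy network, which is consistent with the low computation time reported relative to NMPC.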