Active pantograph control is the most promising technique for reducing contact force (CF) fluctuation and improving a train's current-collection quality. Existing solutions, however, suffer from two significant limitations: 1) they cope poorly with varying pantograph types, catenary line operating conditions, changing operating speeds, and contingencies; and 2) they are challenging to implement in practical systems because they lack rapid adaptability to new pantograph-catenary system (PCS) operating conditions and environmental disturbances. In this work, we alleviate these problems by developing a novel context-based deep meta-reinforcement learning (CB-DMRL) algorithm. The proposed CB-DMRL algorithm combines Bayesian optimization (BO) with deep reinforcement learning (DRL), allowing a general agent to adapt to new tasks quickly and efficiently. We evaluated the CB-DMRL algorithm's performance on a proven PCS model. The experimental results demonstrate that meta-trained DRL policies with a latent context space adapt swiftly to new operating conditions and unknown perturbations. The meta-agent adapts within two iterations at a high reward, requiring only ten spans, approximately 0.5 km, of PCS interaction data. Compared with state-of-the-art DRL algorithms and traditional solutions, the proposed method handles scenario changes promptly and reduces CF fluctuations, yielding excellent performance.
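The abstract does not give implementation details, but the general pattern it names, BO searching a low-dimensional latent context that conditions a frozen meta-trained DRL policy, can be illustrated with a minimal sketch. Everything below is hypothetical: `policy`, `episode_return`, and `bo_adapt` are illustrative names, the policy is a stub rather than a trained network, and the environment is a toy stand-in for the PCS model whose reward merely proxies a CF-fluctuation penalty.

```python
import numpy as np
from scipy.stats import norm

def policy(state, z):
    """Hypothetical frozen meta-trained policy: maps (state, latent context z)
    to an actuator command. A real system would use the trained DRL network."""
    return np.tanh(state @ z)

def episode_return(z, env_seed=0, horizon=50):
    """Roll out the policy on a toy surrogate environment and return total reward.
    The quadratic penalty stands in for a contact-force-fluctuation cost."""
    rng = np.random.default_rng(env_seed)
    state = rng.normal(size=z.shape)
    total = 0.0
    for _ in range(horizon):
        a = policy(state, z)
        state = 0.9 * state + 0.1 * a + 0.05 * rng.normal(size=state.shape)
        total += -np.sum(state ** 2)
    return total

def rbf(X1, X2, ls=0.5):
    """RBF kernel for the Gaussian-process surrogate."""
    d = (np.sum(X1 ** 2, 1)[:, None] + np.sum(X2 ** 2, 1)[None, :]
         - 2.0 * X1 @ X2.T)
    return np.exp(-0.5 * d / ls ** 2)

def bo_adapt(dim=4, n_init=4, n_iter=2, n_cand=256, seed=0):
    """BO adaptation of the latent context z: fit a GP surrogate to observed
    episode returns, then pick the next z by expected improvement (EI).
    n_iter=2 mirrors the abstract's claim of adaptation within two iterations."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-1.0, 1.0, size=(n_init, dim))
    y = np.array([episode_return(z) for z in Z])
    for _ in range(n_iter):
        K_inv = np.linalg.inv(rbf(Z, Z) + 1e-6 * np.eye(len(Z)))
        cand = rng.uniform(-1.0, 1.0, size=(n_cand, dim))
        Ks = rbf(cand, Z)
        mu = Ks @ K_inv @ y                                   # GP posterior mean
        var = np.clip(1.0 - np.sum((Ks @ K_inv) * Ks, 1), 1e-9, None)
        std = np.sqrt(var)
        gamma = (mu - y.max()) / std
        ei = std * (gamma * norm.cdf(gamma) + norm.pdf(gamma))  # EI acquisition
        z_next = cand[np.argmax(ei)]
        Z = np.vstack([Z, z_next])
        y = np.append(y, episode_return(z_next))
    return Z[np.argmax(y)], y.max()

if __name__ == "__main__":
    z_star, best = bo_adapt()
    print(f"adapted latent context: {z_star}, return: {best:.3f}")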