This article introduces a deep contextual reinforcement learning (DCRL) optimization algorithm to tackle the NP-hard power scheduling problem. The algorithm is built on a multi-agent simulation environment that decomposes the power scheduling problem into sequential Markov decision processes (MDPs). The simulation environment inherently corrects infeasible decisions and adjusts supply capacities so that the MDPs adhere to the optimization constraints. A deep reinforcement learning (DRL) model is trained on these MDPs to provide optimal solutions. To demonstrate its applicability and effectiveness, the proposed method is evaluated on various test systems with different unit counts, constraints, and production cost functions. The experimental results show that the proposed method outperforms alternative methods, such as binary alternative moth-flame optimization (BAMFO), binary particle swarm optimization (BPSO), teaching learning-based optimization (TLBO), new binary particle swarm optimization (NBPSO), binary learning particle swarm optimization (BLPSO), quasi-oppositional teaching learning-based algorithm (QOTLBO), binary-real-coded genetic algorithm (BRCGA), hybrid genetic-imperialist competitive algorithm (HGICA), three-stage priority list (TSPL), and hybridized evolutionary algorithm and sequential quadratic programming (HEPSQP). The novelties of the proposed method lie in its adaptability to longer planning horizons and its complexity that scales linearly with problem dimension, rendering it suitable for large-scale power systems.
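The abstract describes decomposing the scheduling problem into sequential MDPs inside a simulation environment that repairs constraint-violating decisions. The sketch below is a minimal, hypothetical illustration of such an environment in Python: the single-hour horizon, the greedy repair rule, the naive dispatch, and all unit data are invented for illustration and are not the authors' DCRL implementation.

```python
# Illustrative sketch only: a toy sequential-MDP environment for unit commitment.
# The repair rule, cost model, and unit data are hypothetical assumptions.
import numpy as np

class ToyCommitmentEnv:
    """One episode = one scheduling hour; each step commits (1) or decommits (0)
    one unit in sequence. After the last unit, the environment repairs the plan
    so committed capacity covers demand plus spinning reserve."""

    def __init__(self, p_max, cost_coeff, demand, reserve=0.1):
        self.p_max = np.asarray(p_max, dtype=float)       # unit capacities (MW)
        self.cost_coeff = np.asarray(cost_coeff, float)   # $/MWh proxy per unit
        self.demand = float(demand)                       # hourly load (MW)
        self.reserve = reserve                            # reserve margin fraction
        self.n = len(self.p_max)

    def reset(self):
        self.t = 0
        self.u = np.zeros(self.n, dtype=int)              # commitment decisions
        return self._obs()

    def _obs(self):
        # context: progress through the units and capacity committed so far
        covered = float(self.u @ self.p_max)
        return np.array([self.t / self.n, covered, self.demand])

    def step(self, action):
        self.u[self.t] = int(action)
        self.t += 1
        done = self.t == self.n
        reward = 0.0
        if done:
            self._repair()                                # enforce feasibility
            dispatch = self._dispatch()
            reward = -float(self.cost_coeff @ dispatch)   # negative production cost
        return self._obs(), reward, done, {}

    def _repair(self):
        # if committed capacity is short, switch on the cheapest uncommitted units
        need = self.demand * (1 + self.reserve)
        for i in np.argsort(self.cost_coeff):
            if self.u @ self.p_max >= need:
                break
            self.u[i] = 1

    def _dispatch(self):
        # naive economic dispatch: load the cheapest committed units first
        out = np.zeros(self.n)
        remaining = self.demand
        for i in np.argsort(self.cost_coeff):
            if self.u[i]:
                out[i] = min(self.p_max[i], remaining)
                remaining -= out[i]
        return out


# usage: a random policy interacting with the hypothetical environment
env = ToyCommitmentEnv(p_max=[455, 130, 80, 55],
                       cost_coeff=[16.2, 17.3, 27.7, 25.9],
                       demand=500)
obs, done = env.reset(), False
while not done:
    obs, reward, done, _ = env.step(np.random.randint(2))
print("commitment:", env.u, "reward:", reward)
```

In the paper's setting, the DRL agent would replace the random policy and learn from many such episodes; this toy version only shows how constraint repair can be folded into the environment so every trajectory remains feasible.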