This article proposes model-free reinforcement learning methods for minimum-cost state-flipped control in Boolean control networks (BCNs). We address two questions: 1) finding the flipping kernel, namely the flip set of smallest cardinality that ensures reachability, and 2) deriving, based on the obtained flipping kernel, optimal policies that minimize the number of flipping actions required for reachability. For Question 1), we show that Q-learning can determine reachability. To expedite convergence, we incorporate two improvements: 1) proving that states reachable before elements are added to the flip set remain reachable afterward, which enables transfer learning, and 2) starting each episode from special initial states whose reachability to the target state set is currently unknown. For Question 2), the objective of simultaneously reducing control costs and satisfying terminal constraints is difficult to encode solely through the reward function used in the Q-learning framework. To bridge this gap, we propose a reward scheme based on BCN characteristics and prove its optimality. For large-scale BCNs, Questions 1) and 2) are addressed with small-memory Q-learning, which reduces memory usage by recording only visited action-values; an upper bound on memory usage is provided to assess the algorithm's feasibility. To expedite convergence for Question 2) in large-scale BCNs, we introduce adaptive variable rewards based on the known maximum number of steps needed to reach the target state set without cycles. Finally, the effectiveness of the proposed methods is validated on both small- and large-scale BCNs.
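The abstract describes the approach only at a high level. The sketch below is a minimal, illustrative Python example of one underlying idea: tabular Q-learning over joint control-input/flip actions on a BCN, with flipping costs in the reward and a sparse table that stores only visited action-values. The toy network in `step`, the `FLIP_SET`, `TARGET`, terminal bonus, and hyper-parameters are all hypothetical assumptions for illustration; they are not the paper's reward scheme, flipping kernels, or benchmark networks.

```python
# Illustrative sketch only: a toy 3-node BCN with one binary control input,
# where the agent may additionally flip the states of nodes in a given flip
# set. All quantities below are hypothetical, not taken from the paper.
import random
from itertools import chain, combinations

N = 3                      # number of nodes
TARGET = {(1, 1, 1)}       # hypothetical target state set
FLIP_SET = (0, 2)          # hypothetical flipping kernel: nodes that may be flipped

def step(x, u):
    """Toy BCN transition: next state as Boolean functions of state x and input u."""
    x1, x2, x3 = x
    return (x2 & u, x1 | x3, x1 ^ x2)

def powerset(s):
    return list(chain.from_iterable(combinations(s, r) for r in range(len(s) + 1)))

# Joint actions: a control input paired with a subset of the flip set to flip.
ACTIONS = [(u, f) for u in (0, 1) for f in powerset(FLIP_SET)]

def apply(x, action):
    u, flips = action
    y = list(step(x, u))
    for i in flips:                 # each flipping action incurs unit cost
        y[i] ^= 1
    return tuple(y), -len(flips)    # reward: negative number of flips

Q = {}                              # sparse table: only visited action-values are stored
alpha, gamma, eps, horizon = 0.1, 0.95, 0.2, 30

def q(s, a):
    return Q.get((s, a), 0.0)

for episode in range(5000):
    s = tuple(random.randint(0, 1) for _ in range(N))
    for _ in range(horizon):
        if s in TARGET:
            break
        a = (random.choice(ACTIONS) if random.random() < eps
             else max(ACTIONS, key=lambda b: q(s, b)))
        s2, r = apply(s, a)
        if s2 in TARGET:
            r += 10.0               # hypothetical terminal bonus for reaching the target set
        best_next = 0.0 if s2 in TARGET else max(q(s2, b) for b in ACTIONS)
        Q[(s, a)] = q(s, a) + alpha * (r + gamma * best_next - q(s, a))
        s = s2

# Greedy policy read-out for one example initial state
s0 = (0, 0, 0)
print("best action at", s0, "->", max(ACTIONS, key=lambda a: q(s0, a)))
```

In this sketch, a state is deemed reachable from an initial state if the learned greedy policy drives it into the target set; penalizing each flip steers the policy toward fewer flipping actions, which is the flavor of the minimum-cost objective the paper formalizes.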