Abstract

This paper acknowledges and addresses the valuable feedback provided by Dr. Deniz Preil in response to the recent study by Kurian et al., which investigates the application of proximal policy optimization (PPO) to determine dynamic ordering policies in multi‐echelon supply chains. Dr. Preil's first comment motivated an examination of the training and evaluation procedures in Experiments 2, 3, and 4. Experiments 2 and 3 were reworked so that the seed varies in every training iteration, which produced refined outcomes; Experiment 4 did not need to be reworked. The second comment focused on the benchmarking strategies involving the 1‐1 policy and the order‐up‐to (OUT) policy. In response, the distinctions between the two policies are clarified and the use of the 1‐1 policy for benchmarking in Experiment 4 is justified. The implementation of the widely accepted OUT policy is also explained, together with the rationale behind its use. These discussions aim to enhance the methodology employed by Kurian et al. and to strengthen the implications of the findings for supply chain ordering management.