Abstract
When a disaster strikes, it is important to allocate limited disaster relief resources to those in need. This paper considers the allocation of resources in humanitarian logistics using three critical performance indicators: efficiency, effectiveness and equity. Three separate costs are considered to represent these metrics, namely, the accessibility-based delivery cost, the starting state-based deprivation cost, and the terminal penalty cost. A mixed-integer nonlinear programming model with multiple objectives and multiple periods is proposed. A Q-learning algorithm, a type of reinforcement learning method, is developed to address the complex optimization problem. The principles of the proposed algorithm, including the learning agent and its actions, the environment and its states, and reward functions, are presented in detail. The parameter settings of the proposed algorithm are also discussed in the experimental section. In addition, the solution quality of the proposed algorithm is compared with that of the exact dynamic programming method and a heuristic algorithm. The experimental results show that the efficiency of the algorithm is better than that of the dynamic programming method and the accuracy of the algorithm is higher than that of the heuristic algorithm. Moreover, the Q-learning algorithm provides close to or even optimal solutions to the resource allocation problem by adjusting the value of the training episode K in practical applications.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.