Abstract

Eco-labels benchmark transportation shipments with respect to their environmental impact. In contrast to the eco-labeling of consumer products, emissions in transportation depend on several operational factors, such as the mode of transportation (e.g., train or truck) or a vehicle’s current and potential future capacity utilization when new orders are added for consolidation. Thus, satisfying eco-labels cost-efficiently is a challenging task when dynamically routing orders in an intermodal network. In this paper, we model the problem as a multiobjective sequential decision process and propose a reinforcement learning method: value function approximation (VFA). VFAs repeatedly simulate trajectories of the problem and store observed values (violated eco-labels and costs) for states aggregated to a set of features. The observations are used to improve decision making in the next trajectory. For our problem, we face two additional challenges when applying a VFA: the multiple objectives and the “delayed” realization of eco-label satisfaction due to future consolidation. For the first, we propose different feature sets depending on the objective function’s focus: costs or eco-labels. For the second, we propose enhancing the suboptimal decision making and the pessimistic primal values observed within the VFA trajectories with optimistic dual decision making once all information of a trajectory is known ex post. This enhancement is a general methodological contribution to the literature on approximate dynamic programming and will likely improve learning for other problems as well. We show the advantages of both components in a comprehensive study of intermodal transport via trains and trucks in Europe.
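To illustrate the general mechanism of storing observed values per aggregated feature and reusing them for later decisions, the following is a minimal sketch of a lookup-table VFA. It is not the method of the paper: the class `LookupVFA`, the functions `scalarize` and `choose`, the weighted-sum treatment of the two objectives, the `weight` parameter, and the zero default for unvisited features are all assumptions made for illustration.

```python
from collections import defaultdict


class LookupVFA:
    """Hypothetical lookup table: running mean of (cost, violations) per feature."""

    def __init__(self):
        # feature -> [sum of observed costs, sum of observed violations]
        self.sums = defaultdict(lambda: [0.0, 0.0])
        self.counts = defaultdict(int)

    def update(self, feature, cost, violations):
        """Store one observation from a simulated trajectory."""
        s = self.sums[feature]
        s[0] += cost
        s[1] += violations
        self.counts[feature] += 1

    def estimate(self, feature, default=(0.0, 0.0)):
        """Mean observed (cost, violations); default for unvisited features (assumption)."""
        n = self.counts.get(feature, 0)
        if n == 0:
            return default
        s = self.sums[feature]
        return (s[0] / n, s[1] / n)


def scalarize(cost, violations, weight):
    """Weighted sum of both objectives; weight is an assumed penalty per violation."""
    return cost + weight * violations


def choose(vfa, options, weight):
    """Pick the action with the lowest immediate plus estimated future value.

    options: list of (action, immediate_cost, immediate_violations, next_feature).
    """
    def score(option):
        _, cost, violations, feature = option
        future_cost, future_violations = vfa.estimate(feature)
        return scalarize(cost + future_cost, violations + future_violations, weight)

    return min(options, key=score)[0]
```

In this sketch, a large `weight` steers decisions toward satisfying eco-labels and a small one toward low costs; how the paper actually balances the objectives is not specified in the abstract.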

