The stochastic inventory relocation problem in a one-way electric car-sharing system with uncertain demands

Abstract

This study considers a stochastic inventory relocation problem for a one-way, station-based car-sharing system that utilises electric vehicles (EVs), where customers' rental demands and rented vehicles' travel distances are uncertain and exhibit temporal-spatial imbalance. Workers are hired to relocate vehicles between stations. To maximise the total expected profit collected by the system, a worker must determine, upon arrival at a rental station, whether to relocate an EV, which vehicle to choose, and which station to move it to. The problem is formulated as a Markov decision process (MDP). A reinforcement learning algorithm is proposed to develop dynamic policies for the problem. It uses an approximate value iteration (AVI) algorithm to overcome the computational challenges arising from the large state and action spaces. Action-space restriction and state-space aggregation schemes are developed to enhance the performance of the AVI algorithm. The effectiveness of the proposed modelling and solution methodologies is demonstrated through a comparison of the dynamic policies against benchmark solutions. Additionally, sensitivity analyses are conducted to investigate how parameter configurations impact the performance of the dynamic policies.
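
As a concrete illustration of the approach described in the abstract, the following minimal Python sketch shows how an AVI loop with state-space aggregation and action-space restriction might be organised. It is a sketch under assumed parameters, not the authors' implementation: the simulator `simulate_step`, the bucket size, and the surplus/deficit thresholds are all invented.

```python
# Minimal sketch of approximate value iteration (AVI) with state-space
# aggregation and action-space restriction, in the spirit of the abstract.
# All names and parameters are illustrative assumptions.
from collections import defaultdict

GAMMA = 0.95  # discount factor (assumed)

def aggregate(state):
    """State-space aggregation: map a detailed state (per-station EV counts,
    battery levels) onto a coarse key by bucketing inventory levels."""
    evs_per_station, _battery_levels = state
    return tuple(c // 5 for c in evs_per_station)  # bucket size 5 (assumed)

def candidate_actions(state):
    """Action-space restriction: only consider relocations from surplus
    stations to deficit stations instead of all station pairs."""
    evs, _ = state
    surplus = [i for i, c in enumerate(evs) if c > 10]
    deficit = [j for j, c in enumerate(evs) if c < 3]
    return [(i, j) for i in surplus for j in deficit] or [None]  # None = stay put

V = defaultdict(float)  # approximate value of each aggregated state

def avi_sweep(sample_states, simulate_step):
    """One AVI sweep over sampled states. `simulate_step(state, action)`
    -> (reward, next_state) is an assumed one-step simulator that draws
    the uncertain rental demand and travel distance."""
    for s in sample_states:
        V[aggregate(s)] = max(
            r + GAMMA * V[aggregate(s2)]
            for a in candidate_actions(s)
            for r, s2 in [simulate_step(s, a)]
        )
```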

Similar Papers
  • Research Article
  • Citations: 1
  • 10.3934/jdg.2016014
A perturbation approach to a class of discounted approximate value iteration algorithms with Borel spaces
  • Aug 1, 2016
  • Journal of Dynamics and Games
  • Joaquín López-Borbón + 1 more

The present paper gives computable performance bounds for the approximate value iteration (AVI) algorithm when the approximation operators used satisfy the following properties: (i) they are positive linear operators; (ii) constant functions are fixed points of such operators; (iii) they satisfy a certain continuity property. Such operators define transition probabilities on the state space of the controlled systems. This has two important consequences: (a) one can see the approximating function as the average value of the target function with respect to the induced transition probability; (b) the approximation step in the AVI algorithm can be thought of as a perturbation of the original Markov model. These two facts enable us to give finite-time bounds for the AVI algorithm's performance depending on the operators' accuracy in approximating the cost function and the transition law of the system. The results are illustrated with numerical approximations for a class of inventory systems.
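
To make properties (i)-(iii) concrete, the following sketch (not from the paper) builds a piecewise-constant averaging operator on a finite state space; the partition and test values are assumed. Its matrix is row-stochastic, so it induces a transition probability on the state space, matching consequence (a).

```python
# Sketch: an averaging operator that is positive, linear, and fixes constants.
import numpy as np

def make_averaging_operator(n_states, partition):
    """partition: list of index arrays covering {0, ..., n_states - 1}."""
    L = np.zeros((n_states, n_states))
    for block in partition:
        L[np.ix_(block, block)] = 1.0 / len(block)  # average within each block
    return L  # each row sums to 1, i.e. a transition probability matrix

L = make_averaging_operator(4, [np.array([0, 1]), np.array([2, 3])])
v = np.array([1.0, 3.0, 5.0, 7.0])
print(L @ v)           # block averages: [2. 2. 6. 6.]
print(L @ np.ones(4))  # constant functions are fixed points: [1. 1. 1. 1.]
```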

  • Research Article
  • Citations: 13
  • 10.1016/j.ejor.2010.11.019
Approximate dynamic programming via direct search in the space of value function approximations
  • Jan 13, 2011
  • European Journal of Operational Research
  • E. F. Arruda + 2 more

Approximate dynamic programming via direct search in the space of value function approximations

  • Conference Article
  • Citations: 3
  • 10.23919/chicc.2019.8865185
One-way car-sharing system based on recharging strategy
  • Jul 1, 2019
  • Jian Ma + 2 more

The recent advent of the Internet has promoted the development of the sharing economy in many fields, ranging from trips to accommodation. In the field of car-sharing, many studies have been dedicated to the scheduling problem, especially for one-way station-based car-sharing systems in which users are able to return the rented car to any available station. However, most current studies have focused exclusively on fuel car-sharing instead of electric car-sharing. For the sake of environmental benefits, most car-sharing companies seek to use electric vehicles instead of non-electric vehicles. Compared with non-electric vehicles, electric vehicles must account for electricity consumption and recharging time, which complicates the operating system. To simplify the problem setting of the electric car-sharing system, most previous studies have ignored the recharging problem. In this paper, we propose a one-way station-based electric car-sharing system with recharging, formulated using mixed integer linear programming (MILP). Finally, we present a case study of Jiading District, Shanghai, China to demonstrate the feasibility of the proposed model.
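
The abstract does not give the formulation, but a toy MILP in the same spirit, assigning returned EVs to stations subject to charger capacity, might look like the following PuLP sketch. All data and variable names are invented for illustration; the paper's formulation is richer.

```python
# Toy mixed integer linear programme: assign returned EVs to stations.
import pulp

cars = ["c1", "c2", "c3"]
stations = ["s1", "s2"]
profit = {("c1", "s1"): 4, ("c1", "s2"): 3,
          ("c2", "s1"): 2, ("c2", "s2"): 5,
          ("c3", "s1"): 3, ("c3", "s2"): 3}
chargers = {"s1": 2, "s2": 1}  # free charging slots per station (assumed)

prob = pulp.LpProblem("ev_return_assignment", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", (cars, stations), cat="Binary")

prob += pulp.lpSum(profit[c, s] * x[c][s] for c in cars for s in stations)
for c in cars:      # each car is returned to exactly one station
    prob += pulp.lpSum(x[c][s] for s in stations) == 1
for s in stations:  # respect charger capacity at each station
    prob += pulp.lpSum(x[c][s] for c in cars) <= chargers[s]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({c: next(s for s in stations if x[c][s].value() == 1) for c in cars})
```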

  • Research Article
  • Citations: 6
  • 10.3390/app122111149
Robotic Peg-in-Hole Assembly Strategy Research Based on Reinforcement Learning Algorithm
  • Nov 3, 2022
  • Applied Sciences
  • Shaodong Li + 2 more

To improve robotic assembly performance in unstructured environments, a reinforcement learning (RL) algorithm is introduced to realize variable admittance control. In this article, the mechanisms of a peg-in-hole assembly task and the admittance model are first analyzed to guide the control strategy and experimental parameter design. Then, the admittance parameter identification process is defined as a Markov decision process (MDP) problem and solved with the RL algorithm. Furthermore, a fuzzy reward system is established to evaluate the action–state value and solve the complex reward establishment problem, where the fuzzy reward includes a process reward and a failure punishment. Finally, four sets of experiments are carried out, including assembly experiments based on position control, fuzzy control, and the RL algorithm. The necessity of compliance control is demonstrated in the first experiment. The advantages of the proposed algorithm are validated by comparing the different experimental results. Moreover, the generalization ability of the RL algorithm is tested in the last two experiments. The results indicate that the proposed RL algorithm effectively improves the robotic compliance assembly ability.
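
As a rough illustration of the reward structure described above, the sketch below combines a process reward with a failure punishment. A crisp weighted blend stands in for the paper's fuzzy inference, and all thresholds and weights are invented.

```python
# Illustrative process-reward-plus-failure-punishment shaping for insertion.
def assembly_reward(depth_mm, force_n, failed,
                    target_depth=30.0, force_limit=20.0):
    if failed:
        return -1.0                                     # failure punishment
    progress = min(depth_mm / target_depth, 1.0)        # deeper insertion is better
    gentle = max(0.0, 1.0 - force_n / force_limit)      # lower contact force is better
    return 0.7 * progress + 0.3 * gentle                # blended process reward
```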

  • Conference Article
  • Citations: 133
  • 10.5555/1838206.1838208
Combining manual feedback with subsequent MDP reward signals for reinforcement learning
  • May 10, 2010
  • W Bradley Knox + 1 more

As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the TAMER framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on TAMER showed that shaping can greatly reduce the sample complexity required to learn a good policy, can enable lay users to teach agents the behaviors they desire, and can allow agents to learn within a Markov Decision Process (MDP) in the absence of a coded reward function. However, TAMER does not allow this human training to be combined with autonomous learning based on such a coded reward function. This paper leverages the fast learning exhibited within the TAMER framework to hasten a reinforcement learning (RL) algorithm's climb up the learning curve, effectively demonstrating that human reinforcement and MDP reward can be used in conjunction with one another by an autonomous agent. We tested eight plausible TAMER+RL methods for combining a previously learned human reinforcement function, H, with MDP reward in a reinforcement learning algorithm. This paper identifies which of these methods are most effective and analyzes their strengths and weaknesses. Results from these TAMER+RL algorithms indicate better final performance and better cumulative performance than either a TAMER agent or an RL agent alone.
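
One method in this family adds the learned human-reinforcement function H to the MDP reward inside a standard Q-learning backup; the sketch below illustrates that idea under assumed hyperparameters and is not the paper's code. H and the action set are assumed to be supplied.

```python
# Sketch of one TAMER+RL combination: shaped reward r + w*H(s, a).
from collections import defaultdict

ALPHA, GAMMA, W = 0.1, 0.99, 1.0  # learning rate, discount, weight on H (assumed)
Q = defaultdict(float)            # Q-values keyed by (state, action)

def tamer_rl_update(H, s, a, r, s_next, actions):
    """One Q-learning backup with the human model's output added to the reward."""
    shaped_r = r + W * H(s, a)  # combine MDP reward with human reinforcement
    target = shaped_r + GAMMA * max(Q[s_next, a2] for a2 in actions)
    Q[s, a] += ALPHA * (target - Q[s, a])
```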

  • Book Chapter
  • Citations: 10
  • 10.1007/978-3-031-22337-2_29
A Framework for Transforming Specifications in Reinforcement Learning
  • Jan 1, 2022
  • Rajeev Alur + 3 more

Reactive synthesis algorithms allow automatic construction of policies, to control an environment modeled as a Markov Decision Process (MDP), that are optimal with respect to high-level temporal logic specifications. However, they assume that the MDP model is known a priori. Reinforcement Learning (RL) algorithms, in contrast, are designed to learn an optimal policy when the transition probabilities of the MDP are unknown, but require the user to associate local rewards with transitions. The appeal of high-level temporal logic specifications has motivated research to develop RL algorithms for the synthesis of policies from specifications. To understand the techniques in the growing body of resulting literature, and the nuanced variations in their theoretical guarantees, we develop a formal framework for defining transformations among RL tasks with different forms of objectives. We define the notion of a sampling-based reduction to transform a given MDP into another one which can be simulated even when the transition probabilities of the original MDP are unknown. We formalize the notions of preservation of optimal policies, convergence, and robustness of such reductions. We then use our framework to restate known results, establish new results to fill in some gaps, and identify open problems. In particular, we show that certain kinds of reductions from LTL specifications to reward-based ones do not exist, and prove the non-existence of RL algorithms with PAC-MDP guarantees for safety specifications. Keywords: Reinforcement learning, Reactive synthesis, Temporal logic

  • Book Chapter
  • Citations: 2
  • 10.5772/13295
Complex-Valued Reinforcement Learning: a Context-Based Approach for POMDPs
  • Jan 14, 2011
  • Takeshi Shibuya + 1 more

Reinforcement learning (RL) algorithms are representative active learning algorithms that can be used to decide suitable actions on the basis of experience, simulations, and searches (Sutton & Barto, 1998; Kaelbling et al., 1998). The use of RL algorithms for the development of practical intelligent controllers for autonomous robots and multiagent systems has been investigated; such controllers help in realizing autonomous adaptability on the basis of the information obtained through experience. For example, in our previous studies on autonomous robot systems such as an intelligent wheelchair, we used RL algorithms for an agent in order to learn how to avoid obstacles and evolve cooperative behavior with other robots (Hamagami & Hirata, 2004; 2005). Furthermore, RL has been widely used to solve the elevator dispatching problem (Crites & Barto, 1996), air-conditioning management problem (Dalamagkidis et al., 2007), process control problem (S. Syafiie et al., 2008), etc. However, in most cases, RL algorithms have been successfully used only in ideal situations that are based on Markov decision processes (MDPs). MDP environments are controllable dynamic systems whose state transitions depend on the previous state and the action selected. On the other hand, because of the limited number of dimensions and/or low accuracy of the sensors used, real-world environments are considered to be partially observable MDPs (POMDPs). In a POMDP environment, the agent faces a serious problem called perceptual aliasing, i.e., the agent cannot distinguish multiple states from one another on the basis of perceptual inputs. Some representative approaches have been adopted to solve this problem (McCallum, 1995; Wiering & Schmidhuber, 1996; Singh et al., 2003; Hamagami et al., 2002). The most direct representative approach involves the use of the memory of contexts called episodes to disambiguate the current state and to keep track of information about the previous state (McCallum, 1995). The use of this memory-based approach can ensure high learning performance if the environment is stable and the agent has sufficient memory. However, since most real-world environments belong to the dynamic class, the memory of experience has to be revised frequently. Therefore, the revised algorithm often becomes complex and task-dependent. Another approach for addressing perceptual aliasing involves treatment of the environment as a hierarchical structure (Wiering & Schmidhuber, 1996). In this case, the environment is divided into small sets without perceptual aliasing, so that the agent can individually learn each small set. This approach is effective when the agent knows how to divide the environment into sets with non-aliasing states. However, the agent must learn to divide the …

  • Research Article
  • Citations: 50
  • 10.1016/j.jclepro.2018.09.124
Carsharing demand estimation and fleet simulation with EV adoption
  • Sep 28, 2018
  • Journal of Cleaner Production
  • Taekwan Yoon + 3 more

Carsharing demand estimation and fleet simulation with EV adoption

  • Research Article
  • Citations: 1
  • 10.1177/03611981231192101
Reinforcement Learning for Dynamic Pricing of Shared-Use Autonomous Mobility Systems Considering Heterogeneous Users: Model Development and Scenario Testing
  • Aug 29, 2023
  • Transportation Research Record: Journal of the Transportation Research Board
  • Hoseb Abkarian + 1 more

A key aspect of the success of shared-use autonomous mobility systems will be the ability to price rides in real time. As these services become more prevalent, it becomes of high importance to detect shifts in behavior to quickly optimize the system and ensure system efficiency and economic viability. Therefore, (1) pricing algorithms should be able to price rides according to complex underlying demand functions with heterogeneous customers, and (2) the algorithm should be able to detect nonstationary behavior (e.g., changing customers' willingness to pay) from its previously learned decisions and alter its pricing mechanism accordingly. We formulate a dynamic pricing and learning problem as a Markov decision process and subsequently solve it through a reinforcement learning (RL) algorithm, with heterogeneous customers accepting the trip characteristics (price, expected wait time) probabilistically. Insights from a fixed-fleet operation of an autonomous private ridesourcing system in Chicago are presented. Given our formulation of the demand model, the algorithm learns in 25 days, increasing revenue by 90% and decreasing customer wait times by 90% compared to day 5. After gathering insights from the RL algorithm and applying optimal static pricing (i.e., a constant specific surge multiplier), we find that RL can achieve near 90% optimality in revenue. The RL algorithm, nevertheless, proves to be robust. Two scenarios are tested where a sudden shock occurs or customers slowly change their willingness to pay, illustrating that RL can quickly adapt its parameters to the situation.
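
To illustrate the demand side described above, the sketch below models a customer's probabilistic acceptance of a (price, expected wait) offer with a logit function; the coefficients are invented, and the paper's actual demand model may differ.

```python
# Sketch of probabilistic trip acceptance by a heterogeneous customer.
import math
import random

def accept_probability(price, wait_min, beta_price=-0.3, beta_wait=-0.1, base=2.0):
    u = base + beta_price * price + beta_wait * wait_min  # customer utility
    return 1.0 / (1.0 + math.exp(-u))

def simulate_offer(price, wait_min):
    """Revenue sample the RL pricing agent would observe for one customer."""
    return price if random.random() < accept_probability(price, wait_min) else 0.0

print(accept_probability(5.0, 8.0))  # ~0.43 acceptance at price 5, 8 min wait
```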

  • Research Article
  • Citations: 9
  • 10.3390/electronics11193161
Secure State Estimation of Cyber-Physical System under Cyber Attacks: Q-Learning vs. SARSA
  • Oct 1, 2022
  • Electronics
  • Zengwang Jin + 5 more

This paper proposes a reinforcement learning (RL) algorithm for the security problem of state estimation of a cyber-physical system (CPS) under denial-of-service (DoS) attacks. The security of CPS will inevitably decline when faced with malicious cyber attacks. In order to analyze the impact of cyber attacks on CPS performance, a Kalman filter, as an adaptive state estimation technology, is combined with an RL method to evaluate the issue of system security, where estimation performance is adopted as the evaluation criterion. Then, the transition of the estimation error covariance under a DoS attack is described as a Markov decision process, and the RL algorithm can be applied to resolve the optimal countermeasures. Meanwhile, the interactive combat between defender and attacker can be regarded as a two-player zero-sum game, where the Nash equilibrium policy exists but needs to be solved. Considering energy constraints, the action selection of both sides is restricted by setting certain cost functions. The proposed RL approach is designed from three different perspectives, including the defender, the attacker and the interactive game of the two opposing sides. In addition, the frameworks of the Q-learning and state–action–reward–state–action (SARSA) methods are investigated separately in this paper to analyze the influence of different RL algorithms. The results show that both algorithms obtain the corresponding optimal policy and the Nash equilibrium policy of the zero-sum interactive game. Through a comparative analysis of the two algorithms, it is verified that both Q-learning and SARSA can be applied effectively to secure state estimation in CPS.
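
The core difference between the two updates compared in the paper is a single line, as the generic sketch below shows; Q is assumed to be a dictionary keyed by (state, action), and the hyperparameters are illustrative.

```python
# Q-learning is off-policy (maximises over next actions); SARSA is on-policy
# (uses the action the policy actually takes next).
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    target = r + gamma * max(Q[s_next, a2] for a2 in actions)
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    target = r + gamma * Q[s_next, a_next]  # the on-policy difference
    Q[s, a] += alpha * (target - Q[s, a])
```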

  • Research Article
  • Citations: 8
  • 10.1016/j.tre.2024.103698
Reinforcement learning for electric vehicle charging scheduling: A systematic review
  • Aug 13, 2024
  • Transportation Research Part E
  • Zhonghao Zhao + 3 more

Reinforcement learning for electric vehicle charging scheduling: A systematic review

  • Research Article
  • Citations: 10
  • 10.1016/j.omega.2018.10.015
An inventory control policy for liquefied natural gas as a transportation fuel
  • Oct 26, 2018
  • Omega
  • Jose A Lopez Alvarez + 3 more

An inventory control policy for liquefied natural gas as a transportation fuel

  • Conference Article
  • Citations: 5
  • 10.1109/ccta.2018.8511320
A Q-Learning Method for Scheduling Shared EVs Under Uncertain User Demand and Wind Power Supply
  • Aug 1, 2018
  • Junjie Wu + 1 more

The last few years have witnessed the fast rise of the sharing economy around the world. Thanks to the rapid development of the electric vehicle industry and its growing market share, the business of shared electric vehicles (EVs) has gained the opportunity to expand. With the improvements in charging facilities, wind power generation on high-rise buildings is expected to be a major technology for utilizing renewable energy in cities. Yet the intermittency of wind power makes it hard to use. Shared EVs are ideal consumers of wind power, given their flexibility in use and charging. However, the scheduling of shared EVs is highly challenging because of the randomness in both wind power supply and user demand. We address this important problem in this paper. We formulate the scheduling of shared EVs in the framework of a Markov decision process. An agent-based state is defined, based on which a distributed optimization algorithm can be applied. We propose a Q-learning algorithm to solve the problem of scheduling shared EVs to maximize the global daily income. Both the users' uncertain demand and the stochastic wind power supply are considered. The performance of the proposed algorithm is illustrated by numerical experiments.

  • Conference Article
  • Citations: 3
  • 10.1109/radar53847.2021.10027914
A Probabilistic Jamming Strategy Model for Frequency Agility Radar Anti-Jamming Problem
  • Dec 15, 2021
  • Youlin Fan + 5 more

With the development of electronic warfare, the jammer is much smarter than before and can adopt different strategies to jam the radar. To design efficient anti-jamming strategies, the radar anti-jamming problem has recently been modeled as a Markov decision process (MDP) problem, and reinforcement learning (RL) algorithms are used to solve it. However, common RL algorithms need a large number of interaction samples, which is not realistic in practice. To address this problem, we propose a probabilistic jamming strategy model for the frequency agility (FA) radar that can learn the unknown jamming strategy through fewer interaction samples. Based on the proposed model, a jamming strategy is first expressed by a series of unknown probability matrices and weights. Then, the FA radar is used to induce the jammer to emit jamming signals and to collect samples of radar-jammer interaction in the MDP. The model parameters are estimated by means of maximum likelihood estimation (MLE). Finally, an RL algorithm can be used to train the anti-jamming strategies through offline interaction. Simulation results illustrate the effectiveness of the proposed model.
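
For multinomial state transitions, the MLE step described above reduces to normalised empirical counts, as the sketch below shows on invented data; the paper's full model (probability matrices plus weights) is richer.

```python
# Count-based MLE of a transition matrix from observed interaction samples.
import numpy as np

def mle_transition_matrix(transitions, n_states):
    """transitions: list of observed (state, next_state) pairs."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in transitions:
        counts[s, s_next] += 1
    rows = counts.sum(axis=1, keepdims=True)
    # Normalise observed rows; fall back to uniform for unobserved states.
    return np.divide(counts, rows, out=np.full_like(counts, 1.0 / n_states),
                     where=rows > 0)

P_hat = mle_transition_matrix([(0, 1), (0, 1), (1, 0), (0, 0)], 2)
print(P_hat)  # row 0 -> [0.333, 0.667]; row 1 -> [1.0, 0.0]
```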

  • Research Article
  • Citations: 17
  • 10.1016/j.automatica.2016.12.019
Data-driven approximate value iteration with optimality error bound analysis
  • Jan 24, 2017
  • Automatica
  • Yongqiang Li + 3 more

Data-driven approximate value iteration with optimality error bound analysis

More from: International Journal of Systems Science: Operations & Logistics
  • Research Article
  • 10.1080/23302674.2025.2573992
Flexible job shop scheduling using Jaya-Tabu search algorithm
  • Nov 5, 2025
  • International Journal of Systems Science: Operations & Logistics
  • Selva Kumar Chandrasekar + 2 more

  • Research Article
  • 10.1080/23302674.2025.2566724
The impact of advanced technologies on healthcare supply chain performance in Sierra Leone: a structural equation modelling approach
  • Oct 18, 2025
  • International Journal of Systems Science: Operations & Logistics
  • Sallieu Kargbo + 2 more

  • Research Article
  • 10.1080/23302674.2025.2563361
The basic EPQ models with non-instantaneous deteriorating items: modelling and optimal policy
  • Oct 9, 2025
  • International Journal of Systems Science: Operations & Logistics
  • Bashair Ahmad + 1 more

  • Research Article
  • 10.1080/23302674.2025.2547756
A multi-objective mathematical model to solve a closed-loop pistachios supply chain network problem considering purchasing substitution and discount levels: multi-objective algorithms
  • Oct 6, 2025
  • International Journal of Systems Science: Operations & Logistics
  • Mehran Gharye Mirzaei + 3 more

  • Research Article
  • 10.1080/23302674.2025.2549438
A novel storage location allocation strategy for intelligent E-commerce warehouse with new products
  • Sep 30, 2025
  • International Journal of Systems Science: Operations & Logistics
  • Fuqiang Lu + 2 more

  • Research Article
  • 10.1080/23302674.2025.2559257
When warehouses dream: a systems-informed exploration of Industry 5.0 principles in warehousing performance management
  • Sep 29, 2025
  • International Journal of Systems Science: Operations & Logistics
  • Ahmed Mohammed + 4 more

  • Research Article
  • 10.1080/23302674.2025.2560555
Does economic policy uncertainty impact inventories and firm value? Evidence from the US economy
  • Sep 29, 2025
  • International Journal of Systems Science: Operations & Logistics
  • Wassim Dbouk + 2 more

  • Research Article
  • 10.1080/23302674.2025.2544714
Unmanned aerial vehicle and autonomous delivery robot station for last-mile delivery services
  • Aug 13, 2025
  • International Journal of Systems Science: Operations & Logistics
  • Byoungil Choi + 3 more

  • Research Article
  • 10.1080/23302674.2025.2545517
Comparison stochastic optimisation approaches for the multi-mode resource-constrained multi-project scheduling problem
  • Aug 12, 2025
  • International Journal of Systems Science: Operations & Logistics
  • Pham Vu Hong Son + 2 more

  • Research Article
  • 10.1080/23302674.2025.2541923
A reliability model for electric vehicle routing problem under charging failure risk
  • Aug 6, 2025
  • International Journal of Systems Science: Operations & Logistics
  • Xun Weng + 4 more
