Adaptive control of traffic lights is a key component of any intelligent transportation system. Many real-time traffic light control (TLC) algorithms are based on graded thresholds, because precise information about the traffic congestion in the road network is hard to obtain in practice. For example, using thresholds $L_{1}$ and $L_{2}$, we could mark the congestion level on a particular lane as “low,” “medium,” or “high” based on whether the queue length on the lane is below $L_{1}$, between $L_{1}$ and $L_{2}$, or above $L_{2}$, respectively. However, the TLC algorithms that were proposed in the literature incorporate fixed values for the thresholds, which, in general, are not optimal for all traffic conditions. In this paper, we present an algorithm based on stochastic optimization to tune the thresholds that are associated with a TLC algorithm for optimal performance.
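The graded-threshold idea described above can be illustrated with a minimal sketch. The function name `congestion_level`, the example threshold values, and the treatment of a queue length exactly equal to a threshold are assumptions made for illustration; the abstract does not specify them.

```python
def congestion_level(queue_length: float, l1: float, l2: float) -> str:
    """Grade a lane's congestion from its queue length and two thresholds.

    Assumption: a queue length equal to l1 or l2 counts as "medium";
    the abstract leaves the boundary cases open.
    """
    if l1 >= l2:
        raise ValueError("expected l1 < l2")
    if queue_length < l1:
        return "low"
    if queue_length <= l2:
        return "medium"
    return "high"


# Hypothetical thresholds L1 = 5 and L2 = 10 vehicles.
levels = [congestion_level(q, l1=5, l2=10) for q in (3, 7, 12)]
```

A controller that only sees the graded level, rather than the exact queue length, behaves differently for different choices of the two thresholds, which is why fixed values may not suit all traffic conditions.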
We also propose the following three novel TLC algorithms: 1) a full-state Q-learning algorithm with state aggregation, 2) a Q-learning algorithm with function approximation that involves an enhanced feature selection scheme, and 3) a priority-based TLC scheme. All these algorithms are threshold based. Next, we combine the threshold-tuning algorithm with the three aforementioned algorithms. Such a combination results in several interesting consequences. For example, in the case of Q-learning with full-state representation, our threshold-tuning algorithm suggests an optimal way of clustering states to reduce the cardinality of the state space, and in the case of the Q-learning algorithm with function approximation, our (threshold-tuning) algorithm provides a novel feature adaptation scheme to obtain an “optimal” selection of features. Our tuning algorithm is an incremental-update online scheme with proven convergence to the optimal values of thresholds. Moreover, the additional computational effort that is required because of the integration of the tuning scheme in any of the graded-threshold-based TLC algorithms is minimal. Simulation results show a significant gain in performance when our threshold-tuning algorithm is used in conjunction with various TLC algorithms compared to the original TLC algorithms without tuning and with fixed thresholds.
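To make the notion of an incremental, online threshold-tuning scheme concrete, the following sketch tunes two thresholds with simultaneous perturbation stochastic approximation (SPSA). This is an assumption: the abstract says only that the tuning algorithm is based on stochastic optimization, and SPSA is used here as one common method of that kind. The cost function `noisy_cost`, its minimizer (6, 14), the step-size constants, and the bounds are all hypothetical stand-ins for a traffic simulator's measured long-run cost.

```python
import random


def spsa_tune(cost, theta, lower, upper, iterations=2000, a=0.5, c=0.5, seed=0):
    """Incrementally tune parameters with SPSA (illustrative sketch only).

    cost  : callable returning a noisy cost for a parameter list
    theta : initial parameter values (here, the thresholds)
    lower, upper : box bounds used to project each parameter
    """
    rng = random.Random(seed)
    theta = list(theta)
    for n in range(1, iterations + 1):
        a_n = a / n            # diminishing step size
        c_n = c / n ** 0.25    # diminishing perturbation size
        # Random +/-1 perturbation applied to all parameters at once.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_n * d for t, d in zip(theta, delta)]
        minus = [t - c_n * d for t, d in zip(theta, delta)]
        diff = cost(plus) - cost(minus)
        # Two cost evaluations give a gradient estimate for every parameter.
        theta = [
            min(upper, max(lower, t - a_n * diff / (2.0 * c_n * d)))
            for t, d in zip(theta, delta)
        ]
    return theta


_noise = random.Random(1)


def noisy_cost(thresholds):
    """Hypothetical stand-in for a simulated cost, minimized at (6, 14)."""
    l1, l2 = thresholds
    return (l1 - 6.0) ** 2 + (l2 - 14.0) ** 2 + _noise.gauss(0.0, 0.2)


tuned = spsa_tune(noisy_cost, [10.0, 20.0], lower=0.0, upper=30.0)
```

Each iteration needs only two cost evaluations regardless of the number of thresholds, which is in keeping with the abstract's remark that the added computational effort is small. In a real setting the cost would come from observed or simulated traffic rather than a closed-form function.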