The semiconductor test scheduling problem is a variation of the reentrant unrelated parallel machine problem with multiple resource constraints, intricate {product, tester, kit, enabler assembly} eligibility constraints, sequence-dependent setup times, and other complications. A multi-step reinforcement learning (RL) algorithm called Sarsa(λ, k) is proposed and applied to this scheduling problem with a throughput-related objective. Allowing enabler reconfiguration expands the production capacity of the test facility, and scheduling optimization is performed at the bottom level. Two forms of Sarsa(λ, k), forward view Sarsa(λ, k) and backward view Sarsa(λ, k), are constructed and proved equivalent under off-line updating. An upper bound on the error of the action-value function in tabular Sarsa(λ, k) is provided for deterministic problems. To apply Sarsa(λ, k), the scheduling problem is transformed into an RL problem by defining the state representation and constructing the actions, the reward function, and the function approximator. Sarsa(λ, k) achieves a mean scheduling objective value smaller than that of the Industrial Method (IM) by 68.59% on real industrial problems and by 76.89% on randomly generated test problems. Computational experiments show that Sarsa(λ, k) outperforms both IM and any individual action constructed from heuristics derived from existing heuristics or scheduling rules.