The digitalization of production systems tends to provide a huge amount of data from heterogeneous sources. This is particularly true for the semiconductor industry wherein real time process monitoring is inherently required to achieve a high yield of good parts. An application of data-driven algorithms in production planning to enhance operational excellence for complex semiconductor production systems is currently missing. This paper shows the successful implementation of a reinforcement learning-based adaptive control system for order dispatching in the semiconductor industry. Furthermore, a performance comparison of the learning-based control system with the traditionally used rule-based system shows remarkable results. Since a strict rulebook does not bind the learning-based control system, a flexible adaption to changes in the environment can be achieved through a combination of online and offline learning.