We consider the hierarchical coordinated control of a multi-procedure conveyor-serviced production station system in which flexible stations are deployed between adjacent procedures. The control problem comprises dynamic inter-procedure switching control of the flexible stations, which aims to balance the different procedures, and dynamic intra-procedure production coordination of all stations within each procedure. Because the system is complicated to model and optimise, it is difficult to solve using numerical methods; we therefore resort to model-free learning optimisation methods. First, we establish a neuro-dynamic programming algorithm that uses cerebellar model articulation controllers (CMACs) to approximate state-action values at the upper hierarchy. Second, inspired by the reaction-diffusion phenomenon, we combine a WoLF-PHC algorithm with a local information-interaction scheme to learn look-ahead control policies at the lower hierarchy. Simulation results show that, compared with traditional Q-learning and the backward Q-learning based Q-learning, the proposed CMAC-based learning optimisation methods yield a higher processing rate and optimise faster with a lower storage requirement.
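
To make the idea of approximating state-action values with a CMAC concrete, the following is a minimal sketch of a generic tile-coding approximator. It is not the algorithm used in this work: the class name `CMACQ`, the box-shaped state space, the diagonal tiling offsets and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np


class CMACQ:
    """Illustrative tile-coding (CMAC) approximator for Q(s, a).

    Hypothetical helper: names, offsets and defaults are assumptions.
    """

    def __init__(self, low, high, n_actions, n_tilings=8, n_bins=10):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.n_tilings = n_tilings
        self.n_bins = n_bins
        dim = self.low.size
        # Tiling k is shifted by k / n_tilings of one tile width on every axis.
        self.offsets = np.arange(n_tilings)[:, None] / n_tilings * np.ones(dim)
        # The shift needs one extra cell per axis, hence n_bins + 1.
        self.grid = (n_bins + 1,) * dim
        n_tiles = (n_bins + 1) ** dim
        # One weight per (tiling, tile, action).
        self.w = np.zeros((n_tilings, n_tiles, n_actions))

    def _active_tiles(self, state):
        # Rescale the state to [0, n_bins] on every axis.
        s = (np.asarray(state, dtype=float) - self.low) / (self.high - self.low)
        s = np.clip(s * self.n_bins, 0, self.n_bins)
        # One active cell per tiling, flattened to a single tile index.
        idx = np.floor(s[None, :] + self.offsets).astype(int)
        return np.ravel_multi_index(tuple(idx.T), self.grid)

    def value(self, state, action):
        # Q(s, a) is the sum of the weights of the active tiles.
        tiles = self._active_tiles(state)
        return self.w[np.arange(self.n_tilings), tiles, action].sum()

    def update(self, state, action, target, alpha=0.1):
        # Spread the correction equally over the active tiles.
        tiles = self._active_tiles(state)
        error = target - self.value(state, action)
        self.w[np.arange(self.n_tilings), tiles, action] += (
            alpha / self.n_tilings * error
        )


# Usage: one Q-learning-style update with an assumed reward and discount.
q = CMACQ(low=[0.0, 0.0], high=[1.0, 1.0], n_actions=2)
reward, gamma = 1.0, 0.9
next_best = max(q.value([0.6, 0.5], a) for a in range(2))
q.update([0.5, 0.5], 1, target=reward + gamma * next_best)
```

In this sketch the number of stored weights depends on the number of tilings and tiles rather than on the number of distinct states visited, and an update at one state also shifts the estimate at nearby states that share tiles with it. These two properties are the usual motivation for using a CMAC in place of a lookup table.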