Abstract

In the steel stockyard of a shipyard, retrieving a target steel plate on the fabrication schedule often requires sorting work: relocating the plates stacked above it. This work is labor-intensive and requires overhead cranes. Reducing it calls for a method that stacks steel plates in the order of the fabrication schedule as they arrive at the shipyard from steelmakers. However, conventional optimization algorithms and heuristics struggle to determine optimal stacking locations. Real-world stacking problems in shipyards have a vast solution space, and the arrival order of steel plates is uncertain. In this study, reinforcement learning is applied to develop a real-time stacking algorithm for steel plates that considers the fabrication schedule. A Markov decision process suited to the stacking problem is defined, and the optimal stacking policy is learned using the asynchronous advantage actor-critic (A3C) algorithm. The learned policy is tested on several problems with varying numbers of steel plates. The test results indicate that the proposed method reduces crane use more effectively than other metaheuristics and heuristics for stacking problems.
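To make the notion of sorting work concrete, the sketch below counts, for a given stacking, the plates that must be relocated at least once when plates are retrieved in fabrication-schedule order. This is a common lower bound on relocations in block-relocation-style problems. It is not the reward or evaluation used in this study. The representation (piles as lists from bottom to top, plates identified by their position in the fabrication schedule) and the function name `count_blocking_plates` are hypothetical choices for illustration only.

```python
from typing import List


def count_blocking_plates(piles: List[List[int]]) -> int:
    """Lower bound on crane relocations for a stacking (hypothetical helper).

    Each pile is listed from bottom to top. Each integer is a plate's position
    in the fabrication schedule (smaller = needed earlier). A plate is
    "blocking" if some plate below it in the same pile is needed earlier, so
    it must be relocated at least once before that earlier plate is retrieved.
    """
    blocking = 0
    for pile in piles:
        earliest_below = float("inf")  # earliest schedule position seen so far below
        for plate in pile:  # iterate bottom to top
            if plate > earliest_below:
                blocking += 1  # this plate sits above a plate needed earlier
            earliest_below = min(earliest_below, plate)
    return blocking


# Stacking in reverse fabrication order (earliest plate on top) needs no sorting.
print(count_blocking_plates([[3, 2, 1], [6, 5, 4]]))  # 0
# Stacking in arrival order with the earliest plate at the bottom forces relocations.
print(count_blocking_plates([[1, 2, 3]]))  # 2
```

A stacking policy that keeps this count low places each arriving plate on a pile whose top plate is needed no earlier than the arriving plate. When the arrival order is uncertain, that placement cannot always be planned in advance.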