Mineral extraction plays a key role in the global raw materials supply chain. However, the exhaustion of shallow deposits and the typical scarcity of sampled data during exploration create challenges in mine planning and design, where decision-making is highly sensitive to uncertainty in geology and mineral grade prediction. Geostatistical techniques are commonly used to generate a set of equiprobable simulated numerical models that capture these uncertainties, but incorporating these simulated models within a mine planning and design framework remains a major challenge. This paper proposes a novel approach to decision-making in underground mine design that uses information from an ensemble of numerical realizations of a mineral resource to improve the financial performance of the asset. A deep reinforcement learning (DRL) framework, using the proximal policy optimization (PPO) algorithm, is developed for the design of underground mining production level layouts. A case study using a gold mineral resource characterized by an ensemble of 100 numerical realizations is presented to verify the advantages of the proposed method against a baseline consisting of an industry-standard automated design method. The DRL approach achieved an 8.3% higher expected profit and 3.4% more gold mined than the baseline, and has the added functionality of considering uncertainty in mineral grades.