AbstractDeep learning models, such as convolutional neural networks, utilize multiple specialized layers to encode spatial patterns at different scales. In this study, deep learning models are compared with standard machine learning approaches on the task of predicting the probability of severe hail based on upper-air dynamic and thermodynamic fields from a convection-allowing numerical weather prediction model. The data for this study come from patches surrounding storms identified in NCAR convection-allowing ensemble runs from 3 May to 3 June 2016. The machine learning models are trained to predict whether the simulated surface hail size from the Thompson hail size diagnostic exceeds 25 mm over the hour following storm detection. A convolutional neural network is compared with logistic regressions using input variables derived from either the spatial means of each field or principal component analysis. The convolutional neural network statistically significantly outperforms all other methods in terms of Brier skill score and area under the receiver operator characteristic curve. Interpretation of the convolutional neural network through feature importance and feature optimization reveals that the network synthesized information about the environment and storm morphology that is consistent with our understanding of hail growth, including large lapse rates and a wind shear profile that favors wide updrafts. Different neurons in the network also record different storm modes, and the magnitude of the output of those neurons is used to analyze the spatiotemporal distributions of different storm modes in the NCAR ensemble.