Abstract A fully connected artificial neural network (ANN) is used to predict the spray characteristics of prefilming airblast atomization. The model is trained on data from the planar prefilmer experiment in the Ph.D. thesis of Gepperth [Experimentelle Untersuchung des Primärzerfalls an generischen luftgestützten Zerstäubern unter Hochdruckbedingungen, Vol. 75. Logos Verlag Berlin GmbH], in which shadowgraphy images of the liquid breakup at the atomizing edge capture the characteristics of the primary droplets and the ligaments. The quantities extracted from the images are the Sauter mean diameter, the mean droplet axial velocity, the mean ligament length, and the mean ligament deformation velocity. These are the prescribed outputs of the ANN model. In total, the training database contains 322 different operating points at which different prefilmers, liquid types, ambient pressures, film loadings, and gas velocities were investigated. Two types of model input quantities are investigated and compared. First, nine dimensional parameters related to the geometry, the operating conditions, and the properties of the liquid are used as inputs for the model. Second, nine nondimensional groups commonly used for liquid atomization are derived from the first set of inputs. The architecture providing the best fit is determined after testing over 10,000 randomly drawn ANN architectures, with up to 10 layers and up to 128 neurons per layer. The striking result is that for both types of model, the best architectures consist of a shallow net with the hidden layers in the form of a diabolo: three layers with a large number of neurons (≥24) in the first and the last layers, and very few neurons (≈12) in the middle layer. This shape recalls that of an auto-encoder, where the middle layer would be the feature space of reduced dimensionality.
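The random architecture search and the diabolo shape described above can be illustrated with a minimal sketch. It only draws hidden-layer layouts within the stated bounds (up to 10 layers, up to 128 neurons per layer) and flags three-layer layouts whose middle layer is the narrowest. The uniform sampling and the names `draw_architecture` and `is_diabolo` are assumptions made for illustration; the actual sampling scheme and training procedure are not specified here.

```python
import random

MAX_LAYERS = 10     # upper bound on the number of hidden layers (from the text)
MAX_NEURONS = 128   # upper bound on neurons per layer (from the text)


def draw_architecture(rng):
    """Draw one hidden-layer layout. Uniform sampling is an assumption."""
    n_layers = rng.randint(1, MAX_LAYERS)
    return [rng.randint(1, MAX_NEURONS) for _ in range(n_layers)]


def is_diabolo(layers):
    """True for a three-layer layout whose middle layer is strictly the narrowest."""
    return len(layers) == 3 and layers[1] < min(layers[0], layers[2])


rng = random.Random(0)  # fixed seed so the draw is reproducible
candidates = [draw_architecture(rng) for _ in range(10_000)]
diabolo_candidates = [arch for arch in candidates if is_diabolo(arch)]

# Illustrative sizes only, chosen to match the ">=24 / ~12 / >=24" description.
example = [32, 12, 32]
```

In a full search, each candidate would be trained and ranked by its validation error; the sketch stops at generating and classifying the layouts.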
The trend toward a limited number of layers highlighted by our results contrasts with recent observations in deep learning applied to computer vision and speech recognition. The model with dimensional input quantities always shows lower test and validation errors than the one with nondimensional input quantities. The best architectures for both types of inputs (dimensional and nondimensional) were tested against the experiments. Both provide comparable accuracy, which is better than that of typical correlations for the Sauter mean diameter (SMD) and the droplet velocity. Because the models take more input parameters into account than the correlations, they can predict the experimental data more accurately. Finally, the extrapolation capability of the models was assessed by training them on a confined domain of parameters and testing them outside this domain. The models can extrapolate to larger gas velocities, but their accuracy decreases drastically at larger ambient pressure or smaller trailing edge thickness.
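The extrapolation test can be pictured as a split of the operating points along one parameter: the model is trained below a limit and tested above it. The following is a minimal sketch under that assumption. The helper `split_for_extrapolation`, the field names, and the numerical values are hypothetical placeholders, not data from the experiment.

```python
def split_for_extrapolation(points, key, limit):
    """Keep points with points[key] <= limit for training, the rest for testing."""
    train = [p for p in points if p[key] <= limit]
    test = [p for p in points if p[key] > limit]
    return train, test


# Synthetic placeholder operating points (NOT values from the experiment).
points = [
    {"gas_velocity": float(u), "pressure": float(p)}
    for u in (20, 40, 60, 80)
    for p in (1, 3, 5)
]

# Train on the confined domain (gas velocity up to 60), test beyond it.
train, test = split_for_extrapolation(points, "gas_velocity", 60.0)
```

The same helper applies to the other directions mentioned above by changing `key`; for a lower trailing edge thickness the comparison would have to be reversed.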