Several deep learning methods for spatial data have been developed that report good performance in big data settings. These methods typically require the choice of an appropriate kernel and some tuning of hyperparameters, both of which contribute to poor performance on smaller data sets. In this paper, we propose a mathematical construction of a graph-based neural network for spatial prediction that substantially generalizes the KCN model in [Appleby, Liu and Liu (2020). Kriging convolutional networks. In Proc. AAAI Conf. AI 34, pp. 3187–3194]. In particular, our model, referred to as SPONGE, allows for integrated learning of the convolutional kernel, admits higher-order neighborhood structures, and can make use of the distance between locations in the neighborhood and between labels of neighboring nodes. All of this yields higher flexibility in capturing spatial correlations. In simulation studies including small, medium and (reasonably) large data sets, we investigate in what situations and to what extent SPONGE comes close to or (if the conditions for optimality are violated) even beats universal Kriging, whose predictions incur a high computational cost if n is large. Furthermore, we study the improvement of general SPONGE over the usual KCN. Finally, we compare various graph-based neural network models on larger real-world data sets and apply our method to the prediction of soil organic carbon in the southern part of Malawi.