Optimisation of sampling design (the method used to select the samples) and sample size (the number of samples) remains a key challenge in digital soil mapping, especially in precision farming, where new technologies are expected to bring economic benefits. Because existing information is available in the form of relevant environmental covariates, combining it with non-parametric machine learning techniques requires careful planning, from the initial field sampling to the final production of digital soil maps. The aim of this study is to compare widely used covariate-wise sampling designs combined with variable sample sizes for supervised prediction of common soil drivers of agricultural productivity (pH, soil organic carbon, soil macronutrients) in a real case study of a field (35 ha) with heterogeneous soil properties. From a total of 200 samples, we evaluated different sample sets in which 10, 30 and 60 field samples were selected by conditioned Latin Hypercube Sampling (cLHS) and Feature Space Coverage Sampling (FSCS) to calibrate random forest (RF) models. The evaluation was performed on test points that were sampled independently in situ. In addition to these datasets, we compared the investigated methods with Simple Random Sampling (SRS) in a numerical benchmark experiment with increasing sample size, comparing the global accuracies of the predicted maps on the test points but using interpolated maps as the artificial true population for each soil characteristic. In both the field experiment and the numerical experiment, FSCS performed slightly better, especially when the number of samples was small. At smaller training sample sizes, the risk of insufficiently accurate prediction models was slightly lower for FSCS, and the difference decreased as the sample size increased.
Nevertheless, sample size proved to be the most important factor in the accuracy of RF models, regardless of the sampling technique. The results suggest that a sample size between 18 and 30 training samples (0.6 to 1 sample ha⁻¹) seems plausible for covariate-wise predictions using RF at field scale in our case study. The relative importance of each auxiliary variable for each RF calibration was also assessed for the field experiment. The results showed that the additional introduction of spatial proxies overshadowed the importance of other covariates, but significantly improved the model calibration only at larger sample sizes. The calibrated models without spatial proxies showed the strongest effect of remotely sensed surface characteristics.
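To illustrate how the covariate-wise designs compared here differ from SRS, FSCS can be sketched as a k-means clustering of the standardised covariates, followed by selecting the candidate location nearest to each cluster centre. The following is a minimal sketch under stated assumptions, not the implementation used in this study: the function names `fscs_sample` and `srs_sample`, the use of scikit-learn, and the synthetic covariate table are all hypothetical and serve illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def fscs_sample(covariates, n_samples, seed=0):
    """Hypothetical FSCS sketch: return indices of n_samples candidate points.

    covariates: array of shape (n_candidates, n_covariates).
    """
    # Standardise so that every covariate contributes equally to distances.
    scaled = StandardScaler().fit_transform(covariates)
    # One cluster per requested sample in covariate (feature) space.
    km = KMeans(n_clusters=n_samples, n_init=10, random_state=seed).fit(scaled)

    taken = np.zeros(len(scaled), dtype=bool)
    chosen = []
    for centre in km.cluster_centers_:
        dist = np.linalg.norm(scaled - centre, axis=1)
        dist[taken] = np.inf  # never select the same location twice
        nearest = int(np.argmin(dist))
        taken[nearest] = True
        chosen.append(nearest)
    return np.array(chosen)


def srs_sample(n_candidates, n_samples, seed=0):
    """Simple random sampling without replacement, as a baseline."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_candidates, size=n_samples, replace=False)


if __name__ == "__main__":
    # Synthetic stand-in for a covariate table (assumption: 200 candidate
    # locations, 4 covariates); real covariates would come from the field.
    rng = np.random.default_rng(42)
    covariates = rng.normal(size=(200, 4))
    for n in (10, 30, 60):
        idx = fscs_sample(covariates, n)
        print(n, "FSCS locations selected, all distinct:", len(set(idx.tolist())) == n)
```

The selected indices would then identify the locations at which to collect the training samples for an RF model; cLHS follows a different optimisation and is not covered by this sketch.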