Soil heat flux (SHF) is a key component of the surface energy balance and driver of soil physiochemical and biological processes. Accurate estimation of soil heat flux is challenging due to variations in soil composition, overlying vegetation density and phenology, and highly variable environmental forcings. Existing SHF process-based and data-driven estimation methods have focused on midday landscape scale estimates that correspond to satellite acquisitions, despite the high variability that SHF displays at diurnal scales and throughout the growing season. Recently developed data-driven techniques have emphasized pre-determined predictor variables, with model complexity and predictor selection not carefully evaluated, leaving a gap in our understanding of what information is required to accurately predict SHF. Here we developed and evaluated a suite of ensemble machine learning (ML) models to quantify the ability of meteorological and remote sensing data to predict SHF variability at half-hourly temporal resolution throughout a growing season, producing a comprehensive evaluation of the importance of predictor set composition for SHF estimation at high temporal resolution. We compared this suite of machine learning models to six semi-empirical models and found that the machine learning models broadly outperformed the existing models in capturing diurnal variability across the growing season for four agro-ecosystems (soybean, corn, sorghum, and miscanthus). Crop-specific ML models were able to capture over 86% of the variability in SHF using only two predictor variables, pointing to the need for careful evaluation of predictor sets to identify synergistic combinations. ML models developed using pooled data across all crops captured almost 80% of SHF variability using three predictors, demonstrating the power and generalizability of these methods independent of crop type. Shapley additive explanations (SHAP) were used to examine model interpretability, providing insights into the typically opaque ML modelling process and interaction of predictor variables. Models trained with fewer input variables tended to display more linear and interpretable feature attribution, motivating the use of interpretability as an important consideration in parsimonious model selection. These results provide a robust demonstration of the ability of ML to capture the variability in SHF at sub-hourly resolution across growing seasons spanning a wide range of phenological variation for unique agricultural systems. This study provides a comprehensive evaluation of predictor requirements for model performance, guiding future applications that will take advantage of the next generation of satellite-based observing systems, or in-situ proximal observations of vegetation status and meteorological conditions.