Abstract. Despite their great success in many commercial applications, machine learning and data science models have generally shown limited success in many scientific fields, including hydrology (Karpatne et al., 2017). These approaches are often criticized for their lack of interpretability and physical consistency. This has led to the emergence of new modelling paradigms, such as theory-guided data science (TGDS) and physics-informed machine learning. The motivation behind such approaches is to improve the physical meaningfulness of machine learning models by blending existing scientific knowledge with learning algorithms. Following the same principles, our prior work (Chadalawada et al., 2020) introduced a new model induction framework founded on genetic programming (GP), namely the Machine Learning Rainfall–Runoff Model Induction (ML-RR-MI) toolkit. ML-RR-MI is capable of developing fully fledged lumped conceptual rainfall–runoff models for a watershed of interest using the building blocks of two flexible rainfall–runoff modelling frameworks. In this study, we extend ML-RR-MI towards inducing semi-distributed rainfall–runoff models. The meaningfulness and reliability of hydrological inferences gained from lumped models may deteriorate within large catchments where the spatial heterogeneity of forcing variables and watershed properties is significant. This was the motivation behind developing our machine learning approach for distributed rainfall–runoff modelling, titled Machine Induction Knowledge Augmented – System Hydrologique Asiatique (MIKA-SHA). MIKA-SHA captures spatial variabilities and automatically induces rainfall–runoff models for the catchment of interest without any explicit user selections. Currently, MIKA-SHA learns models using the model building components of two flexible modelling frameworks. However, the proposed framework can be coupled with any internally coherent collection of building blocks.
MIKA-SHA's model induction capabilities have been tested on the Rappahannock River basin near Fredericksburg, Virginia, USA. MIKA-SHA builds and tests many model configurations using the model building components of the two flexible modelling frameworks and quantitatively identifies the optimal model for the watershed of concern. In this study, MIKA-SHA is used to identify two optimal models (one from each flexible modelling framework) to capture the runoff dynamics of the Rappahannock River basin. Both optimal models achieve high efficiency values in hydrograph predictions (at both catchment and subcatchment outlets) and good visual matches with the observed runoff response of the catchment. Furthermore, the resulting model architectures are compatible with previously reported research findings and fieldwork insights for the watershed and are readily interpretable by hydrologists. The performance of the MIKA-SHA-induced semi-distributed models was compared against that of existing lumped models for the same basin. The MIKA-SHA-induced optimal models outperform the lumped models used in this study in terms of efficiency values while providing hydrologists with more meaningful hydrological inferences about the runoff dynamics of the Rappahannock River basin.