Abstract

This paper presents a novel approach to the non-parametric estimation of production technologies that satisfy the basic axioms of production theory, including free disposability of inputs and outputs and convexity. The methodology adapts machine learning techniques associated with Random Forest and makes use of splines. The new method yields a piecewise linear estimator analogous to data envelopment analysis (DEA); however, it addresses DEA's overfitting and lack of robustness by randomizing the data and the input variables used in constructing the models. The paper underscores the virtues of machine learning techniques for assessing the efficiency of public services, particularly educational institutions. The new approach can predict outputs from inputs, even for units not included in the observed sample. It also enables the identification of the inputs most relevant to output production. To demonstrate the advantages of the method, the educational production function is estimated for Spanish regions using data from the Program for International Student Assessment (PISA).