Combining Data Envelopment Analysis and Machine Learning

Nadia M Guerrero,Juan Aparicio,Daniel Valero-Carreras

doi:10.3390/math10060909


Open Access

https://doi.org/10.3390/math10060909


Journal: Mathematics	Publication Date: Mar 11, 2022
Citations: 8	License type: CC BY 4.0

Affiliation: Miguel Hernandez University

Abstract

Data Envelopment Analysis (DEA) is one of the most widely used non-parametric techniques for technical efficiency assessment. DEA is exclusively concerned with minimizing the empirical error while satisfying some shape constraints (convexity and free disposability). Unfortunately, by construction, DEA is a descriptive methodology that is not concerned with preventing overfitting. In this paper, we introduce a new methodology that allows for estimating polyhedral technologies following the Structural Risk Minimization (SRM) principle. This technique is called Data Envelopment Analysis-based Machines (DEAM). Given that the new method controls the generalization error of the model, the corresponding estimate of the technology does not suffer from overfitting. Moreover, the notion of ε-insensitivity is also introduced, generating a new and more robust definition of technical efficiency. Additionally, we show that DEAM can be seen as a machine learning-type extension of DEA, satisfying the same microeconomic postulates except for minimal extrapolation. Finally, the performance of DEAM is evaluated through simulations. We conclude that the frontier estimator derived from DEAM is better than that associated with DEA. The bias and mean squared error obtained for DEAM are smaller in all the scenarios analyzed, regardless of the number of variables and DMUs.
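To make the baseline concrete, the following is a minimal sketch of the standard DEA frontier (not DEAM) for the special case of one input and one output, under convexity and free disposability. It relies on the fact that, with a single input, the best convex combination uses at most two observed units, so no LP solver is needed. The function names `dea_vrs_frontier` and `output_efficiency` and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def dea_vrs_frontier(x_obs: Sequence[float], y_obs: Sequence[float], x0: float) -> float:
    """Largest output attainable with input x0 in the DEA technology
    (single input, single output, convexity + free disposability)."""
    # Units that use no more input than x0 are attainable by free disposability.
    below = [(x, y) for x, y in zip(x_obs, y_obs) if x <= x0]
    if not below:
        raise ValueError("x0 is smaller than every observed input")
    best = max(y for _, y in below)

    # Convex combinations of one unit at or below x0 and one unit above x0,
    # weighted so that the combined input equals x0 exactly.
    above = [(x, y) for x, y in zip(x_obs, y_obs) if x > x0]
    for xi, yi in below:
        for xj, yj in above:
            w = (x0 - xi) / (xj - xi)  # weight placed on the unit above x0
            best = max(best, (1.0 - w) * yi + w * yj)
    return best


def output_efficiency(x_obs: Sequence[float], y_obs: Sequence[float], k: int) -> float:
    """Output-oriented radial score of unit k: frontier output / observed output.
    A score of 1.0 means the unit lies on the estimated frontier."""
    return dea_vrs_frontier(x_obs, y_obs, x_obs[k]) / y_obs[k]


if __name__ == "__main__":
    # Hypothetical data: four frontier units and one dominated unit (index 4).
    x = [1.0, 2.0, 3.0, 4.0, 3.0]
    y = [1.0, 3.0, 4.0, 4.5, 2.0]
    print(dea_vrs_frontier(x, y, 2.5))  # 3.5
    print(output_efficiency(x, y, 4))   # 2.0
```

Because this frontier passes through the best observed units, it fits the sample as tightly as the shape constraints allow, which is the overfitting behaviour the abstract attributes to DEA.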

Highlights

These bounds are understood as Probably Approximately Correct (PAC) bounds, which means that the probability of the bound failing is small (Probably) when the bound is achieved through the classifier that has a low error rate (Approximately Correct)
We show the relationship between the Directional Distance Function (DDF) model in DEA and the DEAM model (16): the DDF model always yields a feasible solution of the DEAM model
For the first time, a bound on the generalization error for a piece-wise linear hypothesis has been established in the context of Support Vector Regression (SVR), by considering typical axioms from production theory: convexity and free disposability
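
For reference, the DDF mentioned in the highlights is commonly written, under variable returns to scale, as the linear program below. The notation is an assumption made here and may differ from the paper's numbered models: n DMUs indexed by j, m inputs x_{ij}, s outputs y_{rj}, an evaluated unit (x_0, y_0), and a non-negative direction vector g = (g^-, g^+).

```latex
\begin{aligned}
\vec{D}(x_0, y_0; g) \;=\; \max_{\beta,\,\lambda} \quad & \beta \\
\text{s.t.} \quad
& \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{i0} - \beta\, g_i^{-}, && i = 1, \dots, m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{r0} + \beta\, g_r^{+}, && r = 1, \dots, s, \\
& \sum_{j=1}^{n} \lambda_j = 1, \qquad \lambda_j \ge 0, && j = 1, \dots, n.
\end{aligned}
```

In this formulation, a unit with an optimal value of zero cannot be improved along g within the estimated technology.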

Summary

Introduction

One of the most important issues in the field of statistical learning is the reliability of statistical inference methods. We minimize a balance between the generalization error and the empirical error through a quadratic optimization model, called Data Envelopment Analysis-based Machines (DEAM), which has DEA as a particular case. The new insights expected from applying DEAM relate to obtaining better estimates of production functions in engineering and microeconomics, in terms of bias and mean squared error. These gains will benefit the technical efficiency measures derived from the distance between a given observation and the production function estimate.

Support Vector Regression (SVR)

Data Envelopment Analysis (DEA)

New PAC Learning with Piece-Wise Linear Hypothesis


Findings

Discussion

Conclusions and Future Work