The deployment of photovoltaic (PV) distributed generation (DG) has been increasing substantially in Brazil. In this context, the present paper exploits a robust dataset provided by the Brazilian Electricity Regulatory Agency (ANEEL) to evaluate the status and trends of PV DG. As of November 2022, this dataset consists of more than 1.4 million rows (one for each system) and thirty columns (various information about the systems). For an in-depth assessment, three items are addressed: (i) the application of machine learning algorithms to estimate the installed power of individual systems from other information available in the dataset, (ii) the application of forecasting models to predict the installed power of PV DG over time in Brazil and its regions, and (iii) the application of the data envelopment analysis (DEA) method to rank Brazilian states in terms of efficiency in PV DG deployment. Although items (i)-(iii) have distinct specific goals, they use the same dataset and provide essential insights into PV DG in Brazil. In item (i), elastic net (EN), decision tree (DT), random forest (RF), extra trees (ET), AdaBoost (AB), and gradient boosting (GB) are applied to select the most accurate algorithm. In item (ii), autoregressive integrated moving average (ARIMA), seasonal autoregressive integrated moving average (SARIMA), Holt-Winters exponential smoothing (HWES), the Bass diffusion model (BDM), and a multilayer perceptron artificial neural network (MLP-ANN) are applied. In item (iii), the output-oriented DEA CCR and BCC models are applied. Concerning item (i), the results demonstrate that estimating the installed power of individual systems is not straightforward. Nonetheless, the machine learning algorithms yield a significant accuracy gain over using the dataset's average as the estimate, especially the DT and RF algorithms, which achieve a coefficient of determination (R²) of 29%.
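The R² figures above are measured against exactly the baseline mentioned here: predicting every system's power as the dataset average, which by construction scores an R² of 0. The minimal Python sketch below illustrates this relationship using hypothetical installed-power values (not taken from the ANEEL dataset) and hypothetical model predictions; it is not the authors' evaluation code.

```python
from statistics import fmean
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot.

    SS_tot is the squared error of always predicting the mean of y_true,
    so R^2 measures the fraction of that baseline error a model removes.
    """
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical installed powers (kW) of five systems; illustrative only.
y_true = [4.0, 6.0, 5.0, 10.0, 75.0]

# Baseline: estimate every system as the dataset average.
mean_baseline = [fmean(y_true)] * len(y_true)

# Hypothetical predictions from some fitted regressor (e.g., a decision tree).
model_pred = [8.0, 8.0, 8.0, 12.0, 50.0]

print(f"mean baseline R^2: {r2_score(y_true, mean_baseline):.2f}")  # 0.00
print(f"model R^2:         {r2_score(y_true, model_pred):.2f}")
```

scikit-learn's `sklearn.metrics.r2_score` computes the same quantity. Read this way, an R² of 29% means that the DT and RF models remove roughly 29% of the squared estimation error left by the dataset-average baseline.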
Regarding item (ii), the results demonstrate that the installed capacity in Brazil and its regions is expected to increase approximately linearly over a horizon of months, albeit with weekly seasonal characteristics. Moreover, the most accurate model depends on the region: the ANN is superior for the Northeast (MAPE of 3.25%) and South (MAPE of 1.28%) regions, the HWES method is superior for Brazil as a whole (MAPE of 0.51%) and the Southeast region (MAPE of 2.26%), and the BDM is superior for the Mid-west (MAPE of 1.95%) and North (MAPE of 3.37%) regions. Concerning item (iii), the results demonstrate that DEA CCR is more representative of the Brazilian reality, since DEA BCC classifies several states as efficient even when they have a very low installed capacity, particularly small states. DEA CCR indicates that Minas Gerais, Rio Grande do Sul, and Mato Grosso are the most efficient states in PV DG deployment relative to their inputs, closely followed by São Paulo with an efficiency of 95%. Furthermore, states located in the North and Northeast regions require more attention from ANEEL, since their efficiencies are generally low. A literature review showed that this paper is the first to exploit the ANEEL dataset so thoroughly and robustly. Therefore, the paper is expected to assist ANEEL in properly regulating the sector, and also to assist researchers and PV aggregator companies through in-depth insights. Moreover, the procedures conducted here can also be applied to other countries with available datasets, which broadens the applicability of the research.
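To illustrate how a CCR model ranks units, the sketch below treats the simplest case of one input and one output. There, the CCR (constant returns to scale) efficiency of each unit reduces to its output/input ratio divided by the best ratio in the sample, and the output-oriented expansion factor φ is the reciprocal of that score. The four units and their values are hypothetical rather than Brazilian states, and the paper's actual inputs and outputs are not reproduced here. A multi-input, multi-output model would instead require solving one linear program per unit.

```python
from typing import List


def ccr_efficiency(inputs: List[float], outputs: List[float]) -> List[float]:
    """CCR efficiency scores for a single-input, single-output DEA sample.

    Under constant returns to scale, the efficient frontier is the ray through
    the unit(s) with the best output/input ratio, so each unit's score is its
    ratio divided by that best ratio (1.0 means efficient).
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


def output_expansion_factors(scores: List[float]) -> List[float]:
    """Output-oriented CCR factor phi: how much a unit's output would have to
    grow, at its current input, to reach the CRS frontier (phi = 1 / score)."""
    return [1.0 / s for s in scores]


# Hypothetical sample: the input could be, e.g., consumer units (millions) and
# the output installed PV DG capacity (MW); values are illustrative only.
units = ["A", "B", "C", "D"]
inputs = [2.0, 5.0, 1.0, 8.0]
outputs = [400.0, 900.0, 100.0, 1600.0]

scores = ccr_efficiency(inputs, outputs)
phis = output_expansion_factors(scores)
for name, s, phi in zip(units, scores, phis):
    print(f"{name}: efficiency={s:.2f}, phi={phi:.2f}")
```

Here unit C, the smallest, scores 0.5 under CCR. Under the BCC (variable returns to scale) model, by contrast, a unit whose input is strictly the smallest in a single-input sample is always rated efficient, because no convex combination of other units uses that little input. This is consistent with the observation that DEA BCC flags small states with low installed capacity as efficient.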