Abstract
Non-technical losses of electric energy are caused mainly by electricity theft, which harms power utilities by reducing their profits and raises energy costs for other consumers, among other effects. Machine learning methods have been applied to detect anomalies in electricity consumption. However, this kind of data has imbalanced classes, which opens the possibility of applying imbalanced-data handling techniques that most studies in the literature have not explored. In this paper, the authors conduct a comparative study of several data-balancing strategies combined with several machine learning techniques, in order to determine which combinations of machine learning method and data-handling technique obtain the best results in simulations of the electricity theft detection problem. The machine learning methods used are Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN), and the data-balancing strategies are Cost-Sensitive Learning (Weighting), Random Undersampling (RUS), Random Oversampling (ROS), K-medoids based undersampling (K-medoids), Synthetic Minority Oversampling Technique (SMOTE), and Cluster-based Oversampling (CBOS). The comparison uses the Area Under the ROC Curve (AUC) and the F1-score, metrics better suited to this kind of imbalanced problem. The results show that some combinations reach significantly better values than others, both when comparing different balancing techniques for the same machine learning method and when comparing the combinations with one another.
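Some of the balancing strategies and both metrics named above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. It assumes label 1 marks the minority (theft) class, uses hypothetical toy data, and covers only RUS, ROS, and a simplified SMOTE (interpolating between a minority sample and one of its k nearest minority neighbours), plus the F1-score and AUC computed from its pairwise-ranking definition. Cost-Sensitive Learning, K-medoids undersampling, and CBOS are not shown, and all function names are illustrative.

```python
import math
import random

# Assumption: label 1 = minority (theft) class, label 0 = majority (normal) class.

def random_undersample(X, y, seed=0):
    """RUS: randomly drop majority samples until both classes have equal size."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    kept = rng.sample(majority, len(minority)) + minority
    return [X[i] for i in kept], [y[i] for i in kept]

def random_oversample(X, y, seed=0):
    """ROS: duplicate randomly chosen minority samples until classes are equal."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    n_extra = (len(y) - len(minority)) - len(minority)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

def smote(X, y, k=3, seed=0):
    """Simplified SMOTE: synthesize minority points on segments to near neighbours."""
    rng = random.Random(seed)
    minority = [X[i] for i, label in enumerate(y) if label == 1]
    n_extra = (len(y) - len(minority)) - len(minority)
    synthetic = []
    for _ in range(n_extra):
        a = rng.choice(minority)
        # k nearest minority neighbours of a, excluding a itself
        neighbours = sorted((p for p in minority if p is not a),
                            key=lambda p: math.dist(a, p))[:k]
        b = rng.choice(neighbours)
        gap = rng.random()  # position along the segment a -> b
        synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return X + synthetic, y + [1] * len(synthetic)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (theft) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Hypothetical toy data: 20 two-feature samples, 4 of them labelled theft.
    X = [[float(i), float(i % 5)] for i in range(20)]
    y = [1 if i in (3, 7, 11, 15) else 0 for i in range(20)]
    for name, balance in [("RUS", random_undersample),
                          ("ROS", random_oversample),
                          ("SMOTE", smote)]:
        Xb, yb = balance(X, y)
        print(name, len(yb), sum(yb))  # RUS 8 4 / ROS 32 16 / SMOTE 32 16
    print(f1_score([1, 0, 1, 0], [1, 0, 0, 0]))         # 2/3
    print(roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.5]))  # 0.75
```

In a full comparison, each balancing function would typically be applied to the training split only, before fitting LR, RF, SVM, or ANN, with AUC and F1-score computed on an untouched test split so that resampled points do not leak into the evaluation.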