Enhancing Load Stratification in Power Distribution Systems Through Clustering Algorithms: A Practical Study
Accurate load profile identification is crucial for effective and sustainable power system planning. This study proposes a characterization methodology based on clustering techniques applied to consumption data from medium- and low-voltage users, as well as distribution transformers from an electric utility. Three algorithms—K-means, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), and Gaussian Mixture Models (GMM)—were implemented and compared in terms of their ability to form representative strata using variables such as observation count, projected energy, load factor (LF), and characteristic power levels. The methodology includes data cleaning, normalization, dimensionality reduction, and quality metric analysis to ensure cluster consistency. Results were benchmarked against a prior study conducted by Empresa Eléctrica Regional Centro Sur C.A. (EERCS). Among the evaluated algorithms, GMM demonstrated superior performance in modeling irregular consumption patterns and probabilistically assigning observations, resulting in more coherent and representative segmentations. The resulting clusters exhibited an average LF of 58.82%, indicating balanced demand distribution and operational consistency across the groups. Compared to alternative clustering techniques, GMM demonstrated advantages in capturing heterogeneous consumption patterns, adapting to irregular load behaviors, and identifying emerging user segments such as induction-cooking households. These characteristics arise from its probabilistic nature, which provides greater flexibility in cluster formation and robustness in the presence of variability. Therefore, the findings highlight the suitability of GMM for real-world applications where representativeness, efficiency, and cluster stability are essential. The proposed methodology supports improved transformer sizing, more precise technical loss assessments, and better demand forecasting. 
Periodic application and integration with predictive models and smart grid technologies are recommended to enhance strategic and operational decision-making, ultimately supporting the transition toward smarter and more resilient power distribution systems.
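As a small illustration of the load factor (LF) variable used to describe the strata, the sketch below computes LF under the standard definition of mean demand divided by peak demand over the same period. The function name `load_factor` and the sample readings are hypothetical and are not taken from the study's data.

```python
def load_factor(power_kw):
    """Return the load factor: mean demand divided by peak demand."""
    if not power_kw:
        raise ValueError("power_kw must not be empty")
    peak = max(power_kw)
    if peak <= 0:
        raise ValueError("peak demand must be positive")
    mean = sum(power_kw) / len(power_kw)
    return mean / peak


# Hypothetical demand readings (kW) for one user; illustrative values only.
profile_kw = [2.0, 2.0, 4.0, 8.0]
print(f"load factor = {load_factor(profile_kw):.2%}")  # mean 4 kW / peak 8 kW
```

A value near 100% indicates a flat profile, while a low value indicates demand concentrated in short peaks.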
- Conference Article
1
- 10.1109/apec43580.2023.10131499
- Mar 19, 2023
This paper proposes an optimal clustering algorithm that considers the performance deviation of parameters, together with a data preprocessing method, for reusing retired batteries. The proposed method regroups batteries by considering the density and performance deviation of the retired battery dataset through density-based spatial clustering of applications with noise (DBSCAN). Additionally, the performance of the algorithm was improved through data preprocessing using principal component analysis (PCA), which reduces the computational complexity and limits overfitting of the clustering algorithm. The feasibility of the proposed algorithm is verified by comparison with general clustering algorithms such as k-means clustering and the Gaussian mixture model.
- Research Article
1
- 10.1115/1.4067047
- Nov 1, 2024
- ASME Journal of Engineering for Sustainable Buildings and Cities
The electrification of rural communities is crucial from both social and economic perspectives, aligned with Sustainable Development Goal 7: "Affordable and Clean Energy." This study presents a comprehensive comparison of clustering techniques, including k-means, Gaussian mixture models (GMM), hierarchical clustering, density-based spatial clustering of applications with noise (DBSCAN), and agglomerative clustering, aimed at enhancing solar irradiance prediction. Leveraging historical climate data from a rural community in the coastal region of Ecuador, each technique is evaluated using error metrics such as mean absolute error (MAE) and coefficient of determination (R²). This assessment identifies the most effective clustering technique in this specific context. To delve deeper into these comparisons, simulations are conducted in AMPL to validate and refine the selection of techniques. This process considers the sizing and design of a microgrid within the Barcelona community, Ecuador, which integrates various energy sources, including solar. Additionally, a penalty system is introduced for unmet energy demands during less critical periods, thereby optimizing efficiency and enhancing energy availability within the community. In conclusion, this article introduces a scalable methodology to analyze algorithms for solar irradiance prediction, emphasizing the significance of comparing clustering techniques as its main contribution. This advancement in prediction accuracy has the potential to enhance the feasibility and efficiency of renewable energy systems for rural communities, thereby fostering sustainable economic growth and bolstering efforts in climate change mitigation and adaptation.
- Research Article
14
- 10.1093/tse/tdz006
- Nov 1, 2019
- Transportation Safety and Environment
The data collected from taxi vehicles using global positioning system (GPS) traces provides abundant temporal-spatial information, as well as information on the activity of drivers. Using taxi vehicles as mobile sensors in road networks to collect traffic information is an important emerging approach in efforts to relieve congestion. In this paper, we present a hybrid model for estimating driving paths using a density-based spatial clustering of applications with noise (DBSCAN) algorithm and a Gaussian mixture model (GMM). The first step in our approach is to extract the locations from pick-up and drop-off records (PDR) in taxi GPS equipment. Second, the locations are classified into different clusters using DBSCAN. Two parameters (density threshold and radius) are optimized using real trace data recorded from 1100 drivers. A GMM is also utilized to estimate a significant number of locations; the parameters of the GMM are optimized using an expectation-maximization (EM) algorithm. Finally, applications are used to test the effectiveness of the proposed model. In these applications, locations distributed in two regions (a residential district and a railway station) are clustered and estimated automatically.
- Research Article
- 10.30598/barekengvol19iss3pp2039-2056
- Jul 1, 2025
- BAREKENG: Jurnal Ilmu Matematika dan Terapan
Public welfare refers to a condition in which people experience happiness, comfort, prosperity, and can adequately fulfill their basic needs. Indonesia consists of several provinces, each with varying levels of welfare. One crucial aspect in promoting equitable development is ensuring that all regions in Indonesia achieve similar welfare standards. This study aims to classify Indonesian provinces based on socioeconomic welfare indicators, with the results serving as a basis for policy-making that considers regional potential and challenges. The data used in this study are secondary data obtained from the official website of BPS-Statistics Indonesia on provincial welfare indicators from 2020 to 2023. The research methodology includes data collection, descriptive statistical analysis, determining the optimal number of clusters, and comparing the clustering performance of Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and the Gaussian Mixture Model (GMM) using Silhouette Index, Davies-Bouldin Index, and Calinski-Harabasz Index as evaluation metrics. The DBSCAN-based clustering resulted in two clusters: high-welfare and low-welfare regions. Meanwhile, GMM clustering produced five clusters: moderate, fairly low, low, high, and fairly high welfare regions. Based on cluster validity measures, GMM outperformed DBSCAN, achieving a Silhouette score of 0.28, a Davies-Bouldin Index of 1.12, and a Calinski-Harabasz Index of 10.9.
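The Silhouette index used as a validity measure above can be illustrated with a minimal pure-Python sketch. For each point it takes the mean distance `a` to the other members of its own cluster and the smallest mean distance `b` to any other cluster, then scores the point as `(b - a) / max(a, b)`. The function name `silhouette_score`, the Euclidean metric, and the toy points are assumptions for illustration and are unrelated to the welfare dataset.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two well-separated pairs of points (hypothetical values).
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])
bad = silhouette_score(toy_points, [0, 1, 0, 1])
print(f"good labelling: {good:.3f}, bad labelling: {bad:.3f}")
```

Scores lie between -1 and 1; values close to 1 indicate compact, well-separated clusters, so a labelling that respects the two groups scores higher than one that splits them.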
- Research Article
9
- 10.3390/su142013328
- Oct 17, 2022
- Sustainability
To reduce the operating cost and running time of demand responsive transit between urban and rural areas, a DBSCAN K-means (DK-means) clustering algorithm, which combines density-based spatial clustering of applications with noise (DBSCAN) and the K-means clustering algorithm, was proposed for preprocessing passenger reservation demand and optimizing stations, and for designing a new variable-route demand responsive transit service system that can promote urban–rural integration. Firstly, after the reservation demand was preprocessed with the DBSCAN clustering algorithm, the K-means clustering algorithm was used to divide fixed sites and alternative sites. Then, a bus scheduling model was established, and a genetic simulated annealing algorithm was proposed to solve the model. Finally, the feasibility of the model was validated in the northern area of Yongcheng City, Henan Province, China. The results show that the optimized bus scheduling reduced the operating cost and running time by 9.5% and 9.0%, respectively, compared with those of the regional flexible bus, and by 4.5% and 5.1%, respectively, compared with those of the variable-route demand response transit after K-means clustering for passenger preprocessing.
- Research Article
2
- 10.1093/mnras/stae1448
- Jun 13, 2024
- Monthly Notices of the Royal Astronomical Society
In our previous work, we introduced a method that combines two unsupervised algorithms: Density-based spatial clustering of applications with noise (DBSCAN) and Gaussian mixture model (GMM). We applied this method to 12 open clusters based on Gaia Early Data Release 3 (EDR3) data, demonstrating its effectiveness in identifying reliable cluster members within the tidal radius. However, for studying cluster morphology, we need a method capable of detecting members both inside and outside the tidal radius. By incorporating a supervised algorithm into our approach, we successfully identified members beyond the tidal radius. In our current work, we initially applied DBSCAN and GMM to identify reliable members of cluster stars. Subsequently, we trained the random forest algorithm using DBSCAN- and GMM-selected data. Leveraging the random forest, we can identify cluster members outside the tidal radius and observe cluster morphology across a wide field of view. Our method was then applied to 15 open clusters based on Gaia DR3, which exhibit a wide range of metallicity, distances, members, and ages. Additionally, we calculated the tidal radius for each of the 15 clusters using the King profile and detected stars both inside and outside this radius. Finally, we investigated mass segregation and luminosity distribution within the clusters. Overall, our approach significantly improved the estimation of the tidal radius and detection of mass segregation compared to the previous work. We found that in Collinder 463, low-mass stars do not segregate in comparison to high-mass and intermediate-mass stars. Additionally, we detected a peak of luminosity in the clusters, some of which were located far from the centre, beyond the tidal radius.
- Research Article
23
- 10.3390/analytics2040042
- Oct 12, 2023
- Analytics
Recently, people's awareness of online purchases has significantly risen. This has given rise to online retail platforms and the need for a better understanding of customer purchasing behaviour. Retail companies need to deal with a high volume of customer purchases, which requires sophisticated approaches to perform more accurate and efficient customer segmentation. Customer segmentation is a marketing analytical tool that aids customer-centric service and thus enhances profitability. In this paper, we aim to develop a customer segmentation model to improve decision-making processes in the retail market industry. To achieve this, we employed a UK-based online retail dataset obtained from the UCI machine learning repository. The retail dataset consists of 541,909 customer records and eight features. Our study adopted the RFM (recency, frequency, and monetary) framework to quantify customer values. Thereafter, we compared several state-of-the-art (SOTA) clustering algorithms, namely, K-means clustering, the Gaussian mixture model (GMM), density-based spatial clustering of applications with noise (DBSCAN), agglomerative clustering, and balanced iterative reducing and clustering using hierarchies (BIRCH). The results showed that the GMM outperformed the other approaches, with a Silhouette Score of 0.80.
- Conference Article
30
- 10.1109/isgt-asia.2017.8378347
- Dec 1, 2017
Nowadays, the problem of electricity theft and tampered smart meter data is causing widespread concern. Customer load profiles collected from smart meters can help detect abnormal electricity users and identify electricity theft. In this paper, a density-based electricity theft detection method is proposed to identify abnormal electricity patterns. Several malicious types are used to validate the proposed method. Comparisons with k-means clustering, Gaussian mixture model (GMM) clustering and density-based spatial clustering of applications with noise (DBSCAN) are also conducted. Numerical experiments show that the proposed method outperforms other methods in almost all the theft types.
- Research Article
- 10.70465/ber.v2i4.50
- Oct 9, 2025
- International Journal of Bridge Engineering, Management and Research
Bridge deck deterioration poses a significant challenge to transportation infrastructure, resulting in costly maintenance and potential safety hazards. Traditional bridge deck assessments primarily rely on visual inspections, which can be subjective and fail to capture subsurface defects, such as delamination, rebar corrosion, and concrete degradation. To enhance the accuracy of condition assessment, this study explores multi-sensor data fusion and clustering techniques for defect identification using Ground Penetrating Radar (GPR) and Impact Echo (IE). By integrating multiple Non-Destructive Evaluation (NDE) datasets, a clustering-based framework was developed to automatically categorize bridge deck conditions. K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Gaussian Mixture Models (GMM), and Fuzzy C-Means (FCM) clustering algorithms were evaluated to determine their effectiveness in grouping similar defect patterns. The optimal number of clusters is determined using the Elbow Method, Silhouette Score, and Davies-Bouldin Index. Results indicate that DBSCAN outperforms other clustering techniques in detecting defect hotspots while effectively handling noise and spatial inconsistencies. The clustered defects are mapped spatially to visualize regions of deterioration, enabling bridge engineers to identify high-risk areas and prioritize maintenance efficiently.
- Conference Article
1
- 10.1109/iecon.2016.7793050
- Oct 1, 2016
The use of low-cost camera sensors for Simultaneous Localization And Mapping and Moving Object Tracking (SLAMMOT) is a developing research area. Image features can be static or dynamic, sparse or dense, and can appear or disappear, making them difficult to track individually over an image sequence. Clustering techniques have been recommended and used to cluster image features to improve tracking results. New and affordable RGB-D cameras provide both color and depth information. This paper compares five different clustering algorithms to determine which would be best suited to cluster features from RGB-D image sequences for tracking objects in an indoor dynamic environment. Speeded Up Robust Features (SURF) are used, and the performance of the k-means, mean shift, a contrario, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Gaussian Mixture Models (GMM) clustering algorithms is validated in tests with synthetic and RGB-D data. Results indicate that mean shift clustering may be suitable for the SLAMMOT task, as it appeared best for overall performance as well as for execution efficiency.
- Conference Article
- 10.56952/arma-2024-0979
- Jun 23, 2024
ABSTRACT: The correlation of rock mechanical properties from one well to another across an area of interest poses a classical and ongoing problem in rock mechanics. This work illustrates the identification of mechanical layers/zones in a geothermal reservoir using unsupervised machine learning (ML) techniques. Mechanical stratigraphy was defined using well logs obtained from three wells located at the Utah FORGE geothermal site: 58-32, 16A(78)-32 and 16B(78)-32. Widely accepted unsupervised ML techniques, including K-means clustering, Gaussian mixture models, and DBSCAN (density-based spatial clustering of applications with noise), were used to generate the rock classes based on similarities and differences in mechanical attributes. The rock mechanical classifications were performed using a combination of parameters including measured log data (compressional and shear wave interval transit times) and augmented features such as Poisson's ratio and Young's modulus. The performance of the ML clustering models was evaluated using the Silhouette index (SI) and Davies-Bouldin index (DBI) criteria. The evaluation measures reflected the effectiveness and applicability of the proposed ML approaches for generating mechanical stratigraphy: higher SI and CHI scores and lower DBI scores indicate a good-quality, reliable classification. The best performance was exhibited by the K-means algorithm, with SI, DBI and CHI scores of 0.86, 0.4, and 79, respectively. The proposed mechanical unit cluster models were observed to be consistent with the lithological stratigraphy of the studied wells. This approach is therefore shown to provide efficient and reliable identification of mechanical stratigraphy for FORGE, with the capability for application across a wide range of subsurface reservoirs.
1. INTRODUCTION
Rocks are formed in different lithostratigraphic units that have a wide range of mechanical characteristics (Boersma et al., 2020). According to Ferrill et al. (2017) and Smart et al. (2014). The mechanical characteristics are often described in terms of stiffness and strength properties, including elastic parameters, tensile strength, and compressive strength (Roche et al., 2013).
- Research Article
6
- 10.14704/web/v18si02/web18068
- Apr 29, 2021
- Webology
Public passenger transport is an essential part of the transport system, and predicting Public Transport (PT) movement is a significant issue in transport planning because of its operational importance. In recent years, Intelligent Transportation Systems (ITS) have received growing interest, and many advances and innovative applications have been introduced to make PT safer, more efficient, and more pleasant. These applications require a reliable and efficient traffic flow prediction system so that ITS implementations can resolve potential road situations in advance. PT network efficiency plays a central role for all urban authorities, and advances in communication and location devices are increasing the data available from operational platforms. To recognize trends useful for improving the Schedule Plan, adequate Machine Learning (ML) approaches need to be implemented. This paper therefore focuses on heterogeneous data that affect the prediction and uses Density-Based Spatial Clustering of Applications with Noise (DBSCAN) with a Seasonal Autoregressive Integrated Moving Average (SARIMA) algorithm to predict the transport demand on a particular route and the arrival time of public transport, forecasting real-time passenger demand dynamically in support of dynamic bus management and scheduling. Moreover, the accuracy of the proposed SARIMA model is compared with that of a traditional hybrid model, a Gaussian Mixture Model (GMM) with ARIMA, to provide an efficient and robust prediction of PT based on passenger demand.
- Research Article
- 10.52436/1.jutif.2025.6.3.4439
- Jun 10, 2025
- Jurnal Teknik Informatika (Jutif)
Investing in the stock market is challenged by high volatility, which often leads to inaccurate price predictions. Prediction models often struggle to handle fluctuations and produce unstable forecasts. This study aims to predict stock prices for three banks, namely PT Bank Central Asia Tbk (BBCA), PT Bank Rakyat Indonesia (Persero) Tbk (BBRI), and PT Bank Mandiri (Persero) Tbk (BMRI), using Long Short-Term Memory (LSTM) with the integration of Density-Based Spatial Clustering of Applications with Noise (DBSCAN) for anomaly detection. DBSCAN is applied with an epsilon (ε) of 0.5 and a minimum of 5 samples using Euclidean distance. The LSTM model consists of two hidden layers with 50 units, is optimized using Adam, and applies the Mean Squared Error (MSE) loss function. The results show that DBSCAN improves prediction accuracy under several conditions. For BBCA stock, the lowest MSE was 0.003 at the 2nd fold with DBSCAN, compared to 0.006 without DBSCAN. BMRI stock achieved an MSE of 0.003 at the 4th fold with DBSCAN, while the 5th fold without DBSCAN obtained 0.000. BBRI stock showed its best MSE of 0.003 at the 2nd fold with DBSCAN and at the 5th fold without DBSCAN. These results show that the integration of DBSCAN can improve prediction, especially when extreme price fluctuations occur. This research contributes to the development of stock price prediction methods that can serve as a benchmark for investors before making decisions, helping them avoid losses.
- Book Chapter
26
- 10.1016/b978-0-12-821929-4.00002-0
- Jan 1, 2021
- Machine Learning Guide for Oil and Gas Using Python
Chapter 4 - Unsupervised machine learning: clustering algorithms
- Research Article
- 10.1080/17499518.2024.2341257
- Apr 13, 2024
- Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards
Landslides exhibiting step-wise deformation characteristics are extensively dispersed throughout the Three Gorges Reservoir (TGR) region of China. Predicting the deformation state of landslides in TGR holds paramount significance in landslide early warning and risk management. Machine learning-based landslide deformation state prediction is a combination of clustering and imbalanced classification. This paper compares the efficacy of three prevalent clustering methods, namely K-means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Gaussian Mixture Model (GMM), in the clustering analysis process. Furthermore, the paper evaluates the performance of three widely-used data sampling technologies, namely Synthetic Minority Oversampling Technique (SMOTE), SMOTE-Edited Nearest Neighbors (SMOTE-ENN), and ADAptive SYNthetic Sampling (ADASYN), in the imbalanced classification process. The Baijiabao and Bazimen landslides in the TGR region, which exhibit step-wise deformation characteristics, are used as case studies. Results indicate that DBSCAN and GMM exhibit significant advantages in the clustering process. Meanwhile, the mixture models that integrate oversampling technologies and classification algorithms perform exceptionally well in imbalanced classification. The aforementioned algorithms are recommended for predicting the deformation states of step-wise landslides in the TGR region. The machine learning-based predictive models can serve as potent instruments in facilitating the implementation of early warning systems aimed at mitigating landslide risks.