Analysis of Hotspot Data for Drought Clustering Using K-Means Algorithm
Drought is a disaster that Indonesia experiences frequently because of its geographical location on the equator. Drought has major impacts on communities, including crop failure, forest fires, soil damage, disease outbreaks, and the extinction of animals and plants. Based on data from the Ministry of Environment of the Republic of Indonesia, the distribution of hotspots in Riau is distinctive: hotspot counts in Riau increased every February and March, by as many as 277 and 248 hotspots, over the last two years (2018 and 2019). To anticipate drought in Riau, drought-prone areas were clustered based on an analysis of hotspot data using the K-Means algorithm. The elbow method was used to determine the number of clusters and yielded 4 clusters. The clustering was evaluated with the silhouette coefficient, which gave a value of 0.388632163 and was classified as well-clustered. Based on these results, Rokan Hilir, Bengkalis, and Kota Dumai are the dangerous districts, with 3106, 2361, and 117 danger points, respectively.
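The elbow procedure described in this abstract can be illustrated with a small, dependency-free sketch. The coordinates below are hypothetical stand-ins for hotspot locations, not the study's data, and the deterministic farthest-first seeding is an assumption made so the example is reproducible; the paper does not say how its centroids were initialised.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then repeatedly add the point farthest from all
    centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, sse)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


# Hypothetical coordinates forming three tight groups.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]

# Sum of squared errors for k = 1..4; the "elbow" is where the drop flattens.
sse_by_k = {k: kmeans(points, k)[2] for k in range(1, 5)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

On this toy data the SSE falls sharply up to k = 3 and barely changes afterwards, which is the pattern the elbow method looks for. Real hotspot data will rarely show such a clean bend.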
- Research Article
- 10.24167/proxies.v4i1.12433
- Aug 29, 2024
- Proxies : Jurnal Informatika
Rain is part of the hydrological cycle, the continuous circulation of water from the earth to the atmosphere and back. High rainfall makes lowland areas and areas with poor water infiltration very susceptible to flooding. A system is therefore needed to classify weather and rainfall data for each city and district, so that cities with high rainfall and extreme weather can be given special attention to prevent natural disasters such as flooding. The collected data are processed with the K-Means algorithm to classify cities or districts as having low, medium, high, or very high rainfall. In K-Means the number of clusters k is usually chosen arbitrarily; this project uses the Elbow Method to determine k and the Silhouette Coefficient to test the quality of the resulting clusters. The data are rainfall records from dataonline.bmkg.go.id for a given period. The elbow method and the silhouette method can both be used to select the optimal number of clusters, and they mostly give the same result. Accuracy is higher when the optimal number of clusters is used than when it is not: for Semarang on February 1-28, 2021, K = 4 gives an accuracy of 92.8571429 %, while the optimal K = 3 gives a higher accuracy of 97.6190476 %.
For Cilacap on April 1-30, 2021, the elbow method and the silhouette coefficient method produce different optimal numbers of clusters, and the accuracy obtained with the silhouette coefficient's optimum (85.7142857 %) is higher than with the elbow method's optimum (74.6031746 %). However, when the data are processed with the centroids in Table 5.10, both methods give the same optimal number of clusters, 2. This shows that the choice of initial centroids can affect the results of both the elbow method and the silhouette coefficient method.
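The silhouette coefficient used above can be sketched in a few lines. For each point, a is the mean distance to the other members of its own cluster and b is the lowest mean distance to any other cluster; the score is (b - a) / max(a, b), averaged over all points. The rainfall values and labels below are hypothetical, and the function assumes at least two clusters.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical daily rainfall values (mm), as one-dimensional points.
rainfall = [(0.0,), (1.0,), (10.0,), (11.0,)]

good = silhouette_score(rainfall, [0, 0, 1, 1])  # dry days vs wet days
bad = silhouette_score(rainfall, [0, 1, 0, 1])   # deliberately mixed labels
print(round(good, 4), round(bad, 4))
```

A labelling that separates low from high values scores close to 1, while the mixed labelling scores below 0. Comparing this score across candidate values of k is one way to pick the number of clusters.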
- Research Article
- 10.33330/jurteksi.v11i2.3531
- Mar 23, 2025
- JURTEKSI (Jurnal Teknologi dan Sistem Informasi)
Abstract: Clustering methods such as K-Means and K-Medoids are often used to analyze data, including student data, because of their efficiency. However, these methods have weaknesses, such as sensitivity to the choice of cluster centers (centroids) and cluster results that depend on the medoid data. Clustering, an essential technique in data analysis, aims to reveal the natural structure of the data even in the absence of labeled information. The study, conducted with complete objectivity, compared the performance of two popular clustering methods, K-Means and K-Medoids, on student data. Three evaluation metrics, namely the Davies-Bouldin Index (DBI), silhouette score, and elbow method, were used to compare the clusterings and determine the ideal number of clusters for the two algorithms. The data consist of names, attendance, assignments, formative assessments, midterm exams, final exams, and quality numbers. Based on the optimization results, it can be concluded that the K-Means method excels at grouping the student data. The best results were obtained from the K-Means algorithm with the Silhouette Coefficient method, with a value of 0.7509 in cluster 2, and the Elbow Method, with a value of 1428076.08 in cluster 2, and from K-Medoids with the DBI, with a value of 0.7413 in cluster 3. So, the best cluster lies in 3 clusters. Keywords: clustering; davies-bouldin index; elbow method; k-means; k-medoids; silhouette score
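The Davies-Bouldin Index mentioned in this abstract can be sketched as follows. For each cluster it takes the worst ratio of combined within-cluster scatter to centroid separation against any other cluster, then averages over clusters; lower values indicate more compact, better separated clusters. The two-feature student records here are hypothetical, and the same function can score labels from either K-Means or K-Medoids.

```python
from math import dist


def davies_bouldin(points, labels):
    """Davies-Bouldin Index for a labelled set of points (>= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    # Centroid of each cluster and mean distance of its members to it.
    centroids = {
        lab: tuple(sum(c) / len(members) for c in zip(*members))
        for lab, members in clusters.items()
    }
    scatter = {
        lab: sum(dist(p, centroids[lab]) for p in members) / len(members)
        for lab, members in clusters.items()
    }

    # For each cluster, the worst (largest) similarity to any other cluster.
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


# Hypothetical (attendance, final exam) records for four students.
students = [(60, 50), (60, 54), (90, 50), (90, 54)]
dbi = davies_bouldin(students, [0, 0, 1, 1])
print(round(dbi, 4))
```

Each cluster here has scatter 2 and the centroids are 30 apart, so the index is 4/30. Because DBI is minimised while the silhouette score is maximised, the two metrics can point to different cluster counts, as they do in this study.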
- Research Article
- 10.23960/jitet.v12i3.4921
- Aug 3, 2024
- Jurnal Informatika dan Teknik Elektro Terapan
Apotek Naza plays an important role in providing medicines to the community. This study utilizes sales data from Apotek Naza for the period of July to December 2023. The K-Means algorithm is used to cluster the medicine data into clusters representing different sales patterns. The Elbow Method is employed to determine the optimal number of clusters (K) based on the Sum of Square Error (SSE). Evaluation is conducted using the Silhouette Coefficient (SC) to measure the quality of the resulting clusters. The analysis results show that the distribution of medicines in each cluster is as follows: 13.7% or 70 items are classified in the high-usage cluster (Cluster 0 - High), 57.5% or 294 items are classified in the medium-usage cluster (Cluster 1 - Medium), and 28.8% or 147 items are classified in the low-usage cluster (Cluster 2 - Low). This indicates a dominance of medium-usage medicines in the Apotek Naza dataset. The obtained Silhouette Score is 0.520, indicating that the clustering is well performed. According to Table 2.1 on the criteria for measuring clustering based on the Silhouette Coefficient (SC), this score indicates that the resulting clusters are fairly compact and well-separated from each other. Keywords: Medicine Inventory, Data Mining, K-Means, KDD, Elbow Method, Silhouette Coefficient
- Research Article
6
- 10.28919/cmbn/7335
- Jan 1, 2022
- Communications in Mathematical Biology and Neuroscience
Tuberculosis (TB) is a health problem that has yet to be resolved in Indonesia. Based on WHO data, in 2021 Indonesia still ranked third in the world for the number of TB cases. This study aims to determine how many groups of TB patients can be formed based on age, gender, HIV status, history of diabetes mellitus, chest X-ray, and the results of the Molecular Rapid Test (TCM). The data comprise 985 records from 2017 to 2020. K-Nearest Neighbor (KNN) is used for imputation, and the K-Means and Fuzzy C-Means (FCM) methods are compared for grouping the TB data. Before grouping, the data are cleaned by an imputation process that fills in the missing values using KNN with k=5. To produce the best grouping or clustering, the right number of clusters must be determined, so this study compares the elbow, silhouette coefficient, and Davies Bouldin Index (DBI) methods. The K-Means algorithm is applied to form groups of TB patients based on the six features. Both the K-Means and FCM methods give an optimal number of clusters of K = 2, but with different values. The result of the clustering test using the elbow method with both K-Means and FCM is 93288.49, and the DBI value for both methods is 0.4937. Meanwhile, the clustering trial with the silhouette coefficient yields 0.6318 for K-Means, which the study reports as better than the 0.6321 produced by FCM. The study concludes that K-Means with the silhouette coefficient produces better cluster quality because it has a lower silhouette coefficient value than FCM.
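The KNN imputation step with k=5 can be illustrated with a simplified sketch. It fills each missing value with the mean of that feature over the k nearest complete rows, measuring distance only on the features the incomplete row does have. The two-column records are hypothetical, and averaging is an assumption that suits numeric features; categorical features such as gender or HIV status would more naturally use the most common neighbour value.

```python
from math import sqrt


def knn_impute(rows, k=5):
    """Fill None entries with the mean over the k nearest complete rows.

    Assumes at least one complete row exists and all features are numeric.
    """
    complete = [r for r in rows if None not in r]
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        # Compare rows only on the features this row actually has.
        observed = [i for i, v in enumerate(row) if v is not None]

        def distance(other):
            return sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=distance)[:k]
        filled.append([
            v if v is not None
            else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ])
    return filled


# Hypothetical (age, test score) records; the last score is missing.
records = [
    (20, 1.0), (22, 1.0), (24, 2.0), (26, 2.0), (28, 3.0), (60, 9.0),
    (25, None),
]
imputed = knn_impute(records, k=5)
print(imputed[-1])
```

The five records closest in age to the incomplete one supply the fill value, so the distant record at age 60 does not influence it.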
- Conference Article
3
- 10.1109/isyg.2018.8611879
- Nov 1, 2018
Forest, land, and residential fires have been a familiar phenomenon in Indonesia for the last decade. The high number of fire incidents in Indonesia requires attention from the government so that disasters such as forest fires can be resolved. These fire incidents can be analyzed because the data have already been obtained and recorded by satellite. Unfortunately, the data are too large to be analyzed as they are. Based on data obtained from the EOSDIS website, as many as 289,256 fire spots were recorded in the region of Sumatra between 2001 and 2014. An algorithm is needed to segment or cluster the data so that such a large dataset can be processed into useful information for the user. In this study, a comparative study of the K-Means and Isodata clustering algorithms was conducted. Both algorithms were assessed on the quality of the clusters produced, calculated using the Silhouette Coefficient (SC). The final SC value is 0.999997187 for the K-Means method and 0.999957161 for the Isodata method, so in this case K-Means has a higher SC value than Isodata in clustering the fire-spot data, although the difference in SC is small.
- Research Article
22
- 10.3390/en14185902
- Sep 17, 2021
- Energies
In this article, a case study is presented on applying cluster analysis techniques to evaluate the level of power quality (PQ) parameters of a virtual power plant. The conducted research concerns the application of the K-means algorithm in comparison with the agglomerative algorithm for PQ data, which have different sizes of features. The object of the study deals with the standardized datasets containing classical PQ parameters from two sub-studies. Moreover, the optimal number of clusters for both algorithms is discussed using the elbow method and a dendrogram. The experimental results show that the dendrogram method requires a long processing time but gives a consistent result of the optimal number of clusters when there are additional parameters. In comparison, the elbow method is easy to compute but gives inconsistent results. According to the Calinski–Harabasz index and silhouette coefficient, the K-means algorithm performs better than the agglomerative algorithm in clustering the data points when there are no additional features of PQ data. Finally, based on the standard EN 50160, the result of the cluster analysis from both algorithms shows that all PQ parameters for each cluster in the two study objects are still below the limit level and work under normal operating conditions.
- Research Article
3
- 10.1016/j.procs.2024.02.156
- Jan 1, 2024
- Procedia Computer Science
Segmenting the Higher Education Market: An Analysis of Admissions Data Using K-Means Clustering
- Research Article
- 10.29303/jppipa.v11i2.10011
- Feb 28, 2025
- Jurnal Penelitian Pendidikan IPA
One of the companies in Semarang engaged in gadget sales has an Apple Ecosystem information system for selling products from an exclusive brand, Apple. It covers sales transactions and service for devices including iPad, MacBook Air, MacBook Pro, AirPods, Mac, and Apple accessories. This research uses purchase transaction data from Apple Ecosystem customers for 2023. RFM (Recency, Frequency, Monetary) analysis is used to determine the attributes for customer segmentation. The Elbow method is applied to determine the optimal number of clusters for the RFM dataset. The RFM dataset is grouped using the K-Means algorithm, and the quality of the resulting clusters is evaluated with the Silhouette Coefficient method. All procedures are loaded into the Customer Segmentation App (RFM Clustering) web application. Clustering the RFM dataset produces 3 optimal clusters: Cluster 2 is High Spenders with 326 customers, Cluster 0 is VIP Customers, and Cluster 1 is Frequent Buyers. Cluster validation of K-Means using the silhouette coefficient produces a value of 0.3524.
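As an illustration of how RFM attributes are typically derived before clustering, the sketch below builds a per-customer table from raw transactions. The customer IDs, dates, amounts, and the snapshot date are all hypothetical; the abstract does not describe the study's schema or reference date.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, snapshot):
    """Compute Recency (days since last purchase), Frequency (number of
    purchases), and Monetary (total spend) per customer."""
    by_customer = defaultdict(list)
    for customer_id, day, amount in transactions:
        by_customer[customer_id].append((day, amount))

    table = {}
    for customer_id, rows in by_customer.items():
        last_purchase = max(day for day, _ in rows)
        table[customer_id] = {
            "recency": (snapshot - last_purchase).days,
            "frequency": len(rows),
            "monetary": sum(amount for _, amount in rows),
        }
    return table


# Hypothetical 2023 transactions: (customer id, purchase date, amount).
transactions = [
    ("C001", date(2023, 3, 5), 1200.0),
    ("C001", date(2023, 12, 22), 300.0),
    ("C002", date(2023, 6, 1), 2500.0),
]

# Hypothetical reference date: the day after the observation period ends.
rfm = rfm_table(transactions, snapshot=date(2024, 1, 1))
for customer_id, values in rfm.items():
    print(customer_id, values)
```

The three columns usually sit on very different scales, so they are commonly standardised before being passed to K-Means; whether the study did so is not stated.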
- Research Article
1
- 10.21107/kursor.v12i01.269
- Jun 30, 2023
- Jurnal Ilmiah Kursor
The K-Means algorithm can be used to group tourists based on their reviews of tourist destinations. Its weakness is sensitivity to the choice of initial centroids: initial centroids chosen at random decrease accuracy, often get stuck at a local optimum, and give a random solution. Optimization algorithms such as PSO can overcome this by determining the optimal initial centroids. The optimal number of clusters (K) is determined using the Elbow method by calculating the SSE value of the resulting clusters. The average Silhouette Coefficient (SC) is used to measure the quality of the clusters produced by the K-Means algorithm with and without PSO. This study uses secondary data obtained from the UCI Machine Learning Repository, the Travel Reviews Data Set, which consists of 980 records and 10 attributes. The test results show that K=2 is the optimal number of clusters. K-Means with PSO gives an average SC value of 0.300358, which is better than the 0.300076 obtained without PSO. The optimal PSO hyperparameters found are number of particles = 30, $\varphi_1 = 2.2$, and $\varphi_2 = 3$ at a maximum of 100 iterations.
- Research Article
- 10.37600/tekinkom.v7i2.1226
- Dec 31, 2024
- Jurnal Teknik Informasi dan Komputer (Tekinkom)
Advancements in information technology have transformed various aspects of human life, including the business world. Companies are required to use technology and data effectively to enhance their competitive advantage. One increasingly relevant strategy is Customer Relationship Management (CRM), where customer data is the main focus. Consumer data segmentation is an approach used to group customers based on certain characteristics. In this study, the K-Means Clustering algorithm is applied to consumer data segmentation to improve the marketing strategy of a store. The study begins with the collection of customer data from the Dan+Dan Telukjambe 2 store, followed by Exploratory Data Analysis (EDA) to understand the patterns and characteristics of the data. Preprocessing steps are carried out to ensure the data is ready for use, including removing irrelevant columns, handling missing values, and data transformation. Principal Component Analysis (PCA) is used to reduce data dimensions before applying K-Means Clustering. The Elbow Method and Silhouette Score are used to determine the optimal number of clusters. The study results indicate that the optimal number of clusters is six. Evaluation using the Silhouette Coefficient provides an average coefficient value of 0.66, indicating good clustering quality. Further analysis shows different distributions of age, purchasing power, occupation, and marital status in each cluster, providing deep insights into customer segments. The resulting clusters offer valuable information for developing more effective and targeted marketing strategies
- Research Article
6
- 10.1016/j.lana.2021.100102
- Nov 3, 2021
- The Lancet Regional Health - Americas
Fire association with respiratory disease and COVID-19 complications in the State of Pará, Brazil
- Research Article
- 10.3390/su172310440
- Nov 21, 2025
- Sustainability
This study explores the Environmental, Social, and Governance (ESG) disclosure practices of 31 information technology firms listed on Borsa Istanbul (BIST), with a particular emphasis on transparency and accountability. Building on legitimacy, stakeholder, and signalling theories, the study develops a composite ESG disclosure index based on 18 binary indicators covering strategy, environmental performance, social impact, stakeholder engagement, and governance structures. Each indicator is equally weighted and combined into environmental, social, and governance sub-indices, which are then aggregated into a firm-level ESG disclosure score using a single min–max normalisation scheme. K-means clustering, validated through the elbow method and silhouette coefficient, is applied to identify groups of firms with similar ESG disclosure profiles. The empirical results show substantial heterogeneity in disclosure intensity across Turkish IT firms. A small group of companies exhibits proactive and comprehensive ESG communication, whereas many firms disclose only limited and fragmented information. Governance- and reporting-related indicators (C6 and C7) are particularly influential, underscoring the importance of standardised ESG reporting and board-level oversight in strengthening transparency. The study contributes to the emerging ESG disclosure literature by providing a methodologically consistent framework for assessing ESG transparency in the IT sector and offering practical insights for regulators, investors, and corporate decision-makers aiming to improve the reliability of ESG reporting in Turkey.
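The index construction described above can be sketched under stated assumptions. Here each sub-index is taken as the mean of its binary indicators, the raw firm score as the mean of the three sub-indices, and min-max normalisation is applied across firms. The abstract does not give the exact aggregation formula or the split of the 18 indicators across pillars, so the firm names, indicator values, and two-indicators-per-pillar layout are hypothetical.

```python
def sub_index(indicators):
    """Equally weighted share of disclosed (1) indicators in one pillar."""
    return sum(indicators) / len(indicators)


def esg_scores(firms):
    """Min-max normalised composite score per firm.

    Assumption of this sketch: the raw score is the mean of the
    environmental, social, and governance sub-indices.
    """
    raw = {
        name: sum(sub_index(values) for values in pillars.values()) / len(pillars)
        for name, pillars in firms.items()
    }
    low, high = min(raw.values()), max(raw.values())
    if high == low:  # all firms identical: avoid dividing by zero
        return {name: 0.0 for name in raw}
    return {name: (score - low) / (high - low) for name, score in raw.items()}


# Hypothetical firms with two binary indicators per pillar.
firms = {
    "Firm A": {"E": [1, 1], "S": [1, 1], "G": [1, 1]},
    "Firm B": {"E": [0, 0], "S": [1, 0], "G": [0, 0]},
    "Firm C": {"E": [1, 0], "S": [1, 0], "G": [1, 0]},
}
scores = esg_scores(firms)
for name, score in scores.items():
    print(name, round(score, 3))
```

After normalisation the most and least transparent firms sit at 1 and 0, and the rest fall in between. Scores of this kind, or the sub-indices themselves, are what a K-means step would then group.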
- Research Article
- 10.9734/jerr/2025/v27i121751
- Dec 17, 2025
- Journal of Engineering Research and Reports
Aims: The Niger Delta experiences high levels of routine gas flaring. This leads to wasted associated gas, economic losses, and contributes to environmental degradation. The fragmented spatial distribution of flare sites further complicates gas capture and infrastructure planning. This study aims to apply a geospatial cluster-based optimization approach to group twenty-four (24) onshore gas-flaring sites in the Niger Delta. The objective is to improve flare-gas recovery potential and guide the design of centralized gas-gathering infrastructure. Study Design: It follows a quantitative geospatial clustering using Python-based K-means analysis, supported by internal cluster-validation metrics. Place and Duration of Study: The study was carried out using flare-volume records and GPS data from 24 onshore flowstations across the Niger Delta, covering approximately 100 days of operational reporting between July to December 2024. Methodology: Daily flare-volume datasets were pre-processed to compute average active-day flare rates for each flowstation. Latitude and longitude coordinates were compiled using Google Earth Pro and field entries. The Elbow Method was used to determine the optimal number of clusters (K) based on inertia values. K-means clustering was then applied to group the flowstations into distinct geospatial clusters. Internal cluster quality was evaluated using the silhouette coefficient. Aggregated flare volumes were computed for each cluster to assess recovery potential. Results: The Elbow Method identified four clusters as the optimal configuration. K-means clustering produced coherent spatial groupings reflecting natural geographic alignments within the region. Cluster 0 recorded the highest aggregated average flare volume (≈ 45.99 mmscf/day), followed by Cluster 3 (≈ 30.22 mmscf/day), Cluster 1 (≈ 24.56 mmscf/day), and Cluster 2 (≈ 22.55 mmscf/day). 
Silhouette analysis confirmed strong internal cohesion and clear separation between clusters, with no misclassified points. Together, Clusters 0 and 3 accounted for approximately 63% of total aggregated flare volume; this indicates priority zones for possible centralized gas-gathering development. Conclusion: Geospatial clustering provides a robust foundation for designing shared-infrastructure flare-gas recovery systems in the Niger Delta. The four-cluster model presented two priority hubs suitable for centralized infrastructure development. This can reduce total pipeline distance. The cluster model forms a baseline for subsequent techno-economic feasibility studies.
- Conference Article
- 10.54941/ahfe1001207
- Jan 1, 2021
Before vehicle self-driving matures, human-vehicle shared control will be a dominant solution for a certain period. Understanding drivers’ maneuver behavior is an important prerequisite for providing drivers with different levels of assistance in a collaborative driving system. This research aims to classify the characteristics of drivers’ maneuver modes and establish a general model of driver steering styles. Firstly, an experiment is designed to collect the drivers’ behavioral data under a fixed scenario. Because driving simulation has significant advantages over a real vehicle, for instance the replicability and stability of the testing scene, the experiment is conducted on a driving simulator platform with six degrees of freedom (6-DOF). 38 participants (21 males, 17 females) with different personalities and driving experience are required to drive through a U-shaped testing scene. Meanwhile, data such as velocities, lateral deviations, and steering wheel torques are recorded at a frequency of 60 Hz. Secondly, in the data processing part, Principal Component Analysis (PCA) is used to extract key features from the original data, aiming to reduce redundancy between steering characteristics. Two principal components are calculated to represent the original features. Then, with the number of clusters determined as three by both the Elbow Method and the Silhouette Coefficient, three types of driving styles are identified by K-means clustering. Finally, based on the corresponding original data of lateral deviations, steering wheel torques, and their change rates, the drivers’ proficiencies and path tracking abilities are compared, and the three styles are defined as moderate, radical, and conservative types.
In the analysis of the results, a driver’s path tracking ability is reflected by a smaller lateral deviation, a middle steering wheel torque represents higher proficiency, and the change rate of the torque shows the extent of radicalness. The results show that the moderate driver type has high proficiency in vehicle control, with more direction adjustments and strong path tracking accuracy. The radical type drivers also manipulate the steering wheel a lot, but their routes have relatively violent fluctuations. The conservative drivers operate the steering wheel carefully, which displays their lack of driving adeptness. This study identifies the specific characteristics of drivers’ steering behaviors and obtains the parametric boundary of driving styles. In further work, the results can be used as a design basis for customizing shared steering controllers for different driver types in collaborative driving. After the driving style is identified by measuring certain steering indexes, a personalized co-drive mode can be confirmed, which makes the driver feel that “the vehicle drives like him/herself”, so that human-vehicle trust and driving experience can be greatly improved.
- Research Article
1
- 10.21009/jtp.v24i2.28029
- Aug 26, 2022
- JTP - Jurnal Teknologi Pendidikan
Clustering is a technique for grouping homogeneous data so that the points in each cluster are as similar as possible according to measures such as Euclidean-based distance or correlation-based distance. In the industrial era 4.0, learning media, the environment, and the way teachers teach affect student learning styles. Many researchers who study learning styles agree on the importance of identifying them to accelerate students' learning performance. The purpose of this study is to group student learning styles in the industrial era 4.0 using the K-means algorithm and the elbow method. The research method used is waterfall. The research subjects were 108 students. With the number of clusters (K) set to 6, cluster 1 contains 27 students, cluster 2 contains 24 students, cluster 3 contains 21 students, cluster 4 contains 17 students, cluster 5 contains 11 students, and cluster 6 contains 8 students. The silhouette coefficient of the grouping is 0.302, which means the grouping structure is weak. In cluster 1, the highest number of students have auditory elements, followed by kinesthetic and visual elements. The development of ICT-based media is one of the factors in student learning styles in the industrial era 4.0.