Integrated spatial analysis of drought risk factors using agglomerative hierarchical clustering and correlation
18
- 10.3390/rs14143307
- Jul 8, 2022
- Remote Sensing
72
- 10.1186/s13717-021-00339-9
- Nov 1, 2021
- Ecological Processes
18
- 10.1007/s11442-023-2091-0
- Mar 1, 2023
- Journal of Geographical Sciences
8
- 10.3390/atmos13091531
- Sep 19, 2022
- Atmosphere
10
- 10.1038/s41598-024-53066-4
- Feb 19, 2024
- Scientific Reports
44
- 10.1016/j.iswcr.2020.10.005
- Oct 27, 2020
- International Soil and Water Conservation Research
2
- 10.1016/j.heliyon.2024.e32347
- Jun 1, 2024
- Heliyon
3
- 10.3390/w16182700
- Sep 23, 2024
- Water
30
- 10.1016/j.scitotenv.2023.165591
- Jul 19, 2023
- Science of The Total Environment
46
- 10.1007/s00357-020-09377-y
- Sep 30, 2020
- Journal of Classification
- Research Article
1
- 10.52783/cana.v31.1242
- Aug 15, 2024
- Communications on Applied Nonlinear Analysis
Introduction: Hierarchical clustering is a powerful unsupervised method for interpreting empirical knowledge from data. It plays a fundamental role in understanding complex patterns in huge datasets. It creates a hierarchical representation of data by forming clusters in one of two ways: agglomerative (bottom-up) or divisive (top-down). Its main advantage is that the number of clusters need not be fixed in advance. Objectives: To address three issues: hierarchical clustering is poorly suited to enormous datasets because of its high computational complexity; fixing a threshold on dendrogram height is difficult when converting to flat clusters; and there is no mathematical objective function for assessing hierarchical clustering. Methods: To address these challenges, this work proposes (a) a linear split of the data to reduce the computational complexity of hierarchical agglomerative clustering; (b) a fuzzy partition matrix to enhance cluster generation in hierarchical clustering; and (c) an objective function in flat clustering to ease the process of fixing the threshold. Implementation, Results: This work is implemented in the RapidMiner tool. The sum of squares, cluster density, and processing time are minimized in the proposed work. Conclusions: The proposed method handles enormous data effectively using a linear split with sequential execution; the proposed use of a sum of squares fixes an optimum threshold value on dendrogram height when transforming the dendrogram into flat clusters; and the proposed method improves on existing hierarchical clustering.
- Research Article
- 10.3389/fped.2023.1171920
- Sep 18, 2023
- Frontiers in pediatrics
Individuals with neurodevelopmental disorders such as global developmental delay (GDD) present both genotypic and phenotypic heterogeneity. This diversity has hampered the development of targeted interventions, given the relative rarity of each individual genetic etiology. Basket trials, a novel approach to clinical trials in which distinct but related diseases are treated with a common drug, have shown benefits in oncology but have yet to be used in GDD. It remains unclear, however, how individuals with GDD could be clustered. Here, we assess two different approaches: agglomerative and divisive clustering. Using the Deciphering Developmental Disorders (DDD) study, the largest cohort of individuals with GDD characterized using a systematic approach, we extracted genotypic and phenotypic information from 6,588 individuals with GDD. We then used k-means clustering (divisive) and hierarchical agglomerative clustering (HAC) to identify subgroups of individuals. Next, we extracted gene network and molecular function information for the clusters identified by each approach. HAC based on phenotypes identified in individuals with GDD revealed 16 clusters, each presenting with one dominant phenotype displayed by most individuals in the cluster, along with other minor phenotypes. Among the most common phenotypes reported were delayed speech, absent speech, and seizure. Interestingly, each phenotypic cluster included several (3-12) gene sub-networks of more closely related genes with diverse molecular functions. k-means clustering also segregated individuals harboring those phenotypes, but the genetic pathways identified were different from those identified by HAC. Our study illustrates how divisive (k-means) and agglomerative clustering can be used to group individuals with GDD for future basket trials.
Moreover, the results of our analysis suggest that phenotypic clusters should be subdivided into molecular sub-networks to increase the likelihood of successful treatment. Finally, a combination of agglomerative and divisive clustering may be required to develop a comprehensive treatment.
- Conference Article
4
- 10.1063/1.5064218
- Jan 1, 2018
In this study, chi-sim co-similarity and agglomerative hierarchical clustering are used to cluster lymphoma gene expression data by gene and by condition. The process begins with the lymphoma microarray gene expression data, which are standardized by rows and columns. The chi-sim co-similarity concept is then applied to build the row similarity matrix (SR) and the column similarity matrix (SC), whose elements are normalized using a pseudo-normalization. Finally, three agglomerative hierarchical clustering approaches are used to cluster the data by gene and condition: single linkage, average linkage, and complete linkage. Clustering by column and gene gave the best outcome when complete linkage was combined with chi-sim co-similarity, compared with single linkage and average linkage combined with chi-sim co-similarity.
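The three linkage rules named in this abstract differ only in how the distance between two clusters is derived from pairwise distances. The following is a minimal sketch in plain Python, not the authors' implementation: the function name `agglomerative`, the toy points, and the use of absolute differences as distances are all illustrative assumptions, and no chi-sim co-similarity step is included.

```python
from itertools import combinations

def agglomerative(dist, n_clusters, linkage="complete"):
    """Naive agglomerative clustering on a symmetric distance matrix.

    dist[i][j] is the distance between items i and j.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    reducers = {
        "single": min,                            # closest pair across clusters
        "complete": max,                          # farthest pair across clusters
        "average": lambda ds: sum(ds) / len(ds),  # mean over all cross pairs
    }
    reduce_fn = reducers[linkage]
    clusters = [[i] for i in range(len(dist))]    # start with singletons

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ds = [dist[i][j] for i in clusters[a] for j in clusters[b]]
            d = reduce_fn(ds)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)

# Hypothetical toy data: four points on a line.
points = [0.0, 2.0, 4.5, 7.5]
dist = [[abs(p - q) for q in points] for p in points]
single = agglomerative(dist, 2, "single")      # chains neighbours together
complete = agglomerative(dist, 2, "complete")  # prefers compact clusters
```

On this toy input, single linkage chains the first three points into one cluster, while complete linkage splits the points into two pairs, which illustrates why the choice of linkage can change the outcome.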
- Research Article
3
- 10.1016/j.soncn.2020.151112
- Jan 7, 2021
- Seminars in oncology nursing
Identifying Distinct High Unmet-Need Phenotypes and Their Associated Bladder Cancer Patient Demographic, Clinical, Psychosocial, and Functional Characteristics: Results of Two Clustering Methods.
- Book Chapter
2
- 10.1007/978-3-319-42972-4_9
- Jul 30, 2016
The term fuzzy clustering usually refers to prototype-based methods that optimize an objective function in order to find a (fuzzy) partition of a given data set and are inspired by the classical c-means clustering algorithm. Possible transfers of other classical approaches, particularly hierarchical agglomerative clustering, received much less attention as starting points for developing fuzzy clustering methods. In this chapter we strive to improve this situation by presenting a (hierarchical) agglomerative fuzzy clustering algorithm. We report experimental results on two well-known data sets on which we compare our method to classical hierarchical agglomerative clustering.
- Conference Article
1
- 10.1109/icbda47563.2019.8987044
- Nov 1, 2019
This article presents experimental work comparing the performance of two machine learning approaches, Hierarchical Agglomerative Clustering and K-means Clustering, on Mobile Augmented Reality usability datasets. The datasets comprise two separate categories of data, performance and self-reported, which differ completely in nature, techniques, and associated biases. The paper first presents the background and related literature, followed by initial findings on the identified problems and objectives. It then presents the proposed methodology in detail before presenting the evidence and a discussion comparing these two widely used machine learning approaches on usability data.
- Research Article
3
- 10.29207/resti.v8i2.5663
- Apr 21, 2024
- Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Sumatra is one of the largest islands in Indonesia and the second most populous. It is also home to abundant tropical flora and fauna. This paper aims to cluster the cities in and near Sumatra based on meteorological data. It implements Agglomerative hierarchical clustering and uses a daily time series dataset from 17 cities covering 1 January 2010 to 31 December 2023. The dataset contains the variables minimum temperature, maximum temperature, average temperature, humidity, sunshine duration, and average wind speed. Preprocessing handled missing values and aggregated the data into a single-form dataset. The single-form data contains cities and meteorological variables and is used as input to the clustering algorithms, i.e. K-Means, Fuzzy C-Means, K-Medoid, intelligent K-KMeans, and Agglomerative clustering. Agglomerative clustering outperforms the other methods (i.e. K-Means, Fuzzy C-Means, K-Medoid, and intelligent K-KMeans) and produces a Silhouette score of 0.11. The clusters are then analyzed to find their unique patterns. With the cut-off set at two clusters, Agglomerative hierarchical clustering groups Aceh, Sabang, Pekanbaru, Padang, and Padang Lawas in Cluster 1. The other cities, i.e., Nagan Raya, Batam, Jambi, Bandar Lampung, Medan, Pangkalpinang, Palembang, Bengkulu, Belitung, Tapanuli, Deli Serdang, and Nias, are in Cluster 2. In brief, cities in Cluster 1 have a higher average temperature, lower humidity, and lower sunshine duration than cities in Cluster 2. However, Cluster 1 has a lower average minimum temperature than Cluster 2. The most similar pairs of cities are (Aceh, Sabang), (Pekanbaru, Padang Lawas), (Nagan Raya, Nias), (Jambi, Palembang), (Bengkulu, Tapanuli), and (Medan, Deli Serdang). The annual trend in several cities shows increasing minimum temperature, rising sunshine duration, and decreasing wind speed.
These are signs of climate change that need proper handling.
- Conference Article
12
- 10.1145/2660859.2660972
- Oct 10, 2014
Information Retrieval (IR) systems such as search engines retrieve a large set of documents, images and videos in response to a user query. Computational methods such as Automatic Text Summarization (ATS) reduce this information load enabling users to find information quickly without reading the original text. The challenges to ATS include both the time complexity and the accuracy of summarization. Our proposed Information Retrieval system consists of three different phases: Retrieval phase, Clustering phase and Summarization phase. In the Clustering phase, we extend the Potential-based Hierarchical Agglomerative (PHA) clustering method to a hybrid PHA-ClusteringGain-K-Means clustering approach. Our studies using the DUC 2002 dataset show an increase in both the efficiency and accuracy of clusters when compared to both the conventional Hierarchical Agglomerative Clustering (HAC) algorithm and PHA.
- Research Article
17
- 10.1111/j.1745-459x.2007.00121.x
- Jul 20, 2007
- Journal of Sensory Studies
Agglomerative hierarchical clustering was utilized to group consumers together based on product preference and liking in four consumer-based sensory studies. This statistical technique was effective at determining variations in consumer preference as a result of both processing techniques and ingredient incorporation. Results revealed that agglomerative hierarchical clustering can often improve the interpretation of consumer sensory data when compared to currently utilized analyses, and has significant applications in research projects with a sensory component. Three recommendations for conducting a comprehensive statistical analysis of hedonic scaled consumer data are: (1) perform a randomized complete block design to test treatment effects using the total data set; (2) utilize agglomerative hierarchical clustering to group panelists based on preference and liking of food products; and (3) perform randomized complete block designs within each cluster. If significant differences occur among treatments within a cluster, use a mean separation technique to determine significant differences among treatments within that cluster. PRACTICAL APPLICATIONS: Agglomerative hierarchical clustering was coupled with traditionally used analyses in the evaluation of hedonic scaled consumer data pertaining to chicken nuggets, retorted ham, fluid milk and cooked shrimp. Coupling of cluster analysis and traditional analyses was effective at grouping consumers together based on product preference and liking. Randomized complete block designs were also utilized within each cluster for further differentiation among treatments. A full description of how to analyze hedonic scaled sensory data using agglomerative hierarchical clustering, randomized complete block designs and Fisher's least significant difference test was included in this research paper and is an effective analytical method for the evaluation of hedonic scaled consumer data.
- Research Article
- 10.58805/kazutb.v.4.25-646
- Dec 31, 2024
- Вестник КазУТБ
One of the traditional methods for community detection in knowledge graphs is agglomerative clustering. Agglomerative hierarchical clustering is a widely used type of hierarchical clustering for grouping objects based on their similarity. This method follows a bottom-up approach, beginning with each individual data point considered as an independent cluster; clusters are then successively merged based on a similarity threshold between them. This paper focuses on the use of agglomerative clustering for analyzing skills extracted from job postings on an online recruitment platform. It describes the approach to data collection, processing, and subsequent clustering, providing an overview of linkage methods between clusters and examples of the application of various coefficients for quantitative assessment of cluster quality. An analysis of bilingual clusters in Russian and English is conducted, allowing for an evaluation of the versatility and adaptability of the proposed approach to analyzing the multilingual labor market in Kazakhstan. It was found that agglomerative clustering methods hold significant potential for identifying structured groups of skills, which can enhance the understanding of labor market trends and needs. The analysis of clusters formed in different languages confirmed the universality and adaptability of the proposed approach to multilingual data.
- Research Article
- 10.36652/0869-4931-2025-79-4-185-192
- Jan 1, 2025
- Automation. Modern Techologies
A method of agglomerative hierarchical clustering of students based on their digital profiles, including skills, interests and academic achievements, is proposed. The key task of the method is processing categorical data presented in JSON format, followed by one-hot encoding to convert the data into a numerical format. This allows the various characteristics of students to be interpreted correctly and used for clustering. The steps for forming a distance matrix based on the Jaccard metric are considered. This metric takes into account shared features and ignores zero matches, which makes it especially suitable for working with categorical data. Agglomerative clustering was performed. The result was visualized using a dendrogram to determine the optimal number of clusters. Particular attention was paid to the analysis of intracluster and intercluster variation, which allows the homogeneity within clusters and the differences between students in different clusters to be assessed. The method can be used to automate the personalization of training, the formation of balanced study groups and the creation of recommendation systems. Keywords: clustering, digital profile, one-hot encoding, agglomerative hierarchical clustering, Jaccard metric, categorical data, personalization of education
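The one-hot encoding and Jaccard distance steps described in this abstract can be sketched as follows. This is an illustrative example under stated assumptions rather than the authors' code: the function names (`one_hot`, `jaccard_distance`, `distance_matrix`) and the three student profiles are hypothetical, and profiles are assumed to have already been parsed from JSON into sets of tags.

```python
def one_hot(profiles):
    """Encode each profile (a set of categorical tags) as a 0/1 vector."""
    vocab = sorted(set().union(*profiles))
    return vocab, [[1 if tag in p else 0 for tag in vocab] for p in profiles]

def jaccard_distance(u, v):
    """1 - |intersection| / |union| on binary vectors.

    Positions where both vectors are 0 contribute to neither count,
    which is the "ignores zero matches" property.
    """
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either

def distance_matrix(vectors):
    """Pairwise Jaccard distances as a list of lists."""
    n = len(vectors)
    return [[jaccard_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical student profiles (skills and interests).
profiles = [
    {"python", "math"},
    {"python", "math", "chess"},
    {"art", "music"},
]
vocab, vectors = one_hot(profiles)
D = distance_matrix(vectors)
```

In this toy example the first two students share two of three distinct tags, so their distance is 1/3, while the third student shares nothing with either and sits at distance 1. A matrix like `D` is the input an agglomerative procedure would then merge on.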
- Conference Article
7
- 10.1145/3490148.3538584
- Jul 11, 2022
Hierarchical agglomerative clustering (HAC) is a popular algorithm for clustering data, but despite its importance, no dynamic algorithms for HAC with good theoretical guarantees exist. In this paper, we study dynamic HAC on edge-weighted graphs. As single-linkage HAC reduces to computing a minimum spanning forest (MSF), our first result is a parallel batch-dynamic algorithm for maintaining MSFs. On a batch of $k$ edge insertions or deletions, our batch-dynamic MSF algorithm runs in $O(k\log^6 n)$ expected amortized work and $O(\log^4 n)$ span with high probability. It is the first fully dynamic MSF algorithm handling batches of edge updates with polylogarithmic work per update and polylogarithmic span. Using our MSF algorithm, we obtain a parallel batch-dynamic algorithm that can answer queries about single-linkage graph HAC clusters. Our second result is that dynamic graph HAC is significantly harder for other common linkage functions. For example, assuming the strong exponential time hypothesis, dynamic graph HAC requires $\Omega(n^{1-o(1)})$ work per update or query on a graph with $n$ vertices for complete linkage, weighted average linkage, and average linkage. For complete linkage and weighted average linkage, the bound still holds even for incremental or decremental algorithms and even if we allow $\operatorname{poly}(n)$-approximation. For average linkage, the bound weakens to $\Omega(n^{1/2 - o(1)})$ for incremental and decremental algorithms, and the bounds still hold when allowing $n^{o(1)}$-approximation.
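The reduction from single-linkage HAC to a minimum spanning forest mentioned in this abstract can be illustrated with a static, sequential sketch based on Kruskal's algorithm. This is not the paper's parallel batch-dynamic algorithm; the function names and the example graph are hypothetical, and edge weights are assumed to be dissimilarities, so smaller weights merge first.

```python
def single_linkage_merges(n, edges):
    """Return the single-linkage merge sequence for a weighted graph.

    n: number of vertices (0..n-1); edges: iterable of (weight, u, v).
    Each returned (weight, u, v) is a minimum-spanning-forest edge and
    also one dendrogram merge at height `weight`.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                # edge joins two different clusters
            parent[ru] = rv
            merges.append((w, u, v))
    return merges

def clusters_at(n, merges, threshold):
    """Flat clusters obtained by keeping only merges with weight <= threshold."""
    label = list(range(n))
    for w, u, v in merges:
        if w <= threshold:
            old, new = label[u], label[v]
            label = [new if l == old else l for l in label]
    groups = {}
    for vertex, l in enumerate(label):
        groups.setdefault(l, []).append(vertex)
    return sorted(groups.values())

# Hypothetical graph with four vertices.
edges = [(1.0, 0, 1), (2.0, 1, 2), (2.5, 0, 2), (5.0, 2, 3)]
merges = single_linkage_merges(4, edges)
flat = clusters_at(4, merges, 2.0)
```

The edge of weight 2.5 is skipped because its endpoints are already in the same cluster, so only forest edges appear in the dendrogram. Cutting at height 2.0 leaves vertex 3 on its own.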
- Research Article
- 10.3760/cma.j.issn.1009-6906.2015.03.008
- Jun 28, 2015
Objective: To explore the morbidity of hyperuricemia (HUA) and kidney injury (KI) among shipboard personnel of a naval unit during prolonged deployment at sea and to analyze related risk factors, so as to provide evidence for developing prevention and treatment measures. Methods: Medical examinations were performed on 356 male shipboard personnel who received physical check-ups in the hospital from June to August 2013. The check-up items included serum uric acid (SUA), blood lipids (TC, TG, HDL and LDL), blood sugar (FPG, FINS), hepatic function (ALT, GGT, ALP, TP, ALB, TBILI and DBILI), renal function (Scr, BUN) and serum ferritin (SF). Data were analyzed by bivariate analysis and multivariate logistic regression models to explore the correlation between HUA, KI, nutritional status and the above data. Results: HUA and KI each accounted for 221 cases, a morbidity rate of 62% (221/356), and 166 cases had both HUA and KI. HUA patients with SUA levels of 417–467 μmol/L had the highest morbidity. Bivariate analysis revealed that SUA was closely associated with the levels of TG, HDL, TP, TBILI, DBILI, BUN, eGFR and BG. Multivariate logistic regression further indicated that SUA was uniquely related to TP and eGFR (P<0.05). Conclusions: HUA in shipboard naval personnel appears to have its own epidemiological features, with morbidity much higher than in the general population, which might be associated with kidney injury. Measures should be taken for early prevention and timely clinical treatment of the disorder. Key words: Hyperuricemia; Kidney injury; Risk factors; Analysis of risk factors
- Research Article
62
- 10.1109/tasl.2008.2002085
- Nov 1, 2008
- IEEE Transactions on Audio, Speech, and Language Processing
Many current state-of-the-art speaker diarization systems exploit agglomerative hierarchical clustering (AHC) as their speaker clustering strategy, due to its simple processing structure and acceptable level of performance. However, AHC is known to suffer from performance robustness under data source variation. In this paper, we address this problem. We specifically focus on the issues associated with the widely used clustering stopping method based on Bayesian information criterion (BIC) and the merging-cluster selection scheme based on generalized likelihood ratio (GLR). First, we propose a novel alternative stopping method for AHC based on information change rate (ICR). Through experiments on several meeting corpora, the proposed method is demonstrated to be more robust to data source variation than the BIC-based one. The average improvement obtained in diarization error rate (DER) by this method is 8.76% (absolute) or 35.77% (relative). We also introduce a selective AHC (SAHC) in the paper, which first runs AHC with the ICR-based stopping method only on speech segments longer than 3 s and then classifies shorter speech segments into one of the clusters given by the initial AHC. This modified version of AHC is motivated by our previous analysis that the proportion of short speech turns (or segments) in a data source is a significant factor contributing to the robustness problem arising in the GLR-based merging-cluster selection scheme. The additional performance improvement obtained by SAHC is 3.45% (absolute) or 14.08% (relative) in terms of averaged DER.
- Book Chapter
5
- 10.1007/978-3-642-01510-6_72
- Jan 1, 2009
Fuzzy clustering has proved successful in various fields in the recent past. In this paper, we introduce fuzzy clustering algorithms into the domain of automatic speaker clustering, and present a novel fuzzy-based hierarchical speaker clustering algorithm by applying fuzzy theory to state-of-the-art agglomerative hierarchical clustering. This method follows a bottom-up strategy, and determines the fuzzy memberships according to a membership propagation strategy, which propagates fuzzy memberships in the iterative process of hierarchical clustering. Further analysis reveals that this method is an extension of the conventional hierarchical clustering algorithm. Experimental results show that our method exhibits quite competitive performance compared to conventional k-means, fuzzy c-means and agglomerative hierarchical clustering algorithms. Keywords: Speaker clustering; GLR; K-means; AHC; Fuzzy c-means; Fuzzy hierarchical clustering
- Research Article
- 10.1016/j.envadv.2025.100676
- Nov 1, 2025
- Environmental Advances
- Research Article
1
- 10.1016/j.envadv.2025.100646
- Oct 1, 2025
- Environmental Advances
- Research Article
- 10.1016/j.envadv.2025.100649
- Oct 1, 2025
- Environmental Advances
- Research Article
- 10.1016/j.envadv.2025.100654
- Oct 1, 2025
- Environmental Advances
- Research Article
- 10.1016/j.envadv.2025.100664
- Oct 1, 2025
- Environmental Advances
- Research Article
- 10.1016/j.envadv.2025.100661
- Oct 1, 2025
- Environmental Advances
- Research Article
- 10.1016/j.envadv.2025.100674
- Oct 1, 2025
- Environmental Advances
- Research Article
- 10.1016/j.envadv.2025.100656
- Oct 1, 2025
- Environmental Advances
- Research Article
- 10.1016/j.envadv.2025.100655
- Oct 1, 2025
- Environmental Advances
- Research Article
- 10.1016/j.envadv.2025.100644
- Oct 1, 2025
- Environmental Advances