Articles published on K-means Clustering
21848 Search results
- New
- Research Article
- 10.1186/s12933-026-03096-1
- Feb 8, 2026
- Cardiovascular diabetology
- Yan Wang + 5 more
The estimated glucose disposal rate (eGDR), an established measure of peripheral insulin sensitivity, contributes to stratifying the risk of cardio-cerebrovascular events. Nevertheless, the association between long-term eGDR exposure and stroke incidence throughout all stages (0-4) of cardiovascular-kidney-metabolic (CKM) syndrome remains unknown. A cohort of 5248 individuals was drawn from the China Health and Retirement Longitudinal Study (CHARLS). For each participant, eGDR values for the years 2012 and 2015 were calculated using the equation: 21.158 - [0.090 × WC (cm)] - [3.407 × HTN (presence = 1)] - [0.551 × HbA1c (%)]. Cumulative eGDR was calculated as [(eGDR2012 + eGDR2015)/2] × time (2015 - 2012). K-means clustering was used to analyse eGDR values from both 2012 and 2015 to identify distinct change patterns. To assess associations with stroke risk, we utilised multivariable logistic regression and restricted cubic spline models. During the 2015-2018 follow-up period, a total of 336 incident stroke cases were documented. Four distinct eGDR change patterns were identified. In fully adjusted models, compared with the participants in the persistent low pattern (Class 2), those in the moderate-high stable (OR 0.43, 95% CI: 0.31-0.58), stable high (OR 0.29, 0.19-0.43), and rapid decrease (OR 0.66, 0.47-0.91) patterns exhibited significantly lower stroke risk. Furthermore, each 1-unit increase in cumulative eGDR was associated with a 5% reduction in stroke odds (OR 0.95, 0.93-0.96). Restricted cubic spline analysis confirmed a linear inverse relationship between cumulative eGDR and stroke risk (P < 0.001; P for nonlinearity = 0.259). Cumulative eGDR is inversely associated with stroke risk across all CKM syndrome stages (0-4). This observation suggests that prolonged eGDR surveillance may be associated with improved risk stratification in this population.
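The eGDR equation and the cumulative eGDR definition quoted in this abstract can be worked through in a few lines. The following is a minimal sketch only: the function names and the participant values are hypothetical and are not taken from the study.

```python
def egdr(wc_cm: float, hypertension: bool, hba1c_pct: float) -> float:
    """eGDR from the equation quoted in the abstract.

    wc_cm: waist circumference in cm; hypertension: True if present;
    hba1c_pct: HbA1c in percent.
    """
    htn = 1 if hypertension else 0
    return 21.158 - 0.090 * wc_cm - 3.407 * htn - 0.551 * hba1c_pct


def cumulative_egdr(egdr_2012: float, egdr_2015: float, years: int = 2015 - 2012) -> float:
    """Mean of the two measurements multiplied by the time between them."""
    return (egdr_2012 + egdr_2015) / 2 * years


# Hypothetical participant, for illustration only.
e12 = egdr(wc_cm=90.0, hypertension=False, hba1c_pct=5.5)
e15 = egdr(wc_cm=95.0, hypertension=True, hba1c_pct=6.0)
cum = cumulative_egdr(e12, e15)
print(round(e12, 4), round(e15, 4), round(cum, 5))
```

For this made-up participant, eGDR falls between the two waves because waist circumference and HbA1c rise and hypertension appears; the pair (e12, e15) is the kind of two-point trajectory that the abstract says was passed to K-means.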
- New
- Research Article
- 10.3390/pr14030578
- Feb 6, 2026
- Processes
- Simona Gavrilaș
Sustainable sources of natural antioxidants are increasingly important for circular bioeconomy strategies. Plant-derived waste streams represent an underexploited resource with significant potential for recovery of high-value antioxidant compounds such as carotenoids, polyphenols, and resveratrol. This review assesses potential alternative biomass sources, including nonhazardous wastes from agriculture, forestry, and fishing, as well as those from the manufacture of food products, beverages, and tobacco products. It evaluates their valorization potential using statistical evidence at the European level. EUROSTAT datasets were analyzed using XLSTAT 2025.2.0 through correlation analysis, Principal Component Analysis (PCA), Agglomerative Hierarchical Clustering (AHC), and k-means clustering. Variables included fresh vegetable production, plant waste generation, processed waste volumes, and national research and development expenditures and innovation. Correlation analysis revealed a strong association between total processed waste and research and development investments (r = 0.87), suggesting that technological capacity influences waste valorization. A moderate correlation (r = 0.55) between nonhazardous waste and processed quantities supports the operational feasibility of extracting antioxidants from residual biomass. PCA showed that Factor 1 (50.16% variance) is dominated by waste generation and processing capacity, whereas organic agriculture loads primarily on Factor 2 (21.6%). Cluster analyses grouped European countries by bioresource management efficiency, highlighting substantial heterogeneity in their readiness for valorization. The combined statistical evidence supports the use of plant-based waste streams as viable, sustainable feedstocks for antioxidant recovery. 
Strengthening processing infrastructure, harmonizing data reporting, and accelerating research and development investments are essential steps for integrating antioxidant extraction into circular bioeconomic processes.
- New
- Research Article
- 10.55041/ijsrem56362
- Feb 5, 2026
- International Journal of Scientific Research in Engineering and Management
- Vishnu S.Nair + 1 more
Exploratory Data Analysis (EDA) is a fundamental step in data-driven research, enabling analysts to understand data structure, identify patterns, and detect anomalies. However, conventional EDA techniques are largely manual, time-intensive, and heavily dependent on domain expertise, often resulting in high cognitive load, subjective bias, and limited scalability when dealing with complex or high-dimensional datasets. To address these limitations, this paper presents Explainable AI-EDA, an intelligent and automated exploratory data analysis framework that integrates statistical analysis, machine learning, and explainable artificial intelligence into a unified system. The proposed framework performs automated data profiling, missing value analysis, skewness and kurtosis evaluation, and entropy-based dataset complexity assessment. Machine learning techniques such as K-Means clustering, Isolation Forest-based anomaly detection, and linear regression are employed to uncover hidden patterns, detect outliers, and analyze variable relationships. The system is implemented as an interactive web-based application that supports real-time visualization, natural language interaction through an AI research assistant, and automated generation of research-ready analytical reports. Experimental evaluation demonstrates that Explainable AI-EDA significantly reduces analyst cognitive load, improves analytical efficiency, and provides scalable and reproducible exploratory analysis. Keywords: Explainable AI (XAI), Automated EDA, Large Language Models, Information Entropy, Machine Learning, Data Visualization, Isolation Forest, K-Means Clustering, Cognitive Load, Statistical Profiling.
- New
- Research Article
- 10.1016/j.jenvman.2026.128815
- Feb 5, 2026
- Journal of environmental management
- Ruijing Qiao + 8 more
Integrating ecosystem service flows into zoning for the management of ecological risks: A case study of the Pinglu Canal Watershed.
- New
- Research Article
- 10.1186/s12889-026-26478-2
- Feb 5, 2026
- BMC public health
- Pheerasak Assavanopakun + 3 more
Environmental issues related to air pollution in Southeast Asia have persisted for more than a decade, especially in Thailand. This study aims to estimate the treatment costs of respiratory diseases caused by exposure to ambient PM₂.₅ and to identify the factors that influence these costs. This retrospective study analyzed secondary data on outpatient (OPD) and inpatient (IPD) respiratory disease treatment costs from government hospitals, along with ambient PM₂.₅ data from low-cost monitoring stations, to estimate the cost of illness across 25 districts in Chiang Mai during Thailand's fiscal year 2023. Economic cost was estimated using the Cost-of-Illness formula: Economic Cost Loss = Health Impact × Treatment Cost. K-means cluster analysis was used to classify estimated costs into minimum, medium, and maximum cost scenarios. Multiple linear regression was applied to identify factors significantly associated with treatment cost. Under the maximum cost scenario identified through K-means cluster analysis, the total treatment cost associated with an average PM₂.₅ concentration of 42.59 µg/m³ was 460,122.58 USD, averaging 41.62 USD per case. Each 1 µg/m³ increase in PM₂.₅ was associated with a cost rise ranging from 403.84 to 13,159.87 USD. Non-infectious respiratory diseases incurred costs approximately two times higher than infectious ones. The estimated maximum treatment burden for respiratory disease cases was highest in urban areas, totaling 102,878.88 USD. Urban areas showed significantly higher treatment costs in both OPD and IPD cases (p < 0.001). Moreover, higher healthcare levels and older age were associated with higher costs in OPD cases. In IPD cases, length of hospital stay was a significant predictor. Ambient PM₂.₅ exposure contributes significantly to the economic burden of respiratory diseases in polluted areas. These findings highlight the importance of pollution control policies and healthcare resource planning in high-risk areas.
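The two steps named in this abstract, the Cost-of-Illness product and a K-means split of the resulting costs into three scenarios, can be illustrated with a small one-dimensional example. This is a sketch under stated assumptions, not the authors' procedure: the district figures are invented, `cost_of_illness` and `kmeans_1d` are hypothetical helper names, and seeding the centroids at the minimum, median, and maximum is a choice made here to keep the run deterministic.

```python
from statistics import mean, median


def cost_of_illness(cases: int, cost_per_case: float) -> float:
    """Economic Cost Loss = Health Impact x Treatment Cost."""
    return cases * cost_per_case


def kmeans_1d(values, centroids, max_iter=100):
    """Lloyd's algorithm on scalars; returns (labels, centroids)."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(mean(members) if members else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels, centroids


# Hypothetical (cases, USD per case) pairs; not data from the study.
districts = [(100, 10.0), (120, 10.0), (400, 20.0),
             (450, 20.0), (2000, 40.0), (2200, 40.0)]
costs = [cost_of_illness(c, p) for c, p in districts]

# Assumed initialisation: low, middle, and high seeds.
init = [min(costs), median(costs), max(costs)]
labels, centers = kmeans_1d(costs, init)
print(labels, centers)
```

With these inputs, label 0 corresponds to the minimum-cost group, label 1 to the medium group, and label 2 to the maximum group, because the seeds are ordered from low to high.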
- New
- Research Article
- 10.3390/computation14020041
- Feb 2, 2026
- Computation
- Olympia Roeva + 6 more
A representative cluster-based model of the batch process of ethanol production by Kluyveromyces sp. is proposed. Experimental data from fermentation processes of 17 different strains of K. marxianus are used; each of them potentially exhibits different metabolic and kinetic behavior. Three clustering approaches are applied: two modifications of Principal Component Analysis (PCA), namely hierarchical clustering and k-means clustering, and InterCriteria Analysis (ICrA). They are used to simplify a large dataset into a smaller set while preserving as much information as possible. The experimental data are organized into two main clusters. As a result, the most representative fermentation processes are identified. For each of the fermentation processes in the clusters, structural and parameter identification are performed. Four different structures describing the specific substrate (glucose) consumption rate are applied. The best structure is used to derive the representative model using the data from the first cluster. Verification of the derived model is performed using experimental data of the second cluster. Model parameter identification is performed by applying an evolutionary optimization algorithm.
- New
- Research Article
- 10.1016/j.gerinurse.2026.103883
- Feb 2, 2026
- Geriatric nursing (New York, N.Y.)
- Jia Liu + 7 more
Tailored communication strategies according to patient characteristics for older adults with benign prostatic hyperplasia: A cross-sectional study.
- New
- Research Article
- 10.1200/go-25-00531
- Feb 1, 2026
- JCO global oncology
- Daniel F Pilco-Janeta + 5 more
To characterize gastric cancer epidemiology in Latin America and the Caribbean, identify country-level predictors of the mortality-to-incidence ratio (MIR), and describe the clinical research landscape with emphasis on precision oncology (PO). We conducted a retrospective, country-level study integrating GLOBOCAN 2022 incidence and mortality data, ClinicalTrials.gov records (2004-2025), and socioeconomic indicators (United Nations Development Program Human Development Index [HDI] 2023 and current health expenditure). MIR was calculated per country. Precision-oncology studies were flagged by a curated drug dictionary applied to the Interventions field; country involvement was measured as country-study participations. Analyses included geospatial mapping, Spearman correlation, ordinary least squares regression, K-Means clustering (k = 3), and a Random Forest classifier for feature ranking and discrimination. Across 24 countries, incidence ranged from 3.97 to 14.31 per 100,000 and mortality from 2.98 to 11.06 per 100,000. MIR was highest in Honduras (0.93), Belize (0.89), and Guatemala (0.88) and lowest in Cuba (0.65), Uruguay (0.66), and Costa Rica (0.68). The HDI correlated inversely with MIR (ρ = -0.71, P < .001); the association with number of trials was weak (ρ = -0.09). Three regional archetypes were identified. The Random Forest model achieved an AUC of 0.94 and ranked HDI as the top predictor. Of the 105 studies, 81 were interventional; phase III accounted for 40.7% and phase II for 30.9%. Country-study participations were concentrated in Brazil (23.4%), Chile (19.1%), and Argentina (15.2%). In PO, participation was dominated by Brazil, Chile, Argentina, and Mexico (72.2% of 140 participations), mostly involving trastuzumab, pembrolizumab, ramucirumab, and nivolumab. Socioeconomic context was more associated with outcomes than research volume. 
Regional research remains concentrated and drug-limited, supporting policies to strengthen diagnostics, access, and equitable clinical investigation.
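The mortality-to-incidence ratio and the Spearman correlation with HDI described in this abstract reduce to short calculations. The sketch below uses four invented countries; the helper names are hypothetical, and the Spearman formula used is the simple version that assumes no tied values.

```python
def mir(mortality: float, incidence: float) -> float:
    """Mortality-to-incidence ratio for one country."""
    return mortality / incidence


def ranks(xs):
    """Ranks 1..n in ascending order; assumes no tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


def spearman(xs, ys) -> float:
    """Spearman rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical rows: (incidence, mortality per 100,000, HDI). Not study data.
rows = {
    "A": (10.0, 9.0, 0.62),
    "B": (8.0, 6.0, 0.70),
    "C": (12.0, 8.4, 0.78),
    "D": (5.0, 3.25, 0.85),
}
mirs = {name: mir(mort, inc) for name, (inc, mort, _hdi) in rows.items()}
hdis = [hdi for (_inc, _mort, hdi) in rows.values()]
rho = spearman(list(mirs.values()), hdis)
print(mirs, rho)
```

In this toy data MIR falls monotonically as HDI rises, so rho is exactly -1; the abstract reports a weaker but still inverse association (ρ = -0.71).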
- New
- Research Article
- 10.1007/s00122-026-05157-1
- Feb 1, 2026
- TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik
- Xiangwei Hu + 9 more
Drought is the primary factor contributing to crop yield loss. Therefore, enhancing the drought tolerance of foxtail millet, a globally significant food crop, is essential for ensuring global food security. We analyzed 425 foxtail millet samples from the Xinjiang Academy of Agricultural Sciences using 1,304,248 highly polymorphic SNPs for a genome-wide association study, and a total of 77 QTL regions were detected across three environments. Linkage disequilibrium (LD) analysis, population genetic structure analysis, K-means clustering, and phylogenetic tree construction revealed that foxtail millet in different subgroups exhibited certain regional differences. The secondary screening of QTL region genes combined with transcriptome analysis identified six genes with significant expression differences. These drought-responsive genes in foxtail millet function as protein kinases, glycosyltransferases, CTP synthetases, and transcription factors. Haplotype analysis identified 8 phenotypically distinct haplotypes in candidate genes associated with drought stress. Expression levels of genes associated with drought tolerance and yield, validated by RT-qPCR, were largely consistent with transcriptome analysis results. This study's results offer a scientifically significant reference for genetic research and improvement in foxtail millet yield under drought stress.
- New
- Research Article
- 10.1200/cci-25-00126
- Feb 1, 2026
- JCO clinical cancer informatics
- Amy Trentham-Dietz + 8 more
The University of Wisconsin Population Health Institute (PHI) Model of Health, grounded in models developed over a decade ago, provides a framework for prioritizing health-related investments including setting agendas, implementing policies, and sharing resources for improving community health and health equity. The model includes multiple determinants of health and two broad health outcomes (length and quality of life). We adapted the PHI Model of Health to cancer outcomes. Using county-level publicly available data, health factor summary measures were derived in three areas: health infrastructure including health promotion and clinical care, physical environment, and social and economic factors. A composite health factor z-score was calculated as the weighted (40%, 15%, and 45%, respectively) average of the summary measures for each county, and k-means clustering was used to create unequally sized county groups with lower (healthier) to higher (less healthy) z-scores. We fit age-adjusted negative binomial regression models to estimate rate ratios and 95% CI for cancer mortality in relation to county health factor cluster. Age-adjusted cancer mortality rates increased across the 10 county health factor clusters for all-cancers as well as for lung, colorectal, breast, and prostate cancers. Rate ratios generally increased across the 10 health factor clusters for all cancers combined and for specific cancer types. Compared with counties with the most favorable health factor conditions, the counties with the least favorable conditions had an all-cancer mortality rate ratio of 1.49 (95% CI, 1.39 to 1.60). The PHI model of health adapted to cancer outcomes provides an approach for linking community-specific conditions to the interventions that hold promise to directly address drivers of the cancer burden.
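The composite health factor z-score described in this abstract is a weighted average of three summary measures with weights of 40%, 15%, and 45%. A minimal sketch follows; the dictionary keys and the county values are hypothetical, and only the weights come from the abstract.

```python
# Weights as stated in the abstract; the key names are assumed for illustration.
WEIGHTS = {
    "health_infrastructure": 0.40,
    "physical_environment": 0.15,
    "social_economic": 0.45,
}


def composite_z(summary: dict) -> float:
    """Weighted average of the three summary z-scores.

    The weights sum to 1, so the weighted sum is already the weighted average.
    """
    return sum(WEIGHTS[key] * summary[key] for key in WEIGHTS)


# Hypothetical county summary z-scores (higher = less healthy).
county = {
    "health_infrastructure": 0.5,
    "physical_environment": -1.0,
    "social_economic": 1.0,
}
score = composite_z(county)
print(round(score, 3))
```

One composite score per county is the single feature that the abstract says was then grouped by k-means into 10 unequally sized clusters.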
- New
- Research Article
- 10.1016/j.gaitpost.2025.110026
- Feb 1, 2026
- Gait & posture
- Hwa-Ik Yoo + 3 more
Subgrouping non-specific low back pain based on spinal marker trajectory data: An unsupervised machine learning approach.
- New
- Research Article
- 10.1016/j.jad.2025.120718
- Feb 1, 2026
- Journal of affective disorders
- Yu-Ru Su + 4 more
Revisiting drunk driving risk among individuals with alcohol use disorder using unsupervised learning: From clinical characteristics and neuropsychological performance to EEG data.
- New
- Research Article
- 10.1016/j.jenvman.2025.128180
- Feb 1, 2026
- Journal of environmental management
- Jhuliet Katalina Guerrero-Peñarete + 2 more
A multidimensional framework for assessing productive and ecosystem potential of wild animal species: insights from Latin America.
- New
- Research Article
- 10.1049/icp.2025.4694
- Feb 1, 2026
- IET Conference Proceedings
- Jia Harisinghani + 3 more
Automated slum classification using deep convolutional neural networks and K-means clustering: a comprehensive Mumbai metropolitan region analysis
- New
- Research Article
- 10.1080/13548506.2026.2622636
- Feb 1, 2026
- Psychology, Health & Medicine
- Hedvig Kiss + 2 more
eHealth literacy refers to one’s ability to engage effectively with electronic health information. It has been shown to be associated with certain psychological constructs; therefore, the purpose of this study was to identify cluster profiles based on differences in levels of eHealth literacy, illness perception, well-being, stigmatization, optimism, and self-efficacy. In a cross-sectional design, a sample of adult hematology patients from Hungary (N = 96; mean age = 56.5 years; SD = 15.5) completed a self-administered paper-pencil survey including six scales: eHealth Literacy Scale, Brief Illness Perception Questionnaire, WHO Well-Being Index, Stigma Scale for Chronic Illness, Revised Life Orientation Test, and General Self-Efficacy Scale. Pearson correlation analyses explored bivariate relationships between eHealth literacy and other psychological variables, while K-means clustering was performed to identify patient categorization across the explored variables. Correlation analysis revealed that eHealth literacy had a positive correlation with self-efficacy and a negative association with illness perception. Well-being was positively correlated with self-efficacy and optimism, while illness perception was negatively correlated with self-efficacy but positively with optimism. Cluster analysis identified two patient profiles. Cluster 1, labeled ‘empowered and e-health literate patients’, included 42 patients with high-level eHealth literacy, better well-being, positive illness perceptions, higher scores on self-efficacy and optimism, and weaker feelings of stigmatization. Cluster 2, labeled ‘vulnerable patients with low-level eHealth literacy’, comprised 53 patients with low-level eHealth literacy, poorer well-being, negative illness perceptions, lower level of self-efficacy and optimism, and stronger feelings of stigmatization. 
Chi-square tests revealed statistically significant differences by clusters regarding age, permanent residence, and health status. In conclusion, findings showed substantial differences in patient profiles, suggesting that in their development, eHealth literacy and its associations with psychological variables, most importantly, well-being and illness perception can play a decisive role. These results promote the targeted development of eHealth literacy interventions.
- New
- Research Article
- 10.28991/esj-2026-010-01-010
- Feb 1, 2026
- Emerging Science Journal
- Tho M Nguyen + 3 more
Understanding the key drivers of greenhouse gas (GHG) emissions is crucial for designing effective and adaptable climate policies, particularly given the complex interplay among structural, institutional, and energy-related factors. This study examines the time-varying impacts of key determinants of GHG emissions across 29 countries from 1993 to 2018, with an emphasis on the shadow economy, energy security risks, and geopolitical volatility. The analysis follows a four-step framework: countries are classified using principal component analysis (PCA) and K-means clustering, robust covariates are selected via Bayesian Model Averaging (BMA), and their impacts are estimated with time-varying coefficient panel models. Model robustness is evaluated through grouped cross-validation, confirming the superior performance of the time-varying random effects (tvRE) specification. The results reveal that the shadow economy and energy security risk exert more dynamic and substantial impacts in the Higher-income group, while their effects are comparatively muted in the Lower-income group. Geopolitical risk, however, shows limited explanatory power for emissions in both contexts. This study provides a novel empirical framework for capturing the dynamic influences of emissions drivers and contributes actionable insights toward achieving sustainable development goals.
- New
- Research Article
- 10.1007/s11069-025-07772-5
- Feb 1, 2026
- Natural Hazards
- İrem Karakaya + 3 more
Floods are among the most devastating natural disasters worldwide, necessitating effective disaster management strategies to mitigate their impacts. This study focuses on the identification of optimal safe assembly areas in the case of urban flash floods using Geographic Information System (GIS) and machine learning techniques. The Bartın River Basin, which has a history of severe flood events, was selected as the study area. A total of 79 micro-basins covering 2,342.87 km² were analyzed using 17 parameters related to topography, hydrology, infrastructure, and demography. After normalization, Principal Component Analysis (PCA) was applied to reduce dimensionality from 17 parameters to between 2 and 16 components. Seven clustering algorithms (K-Means, Agglomerative Hierarchical, DBSCAN, MeanShift, Birch, Mini Batch K-Means, and Spectral) were tested, and their performances were compared using the Silhouette Score Index (SSI). Results indicate that the K-Means algorithm with 2 principal components and 3 clusters achieved the best performance (SSI > 0.5), identifying micro-basins 9, 34, and 41 as the most suitable assembly areas. Post-clustering validation revealed that these areas combine low flood risk indicators with high accessibility. More than 85% of the basin’s 206,715 inhabitants can reach a safe assembly point within 30 min (≤ 30 km at an average evacuation speed of 30 km/h). Notably, micro-basin 9 alone provides access for 68.3% of the population within 5–15 km, highlighting its strategic importance. Historical flood data (2020–2024) further confirmed that two of the identified basins are located in zones with fewer past flood events, reinforcing their reliability. The proposed framework bridges theoretical optimization and real-world feasibility, providing actionable insights for disaster planners. 
Future research will focus on large-scale evacuation simulations and the integration of population flows and shelter capacities to further strengthen operational applicability.
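The abstract compares clusterings by silhouette score and treats values above 0.5 as good. The sketch below implements the standard mean silhouette coefficient in plain Python to show what that number measures. The 2-D points are hypothetical stand-ins for two principal-component scores, and the function is an illustration that assumes at least two clusters; it is not the study's code.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels) -> float:
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by n - 1).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean([dist(p, q) for q in other])
            for key, other in clusters.items()
            if key != lab
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # matches the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # cuts across the groups
print(round(good, 3), round(bad, 3))
```

A labeling that follows the visible groups scores close to 1, while one that cuts across them scores below 0, which is why the score can be used to rank algorithms and cluster counts as the abstract describes.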
- New
- Research Article
- 10.1016/j.compbiolchem.2025.108778
- Feb 1, 2026
- Computational biology and chemistry
- Yiheng Du + 3 more
Ligand-based prediction of anti-bacterial compounds: Overcoming class imbalance in molecular data.
- New
- Research Article
- 10.46481/jnsps.2026.2929
- Feb 1, 2026
- Journal of the Nigerian Society of Physical Sciences
- Catherine + 4 more
This study presents a novel hybrid knowledge discovery model integrating K-Means clustering, Naive Bayes classification, and Knowledge Graph technology to address interpretability and data heterogeneity challenges in precision agriculture. The proposed framework first applies K-Means to segment agro-ecological zones using multi-source data (soil, climate, satellite imagery), then employs Naive Bayes to classify crop productivity tiers, achieving 89% accuracy—surpassing standalone benchmarks (Naive Bayes: 86%, Random Forest: 87.5%). A Neo4j-based Knowledge Graph contextualizes these outputs, demonstrating 95% schema completeness and efficient querying (0.1559s latency), while enabling dynamic analysis of soil-climate-crop relationships. Pilot trials confirmed actionable impacts, including 22% reduced water use and 18% less fertilizer waste in targeted farms. By unifying unsupervised/supervised learning with semantic reasoning, this work advances scalable, interpretable decision support systems for sustainable agriculture, offering a replicable template for global food security initiatives.
- New
- Research Article
- 10.1016/j.cam.2025.116921
- Feb 1, 2026
- Journal of Computational and Applied Mathematics
- Shijie Zhao + 3 more
Optimizing cluster centroids with improved quadratic interpolation: an Adaptive K-means algorithm