Articles published on models-for-clustering
5022 Search results
- Research Article
- 10.1111/dom.70448
- Jan 7, 2026
- Diabetes, Obesity & Metabolism
- Junjie Wu + 3 more
Aims: MASLD, defined as a steatotic liver disease in the presence of one or more cardiometabolic risk factors and the absence of harmful alcohol intake, exhibits substantial heterogeneity complicating risk stratification. A prior clustering model proposed liver‐specific and cardiometabolic subtypes, yet its generalizability and prognostic relevance remain unclear in the broader population. We aim to validate and replicate the prior approach in a nationally representative U.S. cohort. Materials and methods: We included 3300 participants with MASLD from NHANES III. For validation, participants were assigned to previously defined clusters using published medoids. For replication, de novo clusters were derived using the Partitioning Around Medoids algorithm. Cox proportional hazards models accounting for the complex survey design of NHANES III were used to estimate hazard ratios for all‐cause, cardiovascular‐related, and diabetes‐related mortality across clusters. Results: Individual cluster assignments showed limited reproducibility between the validation and replication analyses, although the overall clustering pattern was preserved. Based on clinical profiles, clusters were categorized into cardiometabolic, liver‐specific, and other subtypes. The cardiometabolic cluster consistently showed higher risks in both analyses, while the liver‐specific cluster showed no significant associations. Conclusions: The subtyping model demonstrated limited generalizability. Nonetheless, the consistent identification of broad cardiometabolic and liver‐specific patterns suggests potential value for risk stratification, pending further validation.
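The validation step described in this abstract, assigning participants to previously defined clusters using published medoids, can be sketched as a nearest-medoid lookup. This is only an illustration: the abstract does not state the distance metric or the feature set, so Euclidean distance, the function name `assign_to_medoids`, and the toy coordinates below are all assumptions, not the authors' procedure or NHANES data.

```python
import math

def assign_to_medoids(points, medoids):
    """Return, for each point, the index of its closest medoid.

    Assumes Euclidean distance on already-standardized features;
    the metric actually used in the study is not given in the abstract.
    """
    labels = []
    for p in points:
        distances = [math.dist(p, m) for m in medoids]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical two-feature example (illustrative values only).
medoids = [(0.0, 0.0), (3.0, 3.0)]
participants = [(0.2, -0.1), (2.8, 3.1), (1.0, 0.5)]
labels = assign_to_medoids(participants, medoids)
print(labels)
```

Because the medoids are fixed in advance, this step involves no fitting, which is what distinguishes validation from the de novo replication with Partitioning Around Medoids.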
- Research Article
- 10.3390/jtaer21010024
- Jan 6, 2026
- Journal of Theoretical and Applied Electronic Commerce Research
- Yi Wang + 1 more
Visual social commerce platforms now mediate much of brand communication and conversion, yet managers still lack clear guidance on how brands and creators should technically design posts that consistently achieve high user engagement under budget and platform constraints. Prior research explains why users engage with brands online, but it mainly focuses on individual motives and message features and largely treats the brand–creator–platform relationship and the post-design process as a black box. Drawing on the Technology Affordance Actualization (TAA) framework—which conceptualizes how platform-provided action possibilities (affordances) are selectively enacted through user practices—we develop a Creator–Content–Timing (CCT) perspective on how brands and creators actualize visibility, interactivity, and commercial collaboration affordances into user engagement outcomes. We analyze 138,713 image–text posts from 100 beauty brands on Xiaohongshu using machine learning, text mining, computer vision, and regression and clustering models. The results show that creator tier, brand status, sponsorship, content cues, and posting time have systematic effects on both engagement intensity and a cost-normalized metric, Int_per_cost (interactions per 1000 CNY of estimated advertising cost). Smaller creators and non-sponsored posts achieve higher engagement per impression and higher Int_per_cost than top-tier creators and sponsored posts; moderate text length, non-exclusive brand mentions, human faces, and specific temporal windows are also associated with superior outcomes. The study extends TAA to a creator–brand–platform context by operationalizing affordance actualization as observable CCT configurations at the post level and provides configuration-level guidance on how brands can align creator selection, content design, and scheduling to improve engagement on visual social commerce platforms.
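The cost-normalized metric in this abstract, Int_per_cost, is defined as interactions per 1000 CNY of estimated advertising cost. A minimal sketch of that definition follows; the function name, argument names, and sample figures are hypothetical, and how the authors estimate advertising cost is not described in the abstract.

```python
def int_per_cost(interactions, estimated_cost_cny):
    """Interactions per 1000 CNY of estimated advertising cost."""
    if estimated_cost_cny <= 0:
        raise ValueError("estimated cost must be positive")
    return interactions / (estimated_cost_cny / 1000.0)

# Hypothetical posts: a small creator and a top-tier creator.
small_creator = int_per_cost(interactions=500, estimated_cost_cny=2_000)
top_creator = int_per_cost(interactions=12_000, estimated_cost_cny=100_000)
print(small_creator, top_creator)
```

In this made-up example the smaller creator has far fewer raw interactions but a higher value per unit of cost, which is the kind of contrast the metric is meant to expose.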
- Research Article
- 10.1007/s00357-025-09529-y
- Jan 6, 2026
- Journal of Classification
- Nicola Piras + 2 more
We propose a latent class model for ordinal data with CUB (combination of discrete uniform and shifted binomial) distributions in the case of multilevel structures of the data. The CUB model is a powerful approach to the analysis of ordinal data, where the elicitation process is thought to be governed by a feeling parameter and an uncertainty parameter. Ordinal data are common across different research fields and may present a multilevel structure with units nested within groups. The model we present extends the framework of multivariate CUB models for model-based clustering to multilevel data, either hierarchical or cross-classified. Numerical experiments on simulated data highlight the added value of assuming a CUB model to account for ordinal information; the procedure’s interest is also shown through a real data application.
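For readers unfamiliar with the CUB distribution named in this abstract, the sketch below evaluates the usual single-variable CUB probability mass function: a mixture, with weight `pi`, of a shifted binomial with parameter `xi` and a discrete uniform over the `m` rating categories. The parameterization shown is the common textbook one, in which `xi` is tied to feeling and `pi` to uncertainty, and is assumed here; the multilevel latent class extension proposed in the paper is not reproduced.

```python
from math import comb

def cub_pmf(r, m, pi, xi):
    """P(R = r) for a CUB model on ratings 1..m.

    pi in (0, 1]: weight of the shifted binomial component.
    xi in [0, 1]: shifted binomial parameter.
    """
    shifted_binomial = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
    uniform = 1.0 / m
    return pi * shifted_binomial + (1 - pi) * uniform

# Illustrative parameter values on a 5-point scale.
probs = [cub_pmf(r, m=5, pi=0.7, xi=0.3) for r in range(1, 6)]
print(probs, sum(probs))
```

Setting `pi` close to 0 pushes the distribution toward the uniform component, which is how the model represents a highly uncertain respondent.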
- Research Article
- 10.7717/peerj-cs.3431
- Jan 5, 2026
- PeerJ Computer Science
- Tri Wahyuningsih + 2 more
Automated evaluation of argumentative writing has emerged as an essential field in educational technology, providing systematic feedback on student essays to improve critical thinking and writing quality. This study examines two complementary approaches: the PERSUADE model (Personalized Evaluations and Recommendations for Students Using Argumentation Data and Evidence), which relies on annotated discourse elements, and a multi-document clustering (MDC) model implemented with PyTorch and transformer embeddings. The MDC model, when implemented with Bidirectional Encoder Representations from Transformers (BERT)-base, achieved 78 percent accuracy and clustering metrics indicating strong separation (Silhouette Score 0.789, Davies–Bouldin Index 0.304, Calinski–Harabasz Score 3,529.2). To test the impact of richer embeddings, the MDC framework was extended with Robustly Optimized BERT Pretraining Approach (RoBERTa) and Decoding-enhanced BERT with disentangled attention (DeBERTa). The results show consistent performance improvements: RoBERTa reached 89 percent accuracy with higher clustering stability, while DeBERTa-v3-large achieved the strongest performance at 91 percent accuracy, with the best clustering metrics (Silhouette Score 0.817, Davies–Bouldin Index 0.276, Calinski–Harabasz Score 3,788.5). These findings confirm that the choice of encoder significantly influences clustering coherence and classification effectiveness. The comparative analysis highlights complementary strengths: PERSUADE excels in micro-level discourse evaluation, while MDC, enhanced by advanced embeddings, offers macro-level organization of argumentative structures. Together, these approaches demonstrate the potential for integrated frameworks that capture both discourse depth and cross-document clustering.
- Research Article
- 10.1021/acs.jchemed.5c00723
- Jan 5, 2026
- Journal of Chemical Education
- Danila Shiryaev + 4 more
Active learning through interactive exploration significantly enhances student engagement and understanding of chemistry. This educational activity demonstrates Principal Component Analysis (PCA) and Partial Least Squares Discriminant Analysis (PLS-DA), two foundational machine learning techniques widely applied in contemporary research. Interactive Python-based programs offer accessible educational platforms for students exploring chemical data, requiring no prior programming experience. This application allows learners to actively engage in feature exploration and dimensionality reduction processes, applied to clustering and classifying binary AB equiatomic solid-state compounds. Students can actively select and modify chemical and physical features, observing in real time how these choices impact the effectiveness of the PCA and PLS-DA clustering models. Initially, PCA enables unsupervised visualization of natural clustering and correlations among compounds without prior labeling. Subsequently, by employing PLS-DA, students develop supervised models capable of predicting crystal structures, explicitly illustrating supervised versus unsupervised learning paradigms. The activity highlights the importance of explainability in machine learning models rather than operating the models as a "black box". Beyond learning fundamental concepts, the activity encourages students to participate in genuine exploratory processes, mirroring the investigative approaches historically utilized by researchers and practiced today. By experimenting freely with data sets and computational methods, students experience firsthand the iterative nature of scientific discovery, fostering deeper insight into both chemical informatics and the broader research methodology.
- Research Article
- 10.1007/s40430-025-06061-3
- Jan 2, 2026
- Journal of the Brazilian Society of Mechanical Sciences and Engineering
- Marina Elizabeth Mazuroski + 6 more
CFD–DEM modeling of particle clustering and deposition combining growth and adhesion
- Research Article
- 10.1080/14697688.2025.2599892
- Jan 2, 2026
- Quantitative Finance
- Chang Wang + 2 more
Modern portfolio theory (MPT), proposed by Harry Markowitz, remains central to portfolio optimization, but its reliance on traditional covariance estimators limits its effectiveness in high-dimensional settings. This paper introduces a dynamic estimation with hierarchical clustering (DEH) model that combines rolling sample covariances with clustering-based structure to improve estimation stability and responsiveness. Using a panel of the 200 most liquid ETFs, we evaluate DEH across varying lookback lengths, rebalancing frequencies, and portfolio sizes. DEH consistently delivers stronger risk-adjusted performance, under monthly, quarterly and yearly rebalancing. In very large portfolios, DEH is still broadly competitive, but its performance becomes more dependent on the interaction between lookback length and rebalancing horizon, so its advantages are less clear-cut. Accordingly, explicit transaction-cost modelling and richer clustering specifications remain important directions for future research. These findings highlight DEH as a practical and interpretable tool for dynamic portfolio optimization in volatile and high-dimensional markets by effectively combining machine learning techniques with financial insights.
- Research Article
- 10.1049/cds2/8895067
- Jan 1, 2026
- IET Circuits, Devices & Systems
- Tianyu Zhang
To address the problem of modeling bipolar distributed photovoltaic (DPV) clusters, this paper proposes a clustering-based equivalent modeling method. First, analysis of the detailed bipolar DPV model shows that the indexes reflecting its steady‐state and dynamic characteristics are mainly the parameters of energy storage elements, such as inductance and capacitance, and the PI control parameters. Then, five commonly used clustering algorithms are applied to group a cluster of ten DPVs. Finally, a simplified dynamic model of the DPV cluster is obtained by aggregating the parameters of the DPVs in each group and replacing each group with an equivalent model. The method is verified by simulation on the IEEE 33-node system containing 10 DPVs, with the traditional single‐machine and double‐machine equivalent models included for comparison with the clustering models. The simulation results show that the clustering equivalent model correctly reflects the dynamic response characteristics of DPV clusters under different working conditions. The error between each clustering model and the detailed model is no more than 10%. Among them, the fuzzy C-means (FCM) clustering model performs best, with a minimum error of 0.11%, while the maximum error of the single-machine equivalent model is 9.3%.
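The fuzzy C-means (FCM) model highlighted in this abstract differs from hard clustering in that each unit receives a degree of membership in every group. The sketch below shows only the standard FCM membership update for one point given fixed centers; the function name, the fuzzifier default `m=2.0`, and the two-feature toy values are assumptions for illustration and do not come from the paper.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy C-means memberships of one point to each center.

    m > 1 is the fuzzifier; larger values give softer memberships.
    """
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # The point coincides with a center: share full membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
        for d_i in dists
    ]

# Hypothetical normalized parameter vectors for two group centers.
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships((1.0, 0.0), centers)
print(u)
```

In a full FCM run this update alternates with recomputing the centers as membership-weighted means until the memberships stop changing.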
- Research Article
- 10.1109/mcom.001.2500361
- Jan 1, 2026
- IEEE Communications Magazine
- Meng Yuan + 4 more
As AI-Generated Content (AIGC) transforms digital experiences, the demand for training and deploying multiple Generative AI (GAI) models at scale is rapidly growing. This demand is pushing AI training clusters to transition from dedicating resources to one foundation model toward concurrently hosting heterogeneous models, including both general- purpose and domain-specific variants, within the same infrastructure. This shift from single-model to multi-model operations reshapes AI system architecture and introduces new challenges in resource scheduling and orchestration. Clusters need to manage heterogeneous and dynamic demands across compute, memory, and networking, making efficient scheduling a challenge. In this article, we highlight the emerging paradigm of operating multiple GAI models in large-scale clusters. We analyze key trends driving this change, identify core challenges in scheduling and resource management, and explore promising solutions. Specifically, we propose topology-aware scheduling and AI-driven optimization techniques to improve resource utilization and training efficiency in GAI workloads. Finally, we outline future research directions, including secure multi-tenant scheduling, model-aware orchestration, and sustainability-focused cluster management.
- Research Article
- 10.1109/tccn.2025.3576824
- Jan 1, 2026
- IEEE Transactions on Cognitive Communications and Networking
- Jiabao Wang + 6 more
In wireless communication networks, the abundance of electromagnetic devices poses a significant challenge to the security of electromagnetic space. The emergence of automatic modulation classification (AMC) has provided significant support for electromagnetic spectrum management. However, varying environments can alter signal distributions, and the potential presence of unknown modulation signals and an unknown number of modulation types further complicates regulation. In response, this paper proposes a Transformer-based Generative Adversarial Network (TA-ClusterGAN) incorporating Infinite Gaussian Mixture Model for AMC (TIGM-AMC) algorithm, capable of clustering signals when both the labels and the number of types are unknown. We introduce the Adversarial Chinese Restaurant Process (ACPR) to infer the number of modulation categories in AMC, leverage contour stellar image (CSI) technology to transform signals into the graph domain, and enhance clustering effectiveness through an attention mechanism. When the number of modulation classes is unknown, the model is first pre-trained, and the category vectors output by the encoder are then mapped into an infinite Gaussian space. Results show that TA-ClusterGAN achieves 98.71% NMI when the number of modulation classes is known. When the number of categories is unknown, TIGM-AMC accurately infers the number of modulation types, which can enhance the adaptability of AMC in unknown environments.
- Research Article
- 10.1109/tifs.2026.3671087
- Jan 1, 2026
- IEEE Transactions on Information Forensics and Security
- Jinjia Peng + 4 more
The task of unsupervised visible–infrared person re-identification (USL-VI-ReID) aims to retrieve cross-modal pedestrian images without manual annotations. The key challenge lies in achieving semantic alignment to resolve modality bias in the absence of real labels. However, existing methods overly rely on single-modal information in the process of pseudo-label generation without considering cross-modal associations, making it difficult to bridge the modality gap between visible and infrared images. To address these issues, this paper proposes a Bi-level Inter-Modal Modulation Network (BIMM-Net), which employs multi-level cluster structure optimization as a core strategy to drive the establishment of cross-modal semantic associations, ultimately achieving cross-modal alignment at the feature representation level. Specifically, we construct a novel intermediary modality GrayMix from visible images to enhance model robustness against color variations and alleviate modality gaps. To establish a shared semantic space between visible and infrared modalities, we further develop a Ternary Pairs Calibration-Convergence module that filters noise from visible-infrared cluster matching and, on this basis, constructs fused mixture clusters. Building on this mixture cluster space, a Heterogeneous-Isomorphic Alignment Loss is also designed to align the feature distributions of the three modalities, reinforcing cross-modal semantic consistency. In addition, we present a Cross-modal Neighborhood Consistency Clustering method, which facilitates the formation and propagation of cross-modal clusters by selecting high-confidence cross-modal neighbor pairs and refining feature distances. Ultimately, through the joint modeling of bi-level clustering, BIMM-Net enables multiple levels to guide each other in refining cross-modal structures, thereby effectively establishing the semantic associations between visible and infrared modalities.
Extensive experiments validate the superior performance of the proposed framework, achieving state-of-the-art results in USL-VI-ReID. The source code of this paper is available at: https://github.com/liujuny5920/DIMM-Net.
- Research Article
- 10.1016/j.autcon.2025.106580
- Jan 1, 2026
- Automation in Construction
- Penglu Chen + 4 more
Multi-modal vision-driven point cloud registration for efficient fusion of multi-source models in regional building clusters
- Research Article
- 10.1016/j.future.2025.107929
- Jan 1, 2026
- Future Generation Computer Systems
- Lincheng Han + 4 more
A novel nonparametric Bayesian model for time series clustering: Application to electricity load profile characterization
- Research Article
- 10.1109/tccn.2025.3579527
- Jan 1, 2026
- IEEE Transactions on Cognitive Communications and Networking
- Lei Cheng + 6 more
The significant enhancement of satellite onboard processing capability and drastic proliferation of Artificial Intelligence (AI) applications have fostered decentralized satellite federated learning (DSFL), a transformative paradigm that exchanges and aggregates machine learning (ML) models in satellite clusters for collaborative learning. However, the limited model exchange opportunities caused by intermittent inter-satellite contacts, along with heterogeneous onboard datasets, can lead to ineffective and/or biased model aggregation. To address these issues, it is crucial yet challenging to design an effective DSFL scheduling strategy that determines whether and when to pull models from contact satellites and perform local training for optimizing DSFL performance. In this paper, we propose a contact-based DSFL framework and formulate the DSFL scheduling problem to maximize the accuracy of trained models. As the problem cannot be solved directly, we transform it into a hierarchical Markov game by introducing options for decision agents deployed on individual satellites. Under a learning-to-learn paradigm, we develop a Multi-agent Dueling Double Deep Q Network (MA3DQN)-based intelligent DSFL scheduling strategy. The agents, trained in a distributed and alternating manner, adaptively make scheduling decisions based on instantaneous partial observations of the environment. Simulation results demonstrate the efficiency and adaptability of the MA3DQN-based strategy over three baselines.
- Research Article
- 10.1371/journal.pone.0343246
- Jan 1, 2026
- PloS one
- Pratyay Hasan + 4 more
Dengue fever in Bangladesh has escalated from sporadic outbreaks to a persistent, nationwide health crisis. Traditional epidemiological analyses often assume a constant transmission regime, potentially overlooking fundamental shifts driven by viral, environmental, or societal factors. This population-level ecological time-series observational study aimed to identify and characterize significant structural breaks in the time series of dengue admissions in Bangladesh to define distinct epidemiological phases. Monthly dengue hospital admission data (January 2008–October 2025) were obtained from the Institute of Epidemiology, Disease Control and Research (IEDCR) and Directorate General of Health Services (DGHS) public archives. Analyses included STL seasonal-trend decomposition, the Zivot-Andrews unit root test (primary break detection), multi-algorithm breakpoint detection (PELT, Binary Segmentation, Window-based), K-means clustering (optimal at 3 clusters, silhouette score 0.867), and Markov regime-switching models. Ten structural breaks were identified through a consensus ranking approach. The most prominent break occurred in May 2021 (consensus score = 3). The Markov regime-switching model delineated three distinct transmission regimes: 1) a Low Baseline Regime (2008–2023) with a mean of 47 monthly cases (95% CI: 36–57); 2) an Intermediate Regime (2008–2025) with a mean of 1,288 monthly cases (95% CI: 927–1,648); and 3) a Hyperendemic Regime (2019–2025) with a mean of 26,127 monthly cases (95% CI: 17,207–35,048), representing a 556-fold increase over the low baseline. Seasonality strength was moderate (0.335), but the peak-to-trough seasonal ratio approached 180, indicating pronounced annual epidemic cycles superimposed on the substantially elevated baseline.
Bangladesh has experienced an established regime shift to sustained hyperendemic dengue transmission (persistent as of October 2025) necessitating a fundamental shift from outbreak-response to sustained, year-round control strategies. This regime shift is most likely influenced by viral, environmental, and societal factors including documented serotype redistribution. Public health strategies must transition from outbreak-response to sustained high-transmission management, including year-round vector control with pre-monsoon intensification.
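The 556-fold figure quoted in this abstract follows directly from the two regime means it reports. The short check below reproduces that arithmetic from the stated values only; the variable names are illustrative.

```python
# Regime means as reported in the abstract (monthly dengue admissions).
low_baseline_mean = 47
hyperendemic_mean = 26_127

# Ratio of the hyperendemic mean to the low-baseline mean.
fold_increase = hyperendemic_mean / low_baseline_mean
print(f"{fold_increase:.1f}-fold")
```

The ratio is a little under 556 and rounds to the reported value.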
- Research Article
- 10.1109/tpami.2025.3649521
- Jan 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Ben Yang + 4 more
Multi-view spectral clustering (MVSC) has garnered growing interest across various real-world applications, owing to its flexibility in managing diverse data space structures. Nevertheless, the fusion of multiple $n\times n$ similarity matrices and the separate post-discretization process hinder the utilization of MVSC in large-scale tasks, where $n$ denotes the number of samples. Moreover, noise in different similarity matrices, along with the two-stage mismatch caused by the post-discretization, results in a reduction in clustering effectiveness. To overcome these challenges, we establish a novel fast multi-view discrete clustering (FMVDC) model via spectral embedding fusion, which integrates spectral embedding matrices ($n\times c$, $c\ll n$) to directly obtain discrete sample categories, where $c$ indicates the number of clusters, bypassing the need for both similarity matrix fusion and post-discretization. To further enhance clustering efficiency, we employ an anchor-based spectral embedding strategy to decrease the computational complexity of spectral analysis from cubic to linear. Since gradient descent methods cannot be applied to discrete models, we propose a fast optimization strategy based on the coordinate descent method to solve the FMVDC model efficiently. Extensive studies demonstrate that FMVDC significantly improves clustering performance compared to existing state-of-the-art methods, particularly in large-scale clustering tasks.
- Research Article
- 10.1109/access.2026.3668967
- Jan 1, 2026
- IEEE Access
- Emilija Kizhevska + 2 more
Virtual reality (VR) has been described as the “ultimate empathy machine” due to its ability to immerse users in perspectives beyond their own, enhancing emotional engagement. In this study, 105 participants experienced 360° VR videos portraying actors expressing core emotions: happiness, sadness, anger, and anxiety. Empathy was assessed through self-report questionnaires, alongside other affective states including arousal, valence, and discomfort. Physiological and expressive responses were recorded using multimodal sensor data that captured facial muscle activity, heart rate, and motion dynamics. Extracted features reflecting central tendencies, variability, and distributional patterns were used to cluster participants into distinct groups, revealing inter-individual differences in emotional and empathic engagement. Cluster-specific predictive models, including Random Forest (RF) and deep neural networks (DNN), were then trained to predict state empathy and other affective states by leveraging unique patterns within each cluster, achieving 75 percent balanced accuracy for empathy prediction (RF) and even higher results for other affective states. This study demonstrates a systematic approach for quantifying empathy and affective processes in VR through multimodal sensor data. The methodology highlights how physiological and expressive signals capture meaningful differences in engagement, supporting real-time, personalized prediction. These findings provide a foundation for objective assessment of empathy and other affective states and contribute to the development of immersive VR applications in research, clinical, and educational contexts.
- Research Article
- 10.1109/tits.2026.3672993
- Jan 1, 2026
- IEEE Transactions on Intelligent Transportation Systems
- Yubing Zheng + 5 more
Vehicles at congested intersections usually move in the form of queues. The reduced inter-vehicle spacing within moving vehicle queues (VQs) creates a cascading collision risk potential, where any initial conflict may trigger chain-reaction impacts. However, traffic conflict risks were mainly assessed at the individual level in existing studies, while the analysis of VQ format was rare. To fill the gap, this paper proposes a framework for assessing and predicting conflict risks for VQs using high-resolution trajectory data. The proposed framework comprises three main steps. First, a ‘path band’ is created for each vehicle at the intersection, based on which a grid map-based method is proposed for the real-time clustering of VQs. Second, a multiagent-based approach is developed for estimating angle, rear-end, and side-swipe conflicts for each identified VQ. In this case, both straight-going and turning VQs are considered. Third, multi-dimensional feature sequences are extracted to capture VQ’s dynamic characteristics, while the developed traffic conflict analytics are used to label conflict risks. Using the feature sequences and risk labels as input and response, respectively, long short-term memory (LSTM) models are developed to predict conflict risks for VQs. The effectiveness of the proposed framework is demonstrated through tests on the open-access CitySim dataset. The test results indicate that the LSTM models using feature sequences as input can effectively predict conflict risk for VQs. It is more desirable to establish prediction models for turning and straight-going VQs separately.
- Research Article
- 10.1371/journal.pone.0344997
- Jan 1, 2026
- PloS one
- Farideh Motaghian + 3 more
Spiking Neural Networks (SNNs) offer a biologically plausible and energy-efficient alternative to traditional artificial neural networks (ANNs), yet their design remains constrained by limited architectural flexibility and slow training dynamics. In this work, we introduce a novel SNN framework that leverages modular graph-based topologies and explicit synaptic delays to significantly enhance both training efficiency and classification performance. Our architecture, TANet-Tiny, incorporates structured graph stages with up to 32 nodes and diverse community-driven connectivity patterns derived from KMeans clustering, Louvain modularity, and Watts-Strogatz small-world models. We integrate these topologies into a topology-aware search space and explore them via a Spatio-Temporal Topology Sampling (STTS) approach, enabling the discovery of high-performing networks without exhaustive search. Experimental results on MNIST, CIFAR-10, and CIFAR-100 demonstrate that our modular designs achieve state-of-the-art accuracy while requiring 6-10 × fewer training epochs, with top-1 accuracy reaching 99.57% on MNIST and over 92% on CIFAR-10, all with reduced parameter counts. We introduce an accuracy-per-epoch metric to quantify training efficiency and show that modularity, rather than network size, is the critical driver of performance. This work lays the groundwork for scalable, interpretable, and low-latency SNN architectures suitable for deployment in neuromorphic and edge computing environments.
- Research Article
- 10.1109/tmc.2025.3592885
- Jan 1, 2026
- IEEE Transactions on Mobile Computing
- Dongyu Wei + 3 more
In this paper, a secure and communication-efficient clustered federated learning (CFL) design is proposed. In our model, several base stations (BSs) with heterogeneous task-handling capabilities and multiple users with non-independent and identically distributed (non-IID) data jointly perform CFL training incorporating differential privacy (DP) techniques. Since each BS can process only a subset of the learning tasks and has limited wireless resource blocks (RBs) to allocate to users for federated learning (FL) model parameter transmission, it is necessary to jointly optimize RB allocation and user scheduling for CFL performance optimization. Meanwhile, our considered CFL method requires devices to use their limited data and FL model information to determine their task identities, which may introduce additional communication overhead. We formulate an optimization problem whose goal is to minimize the training loss of all learning tasks while considering device clustering, RB allocation, DP noise, and FL model transmission delay. To solve the problem, we propose a novel dynamic penalty function assisted value decomposed multi-agent reinforcement learning (DPVD-MARL) algorithm that enables distributed BSs to independently determine their connected users, RBs, and DP noise of the connected users but jointly minimize the training loss of all learning tasks across all BSs. Different from the existing MARL methods that assign a large penalty for infeasible actions, we propose a novel penalty assignment scheme that assigns penalty depending on the number of devices that cannot meet communication constraints (e.g., delay), which can guide the MARL scheme to quickly find valid actions, thus improving the convergence speed. Simulation results show that the DPVD-MARL can improve the convergence rate by up to 20% and the ultimate accumulated rewards by 15% compared to independent Q-learning.