Articles published on Unsupervised Machine Learning
23,493 search results, sorted by recency
- Research Article
- 10.1038/s41598-026-38257-5
- Feb 7, 2026
- Scientific reports
- Alberico Grimaldi + 9 more
Single-station analysis of Campi Flegrei (Italy) seismic signals using multiscale entropy and unsupervised learning.
- Research Article
- 10.2196/77830
- Feb 6, 2026
- JMIR medical informatics
- Po-Yu Huang + 5 more
General anesthesia comprises three essential components: hypnosis, analgesia, and immobility. Among these, maintaining an appropriate hypnotic state, or anesthetic depth, is crucial for patient safety. Excessively deep anesthesia may lead to hemodynamic instability and postoperative cognitive dysfunction, whereas inadequate anesthesia increases the risk of intraoperative awareness. Electroencephalography (EEG)-based monitoring has therefore become a cornerstone for evaluating anesthetic depth. However, processed electroencephalography (pEEG) indices remain vulnerable to various sources of interference, including electromyographic activity, interindividual variability, and anesthetic drug effects, which can yield inaccurate numerical outputs. With recent advances in machine learning, particularly unsupervised learning, data-driven methods that classify signals according to inherent patterns offer new possibilities for anesthetic depth analysis. This study aimed to establish a methodology for automatically identifying anesthesia depth using an unsupervised, machine learning-based clustering approach applied to pEEG data. Standard frontal EEG data from participants undergoing elective lumbar spine surgery were retrospectively analyzed, yielding more than 16,000 data points. The signals were filtered with a fourth-order Butterworth bandpass filter and transformed using the fast Fourier transform to estimate power spectral density. Normalized band power ratios for delta, high-theta, alpha, and beta frequencies were extracted as input features. Fuzzy C-Means (FCM) clustering (c=3, m=2) was applied to categorize anesthetic depth into slight, proper, and deep clusters. FCM clustering successfully identified 3 physiologically interpretable clusters consistent with EEG dynamics during progressive anesthesia. As anesthesia deepened, frontal alpha oscillations became more prominent within a delta-dominant background, while beta activity decreased with loss of consciousness. 
The fuzzy membership values quantified transitional states and captured the continuum of anesthetic depth. Visualization confirmed strong correspondence among cluster transitions, Patient State Index trends, and spectral density patterns. This study demonstrates the feasibility of using unsupervised machine learning to enhance anesthetic depth assessment. By applying FCM clustering to pEEG data, this approach improves the understanding of anesthesia depth and integrates effectively with existing monitoring modalities. The proposed FCM-based method complements current EEG indices and may assist anesthesia practitioners and even nonanesthesia professionals in assessing anesthetic depth to enhance patient safety.
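The FCM step described in this abstract can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the normalized delta, high-theta, alpha, and beta band-power ratios have already been computed into one 4-element vector per data point, and it uses a deterministic farthest-first initialization (chosen here for reproducibility) rather than whatever initialization the study used. The feature values in the test data are hypothetical.

```python
import math


def fuzzy_c_means(points, c=3, m=2.0, max_iter=300, tol=1e-6):
    """Fuzzy C-Means clustering.

    points: list of equal-length feature vectors.
    Returns (centers, u) where u[i][j] is the membership of point i in cluster j.
    """
    # Farthest-first initialization: start from the first point, then keep
    # adding the point farthest from every center chosen so far.
    centers = [list(points[0])]
    while len(centers) < c:
        farthest = max(points, key=lambda p: min(math.dist(p, q) for q in centers))
        centers.append(list(farthest))

    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        u = []
        for p in points:
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centers]  # avoid division by zero
            u.append([1.0 / sum((d[j] / d[k]) ** exponent for k in range(c)) for j in range(c)])
        # Center update: mean of all points weighted by u_ij ** m.
        new_centers = []
        for j in range(c):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[axis] for w, p in zip(weights, points)) / total for axis in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Each row of `u` sums to 1, so a point lying between two anesthetic states receives split membership instead of a single hard label; this is the property that lets FCM express transitional states.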
- Research Article
- 10.1021/acs.analchem.5c04084
- Feb 6, 2026
- Analytical chemistry
- Keyi Hu + 7 more
The critical binding domain (CBD) is typically identified via tedious affinity screening of truncated and mutated sequences of an aptamer. We report here a wet-dry-wet experiment strategy, which enables the isolation of rapamycin-binding aptamers and the identification of the CBD at the same time without the need for affinity testing of numerous rationally designed sequences. In the first wet-experimental module, the pre-enriched library was obtained via 15 rounds of Capture-SELEX. In the dry-experimental module, the two key binding architectures, a three-stem-three-loop (3S3L) motif and a 3S3L-A motif (containing an additional adenosine in the second loop of 3S3L), were identified by comprehensively analyzing the high-throughput sequencing data using K-mer assembly, unsupervised learning (RBM), and mFold structure simulation. The structure-confined mixed secondary library designed based on the two structures exhibited high affinity, validating the importance of structure. In the second wet-experimental module, enriched library 2R6 was obtained after six rounds of a second Capture-SELEX. A series of aptamers with nanomolar dissociation constants and high specificity were obtained, along with identification of an 11-nt CBD by analyzing the high-throughput sequencing results of 2R6. The decreased or completely lost binding affinity of the mutated sequences of seed aptamer 1R15-2 confirmed the CBD. A strand-displacement fluorescence sensor was constructed that detects rapamycin spiked into 10% human serum with a nanomolar limit of detection. This study provides an efficient method for simultaneous aptamer isolation and CBD identification and can be applied to other targets.
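K-mer analysis of sequencing reads usually begins by counting every overlapping substring of length k. The sketch below shows only that counting step, assuming reads are plain A/C/G/T strings; it is not the K-mer assembly or RBM analysis the authors performed, and `count_kmers` is a hypothetical helper.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping length-k substring across a list of reads."""
    counts = Counter()
    for read in reads:
        # A read of length L contains L - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Highly enriched k-mers in a late SELEX round are candidates for binding motifs.
print(count_kmers(["ACGTAC", "CGTACG"], 3).most_common(2))
```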
- Research Article
- 10.3390/batteries12020055
- Feb 6, 2026
- Batteries
- Matthew Beatty + 2 more
Accurately assessing battery health across mixed datasets remains a challenge due to differences in chemistry, format, and usage history. This study presents a reproducible framework for preparing battery cycling data using incremental capacity analysis (ICA), with the aim of supporting machine learning (ML) workflows across both first-life and second-life battery datasets. The methodology includes IC curve generation, feature extraction, encoding and scaling, feature reduction, and unsupervised learning exploration. A two-tiered outlier detection system was introduced during preprocessing to flag edge-case samples. Two clustering algorithms, K-means and HDBSCAN, were applied to the engineered feature space to explore patterns in the IC feature space. K-means revealed broad health-related groupings with overlapping boundaries, while HDBSCAN identified finer clusters and flagged additional ambiguous samples as noise. To support interpretation, PCA and t-SNE were used to visualise the feature space in reduced dimensions. Rather than using clustering as a classification tool, the resulting cluster and noise labels are proposed as structure-aware meta-features for supervised learning. The framework accommodates heterogeneous battery datasets and addresses the challenges of integrating data from mixed sources with varying histories and characteristics. These outputs provide a structured foundation for future supervised classification of battery state of health.
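As a rough illustration of the scaling and K-means steps, the following sketch standardizes each feature column (z-scores) and then runs Lloyd's K-means algorithm with farthest-first initialization. The two IC features per sample (for example, a peak voltage and a peak height) are hypothetical stand-ins; HDBSCAN, PCA, t-SNE, and the outlier-detection tiers are not shown.

```python
import math
import statistics


def standardize(rows):
    """Z-score every column so features on different scales weigh equally."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append([statistics.fmean(col) for col in zip(*members)] if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Scaling before clustering matters here because a voltage feature spanning a few hundredths of a volt would otherwise be swamped by a capacity-like feature spanning several units.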
- Research Article
- 10.3390/electronics15030709
- Feb 6, 2026
- Electronics
- Huijuan Dong + 2 more
Domain Name System (DNS) tunneling, a stealthy attack that exploits DNS infrastructure, poses critical threats to dynamic networks and continues to evolve with emerging attack patterns. This study aims to accurately classify multi-pattern legitimate and malicious traffic and to identify previously unseen attack patterns. We focus on two core research questions: how to accurately classify known-pattern DNS queries, and how to reliably identify unknown-pattern samples. The stated objective is to develop an unsupervised classification approach that combines multi-pattern adaptation with the recognition of unknown patterns. We formalize the task as Emerging Pattern Classification and propose the Medium Neighbors Forest, a forest-based model that uses a “medium neighbor” mechanism and clustering to identify unknown patterns. Experiments verify that the proposed model effectively identifies unseen patterns, offering a new perspective on DNS tunneling detection.
- Research Article
- 10.47672/ajce.2854
- Feb 5, 2026
- American Journal of Computing and Engineering
- Pankaj Verma + 1 more
Purpose: Pipeline systems, which are essential for transporting oil, gas, and water, are susceptible to various anomalies such as structural degradation, operational malfunctions, and leakages. Older physics-based and rule-based monitoring methods, despite their interpretability, tend to have low sensitivity, flexibility, and scalability. At the same time, the absence of labeled fault data, increasing operational complexity, and non-stationary pipeline conditions create a critical gap in reliable and scalable anomaly detection solutions for real-world deployment. This study addresses this gap by systematically analyzing data-driven unsupervised and semi-supervised learning approaches and their applicability to pipeline monitoring. Materials and Methods: Recent developments in unsupervised and semi-supervised learning have enabled data-driven anomaly detection schemes to learn typical operational behavior and detect anomalies with little reliance on labeled fault data. This review gives a detailed summary of these methods within pipeline monitoring. The methods discussed include distance- and density-based, statistical, and subspace methods, as well as neural network-based methods such as autoencoders and self-organizing maps. Semi-supervised algorithms such as one-class classification and hybrid statistical-learning approaches are also discussed. The review also addresses data characteristics, evaluation practices, interpretability, and real-time implementation. Findings: The study identifies and discusses a variety of unsupervised and semi-supervised learning techniques that can effectively address the challenges faced by traditional monitoring methods in pipeline systems. It highlights how these data-driven methods are able to detect anomalies by learning typical operational behavior with minimal reliance on labeled fault data. 
The study also covers important considerations like data characteristics, evaluation practices, and the challenges of implementing these methods in real-time environments. Unique Contribution to Theory, Practice, and Policy: This review provides a thorough evaluation of emerging data-driven anomaly detection methods, contributing to the theoretical understanding of how unsupervised and semi-supervised learning can be applied in pipeline monitoring. The study's practical contribution lies in its exploration of real-world applicability, offering insight into methods that can enhance the sensitivity and scalability of anomaly detection in pipeline systems. For policy, the research suggests future directions, including enhanced feature learning, concept drift adaptation, and integration with digital twins, which aim to improve the trustworthiness and efficiency of anomaly detection in pipeline operations.
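One of the simplest distance-based detectors in this family scores a new sample by its mean distance to its k nearest neighbors in data recorded during normal operation. The sketch below is a generic version of that idea, not a method taken from the review; the two-feature sensor readings (say, pressure and flow) and the leave-one-out threshold rule are assumptions chosen for illustration.

```python
import math


def knn_scores(reference, queries, k=3):
    """Mean distance from each query to its k nearest reference points."""
    scores = []
    for q in queries:
        nearest = sorted(math.dist(q, r) for r in reference)[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


def loo_threshold(normal, k=3):
    """Largest leave-one-out score on the normal data, used as the alarm threshold."""
    return max(
        knn_scores(normal[:i] + normal[i + 1:], [x], k)[0] for i, x in enumerate(normal)
    )
```

Because only normal-operation data is needed to fit the threshold, no labeled fault examples are required, which is the main appeal of this family of methods for pipelines.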
- Research Article
- 10.1186/s42400-026-00548-9
- Feb 5, 2026
- Cybersecurity
- Juan-Ignacio Iturbe-Araya + 1 more
Deploying unsupervised anomaly detection systems in heterogeneous smart home environments is hindered by the need for costly, per-site hyperparameter tuning. This paper addresses the critical challenge of hyperparameter transferability for creating zero-tune, plug-and-play security solutions. We systematically evaluate five unsupervised machine learning models [Elliptic Envelope (EE), Isolation Forest (IF), Local Outlier Factor (LoF), One-Class SVM (oSVM), and an Autoencoder (AE)] across five prominent IoT datasets. Using a rigorous dataset-specific hyperparameter tuning approach, we benchmark the performance of transferred configurations against both per-dataset optimization and default settings. Our findings establish a clear performance hierarchy: while dataset-specific tuning remains the gold standard, an intelligent transfer strategy significantly outperforms default configurations. Notably, we identify the IoTID20 dataset as the most effective source dataset for hyperparameter transfer. Our quantitative topological analysis supports this, revealing that IoTID20’s high feature space complexity and cluster overlap (evidenced by low Silhouette scores) create a rigorous training environment that produces robust, portable hyperparameters. Furthermore, our analysis reveals a strategic trade-off: Autoencoders and LoF deliver the highest absolute performance, whereas IF offers the most substantial improvement over default settings. This work provides a quantitative framework for dataset-driven initialization, guiding the development of robust, low-maintenance intrusion detection systems.
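The Silhouette score cited in this abstract compares, for each point, the mean distance a to other members of its own cluster with the mean distance b to the nearest other cluster, s = (b - a) / max(a, b). Values near 1 indicate compact, well-separated clusters; values near or below 0 indicate overlap. A minimal pure-Python sketch follows; it assumes at least two clusters and assigns s = 0 to points in singleton clusters, a common convention.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least two clusters)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(own) / len(own)  # mean intra-cluster distance
        # Mean distance to each other cluster; b is the smallest of these.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

For two tight groups ten units apart the score is about 0.90, while interleaving the same kind of points across labels drives it negative, which is the sense in which a low score signals overlapping structure.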
- Research Article
- 10.3390/ijms27031551
- Feb 4, 2026
- International Journal of Molecular Sciences
- Irena Šnajdar + 12 more
Morbid obesity is a complex, multifactorial disorder characterized by metabolic and inflammatory dysregulation. The aim of this study was to observe changes in obese patients adhering to a personalized nutrition plan based on multi-omic data. This study included 14 adult patients with a body mass index (BMI) > 40 kg/m² who were consecutively recruited from those presenting to our outpatient clinic and who met the inclusion criteria. Clinical, biochemical, hormonal, and glycomic parameters were assessed, along with whole-genome sequencing (WGS) that included a focused analysis of obesity-associated genes and an extended analysis encompassing genes related to cardiometabolic disorders, hereditary cancer risk, and nutrigenetic profiles. Patients were stratified into nutrigenetic clusters using a patented unsupervised machine learning platform (German Patent Office, No. DE 20 2025 101 197 U1), which was employed to generate personalized nutrigenetic dietary recommendations for patients with morbid obesity to follow over a six-month period. At baseline, participants exhibited elevated glucose, insulin, homeostatic model assessment for insulin resistance (HOMA-IR), triglycerides, and C-reactive protein (CRP) levels, consistent with insulin resistance and chronic low-grade inflammation. The majority of participants harbored risk alleles within the fat mass and obesity-associated gene (FTO) and the interleukin-6 gene (IL-6), together with multiple additional significant variants identified across more than 40 genes implicated in metabolic regulation and nutritional status. Based on these genetic polymorphisms, an AI-driven clustering model delineated a uniform cluster of patients with morbid obesity. The mean GlycanAge index (56 ± 12.45 years) substantially exceeded chronological age (32 ± 9.62 years), indicating accelerated biological aging. 
Following a six-month personalized nutrigenetic dietary intervention, significant reductions were observed in both BMI (from 52.09 ± 7.41 to 34.6 ± 9.06 kg/m², p < 0.01) and GlycanAge index (from 56 ± 12.45 to 48 ± 14.83 years, p < 0.01). Morbid obesity is characterized by a pro-inflammatory and metabolically adverse molecular signature reflected in accelerated glycomic aging. Personalized nutrigenetic dietary interventions, derived from AI-driven analysis of whole-genome sequencing (WGS) data, effectively reduced both BMI and biological age markers, supporting integrative multi-omics and machine learning approaches as promising tools in precision-based obesity management.
- Research Article
- 10.3390/rs18030501
- Feb 4, 2026
- Remote Sensing
- Matenia Karagiannidou + 3 more
Eutrophication is a form of pollution caused by elevated nutrient concentrations in water bodies, leading to excessive algal growth and subsequent oxygen depletion. This process poses significant risks to aquatic ecosystems and overall water quality. This study investigates the spatial distribution of eutrophication in the Almyros Stream, aiming to develop a rapid and high-resolution approach for identifying eutrophication patterns and selecting representative sampling sites. Almyros is an urban stream in the western Heraklion Basin (Crete, Greece) that is subjected to considerable pressures from agricultural, industrial, urban, and tourism-related activities. Data for this study were collected using a drone equipped with a multispectral sensor. The multispectral bands, together with remote sensing indices associated with chlorophyll presence, served as input data. Chlorophyll presence is a key indicator of phytoplankton biomass and is widely used as a proxy for nutrient enrichment and eutrophication intensity in aquatic ecosystems. The k-means clustering algorithm was then applied to classify the data and reveal the eutrophication spatial patterns of the study area. The results show that the methodology successfully identified spatial variations in eutrophication-related conditions and generated robust eutrophication pattern maps. These findings underscore the potential of integrating remote sensing and machine learning techniques for efficient monitoring and management of water bodies.
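The abstract does not name the chlorophyll-related indices used. As an assumption, the sketch below uses NDVI, (NIR - Red) / (NIR + Red), a widely used vegetation index, and shows how the multispectral bands plus the index can be assembled into one feature vector per pixel before clustering. The band names and the nested-list raster layout are hypothetical.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 where both bands are zero."""
    total = a + b
    return (a - b) / total if total else 0.0


def pixel_features(bands):
    """Build one feature vector per pixel from co-registered band rasters.

    `bands` maps hypothetical band names to 2-D lists of reflectances.
    Each vector holds the raw bands plus NDVI = (NIR - Red) / (NIR + Red).
    """
    names = ["green", "red", "red_edge", "nir"]
    rows, cols = len(bands["nir"]), len(bands["nir"][0])
    features = []
    for r in range(rows):
        for c in range(cols):
            values = [bands[n][r][c] for n in names]
            ndvi = normalized_difference(bands["nir"][r][c], bands["red"][r][c])
            features.append(values + [ndvi])
    return features
```

The resulting list of per-pixel vectors is the kind of input a k-means step would partition into spatial eutrophication classes; mapping each cluster label back to its (row, column) position gives the pattern map.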
- Research Article
- 10.1007/s11259-026-11087-6
- Feb 4, 2026
- Veterinary research communications
- Julieta María Decundo + 8 more
Weaning is a critical stage in swine production, characterized by intestinal alterations that affect piglet health and performance. In this study, machine learning techniques were applied to identify joint patterns between gut health and productivity during the first 15 days post-weaning. A total of 103 animals were analyzed using a dataset of 24 histomorphological, biochemical, and productive variables. Among the unsupervised clustering models, K-means (k = 2) achieved the best separation, revealing two groups with significant differences in intestinal parameters (villus height-to-crypt depth ratio, intestinal absorptive area, duodenal maltase activity, butyric, propionic and total volatile fatty acid concentrations) and performance outcomes (body weight at 15 days and average daily gain). Supervised models were subsequently applied as interpretative tools to assess variable relevance, with Random Forest achieving high internal consistency. SHAP analysis indicated that intestinal morphology, enzymatic activity, and microbial metabolites (particularly total volatile fatty acids, propionate, and butyrate) were most strongly associated with cluster classification. These findings highlight coordinated patterns between intestinal function and growth during the early post-weaning period and suggest that such biomarkers may represent potential targets to be explored in future nutritional strategies. Overall, this study demonstrates the potential of integrating unsupervised explainable machine learning approaches into animal science research for exploratory analysis and hypothesis generation.
- Research Article
- 10.1016/j.jad.2025.120718
- Feb 1, 2026
- Journal of affective disorders
- Yu-Ru Su + 4 more
Revisiting drunk driving risk among individuals with alcohol use disorder using unsupervised learning: From clinical characteristics and neuropsychological performance to EEG data.
- Research Article
- 10.1016/j.earscirev.2026.105420
- Feb 1, 2026
- Earth-Science Reviews
- X Rui + 1 more
Euler-pole clustering of GNSS velocities using unsupervised machine learning in the Southeastern Tibetan Plateau: Crustal block identification and the dominance of sinistral-slip faults
- Research Article
- 10.1016/j.forsciint.2025.112669
- Feb 1, 2026
- Forensic science international
- Stanard M Pachong + 4 more
Unsupervised machine learning for the detection and interpretation of key features in drip patterns.
- Research Article
- 10.1002/anie.202523905
- Feb 1, 2026
- Angewandte Chemie (International ed. in English)
- Nikita I Kolomoets + 6 more
The discovery of new chemical transformations is central to advancing modern chemistry, yet conventional approaches often require months or years of extensive experimental screening. Here, we present a machine-learning-assisted and expert-guided pipeline for reaction discovery applied to the search for atom-economic cycloaddition reactions. Candidate reactions were generated from publicly available quantum chemical data, filtered through unsupervised machine learning, and clustered to reduce redundancy. A digital co-expert then enabled rapid prioritization, after which human expertise provided final selection and experimental validation. This hybrid workflow is fully compatible with current laboratory infrastructure and addresses the most time-consuming stage of reaction discovery, accelerating the expert screening bottleneck by approximately 180-fold (from >1200 days to 7 days). Within ∼1 week, two novel cycloaddition reactions were identified and experimentally confirmed, yielding previously undescribed products. While fully autonomous robotic platforms represent a long-term vision, their high cost and limited availability restrict immediate application. In contrast, our approach demonstrates the practicality of human-AI collaboration for reaction discovery, combining computational screening, machine learning and expert knowledge to efficiently expand the accessible chemical space.
- Research Article
- 10.1016/j.gaitpost.2025.110026
- Feb 1, 2026
- Gait & posture
- Hwa-Ik Yoo + 3 more
Subgrouping non-specific low back pain based on spinal marker trajectory data: An unsupervised machine learning approach.
- Research Article
- 10.1016/j.ahj.2026.107368
- Feb 1, 2026
- American heart journal
- Weiqi Liao + 10 more
Study design for an emulated trial of a two arm, parallel, stratified, adaptive, RCT of CABG versus PCI in people requiring myocardial revascularisation at high risk (High-Risk REVASC).
- Research Article
- 10.3390/info17020131
- Feb 1, 2026
- Information
- Gulshat Amirkhanova + 5 more
The integration of Advanced Metering Infrastructure (AMI) provides high-resolution electrical data, essential for enhancing industrial efficiency and monitoring equipment health. However, the utility of this data is frequently compromised by anomalies, underscoring the necessity for robust, automated detection methodologies. This study benchmarks three distinct categories of machine learning models: a statistical baseline (SARIMA), an unsupervised anomaly detector (Isolation Forest), and a deep learning reconstruction model (LSTM-Autoencoder). The evaluation was conducted using a multivariate dataset acquired from bakery manufacturing equipment, employing a synthetic anomaly injection framework with a 5% contamination rate. The results indicate significant challenges in accurately detecting anomalies within this dataset. The SARIMA model achieved the highest average F1-Score (0.256), slightly outperforming the Isolation Forest (0.233), while the LSTM-Autoencoder performed the poorest (0.110). Critically, all models exhibited extremely low precision (ranging from 0.074 to 0.204), indicating an unacceptably high rate of false positives. The findings suggest that standard configurations of these models struggle to differentiate between true anomalies and the inherent variability of industrial operations, highlighting the need for advanced optimization and feature engineering for practical deployment.
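The evaluation setup described here can be sketched as two steps: inject synthetic anomalies into a clean series at a 5% contamination rate, then score a detector's flags with precision, recall, and F1. The spike shape (plus or minus five standard deviations at randomly chosen points) is an assumption; the study's actual injection scheme may differ, and `inject_anomalies` is a hypothetical helper.

```python
import random


def inject_anomalies(series, rate=0.05, scale=5.0, seed=0):
    """Return (noisy_copy, labels) with spikes injected at a fraction `rate` of points.

    Each spike adds +/- `scale` population standard deviations (hypothetical scheme).
    """
    rng = random.Random(seed)
    n = len(series)
    mean = sum(series) / n
    std = (sum((x - mean) ** 2 for x in series) / n) ** 0.5
    idx = set(rng.sample(range(n), max(1, round(rate * n))))
    noisy = [x + rng.choice((-1, 1)) * scale * std if i in idx else x for i, x in enumerate(series)]
    labels = [1 if i in idx else 0 for i in range(n)]
    return noisy, labels


def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall, and F1, treating 1 as 'anomaly'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

At a 5% contamination rate, a detector that flags every point reaches recall 1.0 but precision of only 0.05, so precision values in the 0.07 to 0.20 range are not far above that trivial floor.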
- Research Article
- 10.12913/22998624/210743
- Feb 1, 2026
- Advances in Science and Technology Research Journal
- Paweł Karpiński + 1 more
Comparison of unsupervised machine learning segmentation algorithms in the analysis of unmanned aerial vehicle – based multispectral crop images
- Research Article
- 10.1016/j.knosys.2025.115169
- Feb 1, 2026
- Knowledge-Based Systems
- Qiuli Wang + 9 more
Unsupervised stain-aware pixel-adversarial transfer learning for virtual immunohistochemical staining
- Research Article
- 10.1088/1538-3873/ae3986
- Feb 1, 2026
- Publications of the Astronomical Society of the Pacific
- An-Chieh Hsu + 4 more
Fast Radio Bursts (FRBs) are millisecond-duration radio transients of extragalactic origin. Classifying repeating FRBs is essential for understanding their emission mechanisms, but remains challenging due to their short durations, high variability, and increasing data volume. Traditional methods often rely on subjective criteria and struggle with high-dimensional data. In this study, we apply an unsupervised machine learning framework, combining Uniform Manifold Approximation and Projection and Hierarchical Density-Based Spatial Clustering of Applications with Noise, to eight observed parameters from FRB 20220912A. Our analysis reveals three distinct clusters of bursts with varying spectral and fluence properties. Comparisons with clustering studies of other repeaters, such as FRB 20201124A and FRB 121102, show that some of our clusters share similar features, suggesting possible common emission mechanisms. We also provide qualitative interpretations for each cluster, highlighting the spectral diversity within a single source. Notably, one cluster shows broadband emission and high fluence, typically seen in non-repeating FRBs, raising the possibility that some non-repeaters may be misclassified repeaters due to observational limitations. Our results demonstrate the utility of machine learning in uncovering intrinsic diversity in FRB emission and provide a foundation for future classification studies.