Articles published on Unsupervised Algorithm
3130 Search results
- New
- Research Article
- 10.1007/s11135-025-02500-4
- Dec 4, 2025
- Quality & Quantity
- Patrick Parschan + 1 more
Abstract: This article presents the first systematic review of unsupervised and semi-supervised computational text-based ideal point estimation (CT-IPE) algorithms, methods designed to infer latent political positions from textual data. These algorithms are widely used in political science, communication, computational social science, and computer science to estimate ideological preferences from parliamentary speeches, party manifestos, and social media. Over the past two decades, their development has closely followed broader natural language processing (NLP) trends, beginning with word-frequency models and most recently turning to large language models (LLMs). While this trajectory has greatly expanded the methodological toolkit, it has also produced a fragmented field that lacks systematic comparison and clear guidance for applied use. To address this gap, we identified 25 CT-IPE algorithms through a systematic literature review and conducted a manual content analysis of their modeling assumptions and development contexts. To compare them meaningfully, we introduce a conceptual framework that distinguishes how algorithms generate, capture, and aggregate textual variance. On this basis, we identify four methodological families, word-frequency, topic modeling, word embedding, and LLM-based approaches, and critically assess their assumptions, interpretability, scalability, and limitations. Our review offers three contributions. First, it provides a structured synthesis of two decades of algorithm development, clarifying how diverse methods relate to one another. Second, it translates these insights into practical guidance for applied researchers, highlighting trade-offs in transparency, technical requirements, and validation strategies that shape algorithm choice. Third, it emphasizes that differences in estimation outcomes across algorithms are themselves informative, underscoring the need for systematic benchmarking.
- New
- Research Article
- 10.3390/bioengineering12121300
- Nov 26, 2025
- Bioengineering
- Vincenzo Levi + 8 more
Microelectrode recording (MER) is commonly used to validate preoperative targeting during subthalamic nucleus (STN) deep brain stimulation (DBS) surgery for Parkinson’s Disease (PD). Although machine learning (ML) has been used to improve STN localization using MER data, the impact of preprocessing steps on classifier accuracy has received little attention. We evaluated 24 distinct preprocessing pipelines combining four artifact removal strategies, three outlier handling methods, and optional feature normalization. The effect of each preprocessing component was evaluated as a function of the classification performance obtained with three ML models. Artifact rejection methods (i.e., an unsupervised variance-based algorithm (COV) and background noise estimation (BCK)), combined with optimized outlier management (i.e., statistical outlier identification per hemisphere (ORH)), consistently improved classification performance. In contrast, applying hemisphere-specific feature normalization prior to classification degraded performance across all metrics. SHAP (SHapley Additive exPlanations) analysis, performed to determine feature importance across pipelines, revealed stable agreement on influential features across diverse preprocessing configurations. In conclusion, optimal artifact rejection and outlier treatment are essential when preprocessing MER for STN identification in DBS, whereas preliminary feature normalization strategies may impair model performance. Overall, the best classification performance was obtained by applying the Random Forest model to the dataset treated with COV artifact rejection and ORH outlier management (accuracy = 0.945). SHAP-based interpretability offers valuable guidance for refining ML pipelines. These insights can inform robust protocol development for MER-guided DBS targeting.
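The unsupervised variance-based artifact rejection idea described above can be sketched as follows; the paper's COV implementation is not specified here, so the z-score threshold and the synthetic segment shapes are illustrative assumptions:

```python
import numpy as np

def variance_artifact_mask(segments, z_thresh=3.0):
    """Flag recording segments whose variance is an outlier across the session.

    Minimal sketch of unsupervised variance-based artifact rejection; the
    z-score threshold is an illustrative assumption, not the paper's setting.
    """
    variances = np.var(segments, axis=1)
    z = (variances - variances.mean()) / (variances.std() + 1e-12)
    return np.abs(z) < z_thresh  # True = keep segment

rng = np.random.default_rng(0)
clean = rng.normal(0, 1.0, size=(50, 1000))     # typical low-variance segments
artifact = rng.normal(0, 20.0, size=(2, 1000))  # high-variance artifact segments
segments = np.vstack([clean, artifact])

mask = variance_artifact_mask(segments)
print(mask.sum(), "of", len(mask), "segments kept")
```

A per-hemisphere outlier step like ORH would apply a similar statistical screen separately to each hemisphere's feature distributions before classification.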
- New
- Research Article
- 10.1093/bioinformatics/btaf585
- Nov 25, 2025
- Bioinformatics (Oxford, England)
- Nir Nitskansky + 2 more
Biomolecules undergo dynamic transitions among metastable states to carry out their biological functions. Markov State Models (MSMs) effectively capture these metastable states and transitions at a defined temporal scale. However, actual dynamics typically span multiple temporal scales, ranging from fast atomic vibrations to slower conformational changes and folding events. We introduce multiscale Markov State Models (mMSMs), which represent biomolecular dynamics across multiple temporal resolutions simultaneously via a hierarchy of MSMs, and mMSM-explore, an unsupervised algorithm for generating mMSMs through multiscale adaptive sampling with on-the-fly identification of temporally metastable states. We benchmark our method on a toy system with nested energy minima; on alanine dipeptide, first with and then without assuming prior knowledge of its two reaction coordinates; and finally, we map the folding pathways of a fast-folding 35-residue miniprotein across scales. We demonstrate efficient mapping of energy landscapes, correct representation of multiscale hierarchies and transition states, accurate inference of stationary probabilities and transition kinetics, and de novo identification of underlying slow, intermediate, and fast reaction coordinates. mMSMs reveal how dynamic processes at different scales contribute collectively to the functional mechanisms of biomolecular machines. Python code and instructions are available at https://github.com/ravehlab/mMSM. Supplementary data are available at Bioinformatics online.
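The single-scale building block that such hierarchies stack can be illustrated with a minimal MSM estimated from a discrete state trajectory; the toy two-state trajectory and lag time below are illustrative assumptions, not the mMSM-explore procedure itself:

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag=1):
    """Estimate a row-stochastic transition matrix and stationary distribution
    from a discrete trajectory at a given lag time."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    T = counts / counts.sum(axis=1, keepdims=True)
    # Stationary distribution: dominant left eigenvector of T
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    return T, pi / pi.sum()

# Two metastable states with rare switching between them
rng = np.random.default_rng(1)
dtraj, state = [], 0
for _ in range(20000):
    if rng.random() < 0.01:   # rare transitions => metastability
        state = 1 - state
    dtraj.append(state)

T, pi = estimate_msm(np.array(dtraj), n_states=2, lag=1)
print(T.round(2), pi.round(2))
```

An mMSM would repeat this estimation at a hierarchy of lag times, coarse-graining fast states into metastable super-states at slower scales.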
- New
- Research Article
- 10.48175/ijarsct-29501
- Nov 17, 2025
- International Journal of Advanced Research in Science, Communication and Technology
- Ankita Jadhav + 3 more
Abstract: The agriculture industry has been transformed by the rapid development of machine learning (ML) methods, which enable data-driven decision-making to increase crop output and sustainability. This review paper examines the latest advancements, techniques, and applications of predictive machine learning models for crop and fertilizer prediction. To predict the best crops based on soil characteristics, climatic conditions, and geographic features, the study surveys a variety of supervised and unsupervised algorithms, including Decision Trees, Random Trees, Support Vector Machines (SVM), artificial neural networks (ANN), and ensemble approaches. The study also highlights ML-based fertilizer recommendation engines that analyse crop needs, soil fertility, and environmental conditions to enhance nutrient management. The advantages and disadvantages of current techniques, such as problems with data quality, feature selection, and model interpretability, are shown through comparative examination of several models and datasets from the existing literature. The paper's conclusion highlights the integration of ML with the Internet of Things (IoT), remote sensing, and Geographic Information Systems (GIS) to accomplish precision agriculture, along with potential avenues for future study to create prediction systems for farmers that are sustainable, scalable, and explainable.
- Research Article
- 10.1148/rycan.250066
- Nov 1, 2025
- Radiology. Imaging cancer
- Yunfei Zhang + 6 more
Purpose To evaluate an MRI-based strategy for quantifying intra- and peritumoral heterogeneity (ITH and PTH) in hepatocellular carcinoma (HCC) and develop ITH- and PTH-based models for diagnosing microvascular invasion (MVI) and stratifying prognostic risk. Materials and Methods Patients with HCC (≤5 cm) were retrospectively included from three different institutions from March 2012 to September 2023 and divided into internal training, internal testing, and external testing cohorts. Tumor and peritumoral tissues in MR images were categorized into distinct habitats using unsupervised clustering algorithms. High-throughput radiomic features were extracted from each habitat. The degree of feature variation within each habitat was quantified to derive characteristics representing ITH and PTH. Engineered features were developed to train machine learning models for MVI diagnosis. Kaplan-Meier survival curves and Cox regression analysis were used for survival analysis. Results A total of 432 patients (mean age, 54.31 years ± 11.15 [SD]; 371 male) were included. The TH_DNN model, constructed using ITH- and PTH-based quantitative features combined with a deep neural network (DNN), demonstrated the best predictive performance for MVI across the three datasets (area under the receiver operating characteristic curve range = 0.82-0.99). The subgroup predicted as MVI positive with the TH_DNN model exhibited a poorer prognosis than the MVI-negative subgroup. In terms of overall survival and postoperative recurrence, the hazard ratios for MVI diagnosis were 2.79 (95% CI: 1.35, 5.75; P = .006) and 2.17 (95% CI: 1.38, 3.43; P < .001), respectively. Conclusion This study developed a strategy for quantifying ITH and PTH, which was valuable for noninvasive and accurate identification of MVI and prognostic risk in patients with HCC. 
Keywords: Liver, MRI, Oncology, Hepatocellular Carcinoma, Microvascular Invasion, Tumor habitat, Intratumoral Heterogeneity, Peritumoral Heterogeneity Supplemental material is available for this article. © The Author(s) 2025. Published by the Radiological Society of North America under a CC BY 4.0 license.
- Research Article
- 10.54097/74v0kt12
- Oct 31, 2025
- Journal of Computer Science and Artificial Intelligence
- Yutao Rao + 2 more
This project focuses on predicting used car prices using a combination of data preprocessing, unsupervised learning, and advanced supervised learning techniques [1, 2]. The goal was to develop an accurate model for price prediction by exploring patterns in vehicle features and leveraging robust machine learning methodologies. The dataset was meticulously cleaned and enhanced by imputing missing values using a random forest-based imputation method [3], splitting multi-dimensional features such as engine specifications, and categorizing variables like brand and transmission types. Principal Component Analysis (PCA) was employed to reduce dimensionality, retaining 95% of the dataset's variance, and unsupervised clustering algorithms, including K-Means, K-Modes, and hierarchical clustering, identified meaningful groupings that provided insights into vehicle segmentation. For supervised learning, we implemented and compared multiple models, including Elastic Net regression, Random Forest, Support Vector Machines, and XGBoost. The XGBoost model demonstrated superior performance with an R² of 0.87 and a MAPE of 21.07%, effectively capturing non-linear relationships [1, 2]. Key predictors, including mileage, horsepower, model year, and brand grouping, provided substantial predictive power regarding price drivers. In an open-ended extension, we estimated the original prices of cars as if they were brand new using Random Forest and XGBoost models, adjusting attributes such as mileage and model year to simulate new condition. Because our predicted prices closely matched official release prices, we conclude that machine learning techniques can achieve accurate car price predictions.
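The PCA-plus-clustering step described above can be sketched with scikit-learn; the synthetic feature matrix and cluster count below are illustrative assumptions, not the paper's dataset:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Synthetic vehicle features (mileage, horsepower, model year, displacement);
# values are illustrative stand-ins for the real columns.
rng = np.random.default_rng(42)
n = 300
X = np.column_stack([
    rng.normal(60000, 30000, n),   # mileage
    rng.normal(150, 40, n),        # horsepower
    rng.integers(2005, 2023, n),   # model year
    rng.normal(2.0, 0.5, n),       # engine displacement
])
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps just enough components for 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_reduced)
print(X_reduced.shape[1], np.bincount(labels))
```

Standardizing before PCA matters here because mileage would otherwise dominate the variance budget.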
- Research Article
- 10.1038/s41598-025-20445-4
- Oct 21, 2025
- Scientific Reports
- Amna Zahoor + 3 more
The rapid growth of cloud computing and the Internet of Things (IoT) has increased the exposure of IoT devices to cyber-attacks due to their resource limitations and lack of standardized security protocols. This paper presents a robust anomaly detection framework for IoT networks using two unsupervised machine learning models: Isolation Forest (IF) and One-Class Support Vector Machine (OCSVM). Leveraging the TON_IoT dataset, we conduct a comparative evaluation of IF, OCSVM, and a lightweight fusion approach called Combined Scoring Anomaly Detection (CSAD). Results show that OCSVM achieves superior precision, recall, and accuracy compared to both IF and CSAD. To ensure reliability, we apply Random Forest-based feature importance analysis, fivefold cross-validation, and hyperparameter tuning. Model resilience is further examined under adversarial label-flip poisoning attacks, and interpretability is enhanced through Local Interpretable Model-Agnostic Explanations (LIME). The findings demonstrate that lightweight unsupervised algorithms can provide effective, low-resource anomaly detection for modern IoT environments.
- Research Article
- 10.3390/s25206408
- Oct 17, 2025
- Sensors (Basel, Switzerland)
- Xiaojuan Wang + 1 more
A smart device unlocking scheme based on ultrasonic gesture recognition is proposed, allowing users to unlock their devices by customizing the unlock code through gesture movements. This method utilizes ultrasound to detect multiple consecutive gestures, identifying micro-features within these gestures for authentication. To enhance recognition accuracy, an unsupervised segmentation algorithm is employed to accurately segment the gesture feature region and extract the time-frequency domain data of the gestures. Additionally, two-stage data enhancement techniques are applied to generate user-specific data based on a small sample size. Finally, the user-specific model is deployed to mobile devices via transfer learning for on-device, real-time inference. Experimental validation on a commercial smartphone (Redmi K50) demonstrates that the entire authentication pipeline, from signal acquisition to decision, processes 8 types of gestures in sequence in approximately 1.2 s, with the core model inference taking less than 50 milliseconds. This ensures that the raw biometric data (ultrasonic echoes) and the recognition results never leave the user’s device during authentication, thereby safeguarding privacy. It is important to note that while model training is performed offline on a server to leverage greater computational resources for personalization, the deployed system operates fully in real time on the edge device. Experimental results demonstrate that our system achieves accurate and robust identity verification, with an average five-fold cross-validation accuracy rate of up to , and it shows robustness across different environments.
- Research Article
- 10.1177/08953996251380012
- Oct 17, 2025
- Journal of X-ray science and technology
- Jintao Fu + 6 more
Background: Non-destructive testing (NDT) is crucial for the preservation and restoration of ancient wooden structures, with Computed Tomography (CT) increasingly utilized in this field. However, practical CT examinations of these structures, often characterized by complex configurations, large dimensions, and on-site constraints, frequently encounter difficulties in acquiring full-angle projection data. Consequently, images reconstructed under limited-angle conditions suffer from poor quality and severe artifacts, hindering accurate assessment of critical internal features such as mortise-tenon joints and incipient damage. Objective: This study aims to develop a novel algorithm capable of achieving high-quality image reconstruction from incomplete, limited-angle projection data. Methods: We propose CADRE (Contour-guided Alternating Direction Method of Multipliers-optimized Deep Radon Enhancement), an unsupervised deep learning reconstruction framework. CADRE integrates the ADMM optimization strategy, the learning paradigm of Deep Radon Prior (DRP) networks, and a geometric contour-guidance mechanism. This approach enhances reconstruction performance by iteratively optimizing network parameters and input images, without requiring large-scale paired training data, rendering it particularly suitable for cultural heritage applications. Results: Systematic validation using both a digital dougong simulation model of the Yingxian Wooden Pagoda and a physical wooden dougong model from Foguang Temple demonstrates that, under typical 90° and 120° limited-angle conditions, the CADRE algorithm significantly outperforms traditional FBP, the iterative reconstruction algorithms SART and ADMM-TV, and other representative unsupervised deep learning methods (Deep Image Prior, DIP; Residual Back-Projection with DIP, RBP-DIP; DRP).
This superiority is evident in quantitative metrics such as PSNR and SSIM, as well as in visual quality, including artifact suppression and preservation of structural details. CADRE exhibits exceptional capability in accurately reproducing internal mortise-tenon configurations and fine features within ancient timber. Conclusion: The CADRE algorithm provides a robust and efficient solution for limited-angle CT image reconstruction of ancient wooden structures. It effectively overcomes the limitations of existing methods in handling incomplete data, significantly enhances the quality of reconstructed images and the characterization of internal fine structures, and offers strong technical support for the scientific understanding, condition assessment, and precise conservation of cultural heritage.
- Research Article
- 10.3991/ijim.v19i20.56307
- Oct 17, 2025
- International Journal of Interactive Mobile Technologies (iJIM)
- Mërgim H Hoti + 3 more
Low-resource languages present unique challenges for natural language processing (NLP) due to limited annotated corpora, linguistic resources, and pre-trained models. This paper addresses the gap in clustering methodologies for such languages by evaluating the performance of three unsupervised algorithms (K-Means, DBSCAN, and HDBSCAN) on social media text data. Unlike prior studies focusing on high-resource languages, this study explores challenges in preprocessing, tokenization, and vectorization specific to low-resource settings. The results highlight the sensitivity of clustering performance to linguistic nuances and preprocessing approaches, with DBSCAN and HDBSCAN excelling at handling noisy and unstructured data. The findings provide actionable insights into algorithm selection and preprocessing strategies, showcasing the potential and limitations of traditional clustering methods in low-resource NLP. By shedding light on these challenges, this paper contributes to the development of inclusive approaches for text analysis across underrepresented languages, advancing NLP applications globally.
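Two of the algorithms above can be sketched with scikit-learn on a TF-IDF representation; the toy English corpus and all parameters are illustrative, and a real low-resource pipeline would need language-specific tokenization and preprocessing:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans, DBSCAN

# Toy stand-in for short social-media texts: two topics plus one noisy outlier
texts = [
    "football match tonight", "great football game", "watching the match",
    "new phone released", "phone camera review", "best phone battery",
    "random unrelated noise",
]
X = TfidfVectorizer().fit_transform(texts)

# K-Means requires choosing k up front
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# DBSCAN discovers the cluster count and marks isolated points as noise (-1)
db_labels = DBSCAN(eps=0.9, min_samples=2, metric="cosine").fit_predict(X.toarray())
print(km_labels, db_labels)
```

The contrast illustrates why density-based methods suit noisy text: DBSCAN leaves the unrelated document unassigned instead of forcing it into a cluster.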
- Research Article
- 10.2106/jbjs.24.01466
- Oct 15, 2025
- The Journal of bone and joint surgery. American volume
- Joshua J Woo + 8 more
There is no foundational classification that 3-dimensionally characterizes arthritic anatomy to preoperatively plan and postoperatively evaluate total knee arthroplasty (TKA). With the advent of computed tomography (CT) as a preoperative planning tool, the purpose of this study was to morphologically classify pre-TKA anatomy across coronal, axial, and sagittal planes to identify outlier phenotypes and establish a foundation for future philosophical, technical, and technological strategies. A cross-sectional analysis was conducted using 1,352 pre-TKA lower-extremity CT scans collected from a database at a single multicenter referral center. A validated deep learning and computer vision program acquired 27 lower-extremity measurements for each CT scan. An unsupervised spectral clustering algorithm morphometrically classified the cohort. The optimal number of clusters was determined through elbow-plot and eigen-gap analyses. Visualization was conducted through t-distributed stochastic neighbor embedding (t-SNE), and each cluster was characterized. The analysis was repeated to assess how it was affected by severe deformity by removing impacted parameters and reassessing cluster separation. Spectral clustering revealed 4 distinct pre-TKA anatomic morphologies (18.5% Type 1, 39.6% Type 2, 7.5% Type 3, 34.5% Type 4). Types 1 and 3 represented clear outliers. Key parameters distinguishing the 4 morphologies were hip rotation, medial posterior tibial slope, hip-knee-ankle angle, tibiofemoral angle, medial proximal tibial angle, and lateral distal femoral angle. After removing variables impacted by severe deformity, the secondary analysis again demonstrated 4 distinct clusters with the same distinguishing variables. CT-based phenotyping established a 3D classification of arthritic knee anatomy into 4 foundational morphologies, of which Types 1 and 3 represent outliers present in 26% of knees undergoing TKA.
Unlike prior classifications emphasizing native coronal plane anatomy, 3D phenotyping of knees undergoing TKA enables recognition of outlier cases and a foundation for longitudinal evaluation in a morphologically diverse and growing surgical population. Longitudinal studies that control for implant selection, alignment technique, and applied technology are required to evaluate the impact of this classification in enabling rapid recovery and mitigating dissatisfaction after TKA. Prognostic Level II. See Instructions for Authors for a complete description of levels of evidence.
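The eigen-gap heuristic used above to pick the cluster count can be sketched as follows; the morphometric CT measurements are replaced by well-separated synthetic blobs, and the RBF affinity and its gamma are illustrative choices:

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel

# Four well-separated synthetic groups stand in for anatomic phenotypes
centers = [[-5, -5], [-5, 5], [5, -5], [5, 5]]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=0.6, random_state=3)

# Normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2} from an RBF affinity
A = rbf_kernel(X, gamma=0.5)
np.fill_diagonal(A, 0)
d = A.sum(axis=1)
L = np.eye(len(X)) - A / np.sqrt(np.outer(d, d))

# Eigen-gap heuristic: choose k at the largest gap in the smallest eigenvalues
evals = np.sort(np.linalg.eigvalsh(L))[:10]
k = int(np.argmax(np.diff(evals))) + 1
labels = SpectralClustering(n_clusters=k, affinity="rbf", gamma=0.5,
                            random_state=0).fit_predict(X)
print(k, np.bincount(labels))
```

With k well-separated groups, the Laplacian has k near-zero eigenvalues followed by a jump, which is what the gap detects.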
- Research Article
- 10.1016/j.media.2025.103653
- Oct 1, 2025
- Medical image analysis
- Runshi Zhang + 3 more
TCFNet: Bidirectional face-bone transformation via a Transformer-based coarse-to-fine point movement network.
- Research Article
- 10.1016/j.jare.2025.10.056
- Oct 1, 2025
- Journal of advanced research
- Zhe Yu + 2 more
scMapNet: Marker-based cell type annotation of scRNA-seq data via vision transfer learning with tabular-to-image transformations.
- Research Article
- 10.1016/j.rmed.2025.108278
- Oct 1, 2025
- Respiratory medicine
- Laura Villar-Aguilar + 6 more
Characterisation of patients with Alpha-1 antitrypsin deficiency using unsupervised machine learning tools.
- Research Article
- 10.1016/j.autcon.2025.106423
- Oct 1, 2025
- Automation in Construction
- Dong Chen + 4 more
Unsupervised dam crack image segmentation algorithm based on adversarial learning and image fusion
- Research Article
- 10.19852/j.cnki.jtcm.20250319.001
- Oct 1, 2025
- Journal of traditional Chinese medicine = Chung i tsa chih ying wen pan
- Fan Mengyue + 9 more
This study investigated the subtyping and treatment of depression by using artificial intelligence (AI) to learn from extensive Traditional Chinese Medicine (TCM) clinical experience. We retrieved depression-related literature published from inception to April 2023 from databases. From these sources, we extracted symptoms, signs, and prescriptions associated with depression. By utilizing the tree number system in the Medical Subject Headings (MeSH), we established a hierarchical relationship matrix for symptoms/signs, as well as depression sample fingerprints. Using an unsupervised clustering algorithm, we constructed a machine learning model for classifying depression patients. Furthermore, we conducted an analysis of medication rules for each depression cluster. We created a MySQL database containing datasets of depression-symptoms/signs and depression-herbs by mining 3,522 published clinical articles on TCM diagnosis and treatment for depression. We established hierarchical relationships among symptoms/signs of depression patients. Our unsupervised clustering analysis revealed that depression patients could be classified into 9 subtypes, with each subtype corresponding to a specific treatment prescription. Notably, one of the depression subtypes was consistently treated with Qi-tonifying formulas and herbs. This finding was further supported by data from Qi-deficiency patients, as the top symptoms/signs of this subtype closely matched those of Qi-deficiency as diagnosed by TCM. This study identified depression subtypes and their TCM treatments by combining machine learning and text mining.
- Research Article
- 10.1088/1742-6596/3109/1/012082
- Oct 1, 2025
- Journal of Physics: Conference Series
- Qirong Tang + 4 more
Abstract With the rapid proliferation of in-orbit spacecraft and space debris, perceiving information from non-cooperative targets has become a critical issue for space safety. However, low-light images resulting from challenging illumination conditions hinder effective feature extraction from these targets. Traditional low-light enhancement methods struggle to preserve details in space environments, while noise amplification remains a drawback of deep learning-based approaches. This study addresses these limitations by developing the space non-cooperative target dataset and proposing the first deep learning network tailored for space image enhancement. A simulation system leveraging a 3D rendering engine employs multi-dimensional joint sampling and data augmentation to create a high-resolution, multi-class dataset accounting for diverse lighting conditions. The curve estimation driven Low-Light Enhancement and Denoising Network (LLEDNet) innovatively integrates multi-scale feature fusion and a dedicated denoising branch, optimizing curve estimation to preserve details and suppress noise. Experimental results demonstrate that LLEDNet outperforms state-of-the-art unsupervised low-light enhancement algorithms across multiple metrics, providing robust data support for spacecraft maintenance, debris monitoring, and related safety tasks.
- Research Article
- 10.1049/icp.2025.2877
- Oct 1, 2025
- IET Conference Proceedings
- Yueran Ma + 5 more
Unsupervised algorithm for classifying high BER IP packet based on protocol natural redundancy
- Research Article
- 10.35234/fumbd.1668498
- Sep 30, 2025
- Fırat Üniversitesi Mühendislik Bilimleri Dergisi
- Anıl Sezgin + 2 more
Reliable analysis of UAV telemetry data is critical for mission safety, especially as drones are increasingly deployed in complex and high-risk environments. These data streams often include anomalies arising from sensor faults, environmental disruptions, or cyber-physical attacks, making robust anomaly detection essential. This study introduces an unsupervised anomaly detection framework designed specifically for high-frequency UAV telemetry. It combines domain-driven feature engineering with an AutoML-based optimization pipeline that enables automated model selection and hyperparameter tuning. The framework integrates four unsupervised algorithms—Local Outlier Factor, Isolation Forest, One-Class SVM, and Elliptic Envelope—ensuring adaptability to the dynamic nature of UAV operations. Evaluated on a real-world dataset of 127,000 samples from 48 UAV missions, the system uses expert-labeled anomaly segments solely for validation to preserve the integrity of unsupervised learning. Among all methods, Local Outlier Factor yielded the best results with 0.920 accuracy, 0.880 precision, 0.850 recall, and 0.860 F1-score. Scalable and low-latency, the proposed solution is well-suited for real-time deployment. By bridging theoretical advances with operational needs, this work contributes to safer and more resilient aerial robotic systems.
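The best-performing detector above can be sketched on synthetic telemetry; the feature choices (altitude, speed, battery voltage), the fault profile, and the contamination level are illustrative assumptions, not values from the study:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(5)
nominal = np.column_stack([
    rng.normal(100, 5, 1000),    # altitude (m)
    rng.normal(12, 1, 1000),     # ground speed (m/s)
    rng.normal(15.5, 0.2, 1000), # battery voltage (V)
])
faults = np.column_stack([
    rng.normal(100, 5, 10),
    rng.normal(30, 2, 10),       # implausible speed spikes
    rng.normal(11.0, 0.2, 10),   # voltage sag
])
X = np.vstack([nominal, faults])

# LOF compares each point's local density to that of its neighbors;
# contamination sets the fraction of points to flag as anomalies
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
pred = lof.fit_predict(X)        # -1 = anomaly, 1 = nominal
print(np.sum(pred[1000:] == -1), "of 10 injected faults detected")
```

In a real pipeline the features would first be standardized, since LOF's distance computations are scale-sensitive.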
- Research Article
- 10.1302/1358-992x.2025.8.044
- Sep 29, 2025
- Orthopaedic Proceedings
- Peter Schwarzenberg + 3 more
Fracture management remains a significant clinical challenge, with nonunion occurring in approximately 5% of fractures. Tibial shaft fractures are particularly concerning, with approximately 25% of these fractures remaining unhealed after five months and reoperation rates up to 12%. Prognostic fracture healing simulations offer the potential to predict healing outcomes, aiding in treatment selection and enabling early intervention for complications. However, these simulations have historically lacked robust ground truth validation. Two recent studies utilized in vivo sensors in ovine osteotomy models to validate the time course of healing in well-reduced fractures [1] as well as healing outcomes (union, delayed union, nonunion) in clinically relevant fixation configurations. Data from 24 sheep were utilized from two previously completed tibial osteotomy experiments [2,3]. The osteotomies were stabilized with medial locking compression plating. One group used a custom fixator that allowed for controlled axial motion, equipped with a displacement sensor, on 3 mm gaps. The second group used the AO Fracture Monitor, which measured plate strain on gap sizes ranging from 0.6–30 mm. Specimen-specific finite element (FE) models were constructed from computed tomography (CT) scans in Simpleware (v17, Synopsys). Prognostic fracture healing simulations were performed via iterative FE analyses in Abaqus (v2021, Dassault Systèmes) using Python (3.11) and MATLAB (R2023b, MathWorks). In group 1, a paired-sample t-test found no difference in healing time between the in vivo ground truth measurement and the simulated in silico sensor prediction. In group 2, an unsupervised Affinity Propagation clustering algorithm was employed to determine group differences between the healing trajectories.
The predicted groups aligned exactly with the sensor-based classification of the specimens into union, delayed union, and nonunion. This work highlights the potential of predictive bone fracture healing simulations and was the first step in an in vivo sensor-based validation. Techniques such as these could assist in nonunion risk detection at earlier stages, improving patient care and outcomes.
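The unsupervised grouping step can be sketched as follows; the three synthetic trajectory shapes stand in for the sensor-derived healing curves, and no parameter here comes from the study:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Illustrative healing trajectories: stiffness recovery over normalized time
t = np.linspace(0, 1, 50)

def traj(rate, plateau, n, rng):
    """n noisy exponential-recovery curves with a given rate and plateau."""
    base = plateau * (1 - np.exp(-rate * t))
    return base + rng.normal(0, 0.02, size=(n, t.size))

rng = np.random.default_rng(11)
X = np.vstack([
    traj(8.0, 1.0, 8, rng),   # union: fast rise to full stiffness
    traj(3.0, 0.8, 8, rng),   # delayed union: slower, partial recovery
    traj(1.0, 0.2, 8, rng),   # nonunion: little progression
])

# Affinity Propagation chooses the number of clusters itself by exchanging
# messages between points; no k needs to be specified up front
labels = AffinityPropagation(random_state=0).fit_predict(X)
print(len(set(labels)), "clusters found")
```

Not needing a preset cluster count is what makes this method convenient when the number of healing outcome groups is not known in advance.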