Articles published on Model selection
29524 Search results
- New
- Research Article
- 10.1016/j.saa.2025.127408
- Apr 5, 2026
- Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy
- Slobodan Šašić + 4 more
Evaluating the Accuracy of Partial Least Square Regression for Distillation Fractions from ATR-IR Spectra of Crude Oils, with a Focus on Selected Crudes and Fractions.
- New
- Research Article
- 10.1016/j.saa.2025.127404
- Apr 1, 2026
- Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy
- Hairong Guo + 2 more
Rapid characterization of heavy metals in soil using a novel integrated strategy for near-infrared spectroscopy models.
- New
- Research Article
- 10.1016/j.jbi.2026.104994
- Apr 1, 2026
- Journal of biomedical informatics
- Animesh Kumar Paul + 1 more
We learn safe, robust dynamic treatment regimes (DTRs) from observational trajectories that exhibit treatment selection bias, using an offline reinforcement learning (RL) approach. We propose CQL-RB, which augments Conservative Q-Learning (CQL) with a representation-balancing penalty based on an integral probability metric (IPM), instantiated as either a maximum mean discrepancy (MMD) or an energy-distance penalty. The penalty aligns latent patient representations across treatment groups to reduce action-conditioned distribution shift while preserving CQL's conservative policy estimation. We evaluate CQL-RB on two clinically realistic simulators: EpiCare (eight environments) and AhnChemo from DTR-Bench, both modeling longitudinal healthcare decisions with binary actions at each stage. To emulate selection bias, we implement clinician-like behavior policies that assign treatment as a function of patient covariates. Baselines include BOWL, ACWL, T-RL, RL-NN, and standard CQL. Outcomes are expected return and adverse-event counts from simulator rollouts; model selection uses weighted importance sampling off-policy evaluation on held-out data. Ablations vary both the IPM weight β and the choice of IPM metric. Across all eight EpiCare environments and the challenging AhnChemo task, CQL-RB with either MMD or energy-distance penalties consistently achieves higher returns than competing methods while yielding lower (or comparable) adverse-event rates. Removing the balancing term degrades both return and safety, confirming its contribution. Performance is robust for moderate penalty weights (e.g., β∈{1,10,100}), with degradation only at overly large values (e.g., β≥1000 for MMD or β=10000 for energy distance). Representation balancing materially strengthens conservative offline RL for DTR learning under treatment selection bias.
By aligning patient representations without altering CQL's safety mechanics, CQL-RB delivers policies that are both effective (higher returns) and safer (fewer adverse events) in realistic healthcare simulations. These findings underscore the importance of addressing treatment selection bias when learning robust and safe dynamic treatment policies.
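The representation-balancing idea above hinges on an IPM penalty between treatment groups. A minimal NumPy sketch of the MMD variant is shown below: an RBF-kernel squared-MMD estimate between two batches of latent representations. All names, shapes, and the kernel bandwidth are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel matrix between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(z_treated, z_control, sigma=1.0):
    """Squared maximum mean discrepancy between two sets of latent
    patient representations (biased V-statistic estimate)."""
    kxx = rbf_kernel(z_treated, z_treated, sigma).mean()
    kyy = rbf_kernel(z_control, z_control, sigma).mean()
    kxy = rbf_kernel(z_treated, z_control, sigma).mean()
    return kxx + kyy - 2 * kxy

# Matched distributions give a small penalty; a shifted group a larger one.
rng = np.random.default_rng(0)
a = rng.normal(0, 1, (64, 8))   # treated-group latents (toy)
b = rng.normal(0, 1, (64, 8))   # control latents, same distribution
c = rng.normal(3, 1, (64, 8))   # control latents, shifted distribution
print(mmd2(a, b) < mmd2(a, c))  # True
```

In a CQL-RB-style training loop this scalar would be added to the conservative Q-loss with a weight β, so the encoder is pushed to produce treatment-balanced representations.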
- New
- Research Article
- 10.1016/j.cam.2025.117104
- Apr 1, 2026
- Journal of Computational and Applied Mathematics
- Yilun Wang + 3 more
Robust selection and estimation for sparse multivariate functional nonparametric additive models via regularized Huber regression
- New
- Research Article
- 10.1016/j.nucengdes.2025.114741
- Apr 1, 2026
- Nuclear Engineering and Design
- K Sergeenko + 4 more
Selection of RANS turbulence model for calculating thermal hydraulics of fuel assemblies of LMC reactors at low Reynolds numbers
- New
- Research Article
- 10.1016/j.dsp.2026.105908
- Apr 1, 2026
- Digital Signal Processing
- Z.M Kurdoshev + 1 more
Model selection method based on the neural networks for signal processing
- New
- Research Article
- 10.1016/j.carbpol.2026.124917
- Apr 1, 2026
- Carbohydrate polymers
- Yanjing Liu + 3 more
Unlocking the potential of functional food polysaccharides: The zebrafish model as a revolutionary high-throughput screening platform.
- New
- Research Article
- 10.1016/j.colsurfb.2025.115407
- Apr 1, 2026
- Colloids and surfaces. B, Biointerfaces
- Jan Kobierski + 6 more
How water models influence the interfacial organization of oxysterol epimers: A comparative simulation study using TIP3P and OPC.
- New
- Research Article
- 10.1016/j.jtherbio.2026.104426
- Apr 1, 2026
- Journal of thermal biology
- Sachin Kansal + 4 more
Thermal signatures in breast cancer: Deciphering latent biomarkers through deep learning and explainable AI.
- New
- Research Article
- 10.22266/ijies2026.0331.55
- Mar 31, 2026
- International Journal of Intelligent Engineering and Systems
UCD-MP-IoTIDF: A Unified Cross-domain and Multi-protocol Intrusion Detection Framework with Calibrated Deep Learning and Multi-criteria Decision Making–based Model Selection
- Research Article
- 10.1093/evolut/qpag044
- Mar 14, 2026
- Evolution; international journal of organic evolution
- Daniel S Caetano + 1 more
Principal Component Analysis (PCA) is one of the most widely used approaches for analyzing multivariate datasets. Biologists use PCA to visualize data, identify patterns in large datasets, determine independent axes of variation, and reduce dimensionality for further statistical analyses. Phylogenetic PCA is an extension of regular PCA that seeks to identify the major axes of variation independent of the phylogeny. We extend these methods by estimating PCA parameters using an explicit probability modeling framework. We implement multiple models of trait evolution (Brownian motion, Ornstein-Uhlenbeck, Early Burst, and Pagel's λ) and use the Akaike Information Criterion (AIC) for model selection. We also introduce a probabilistic approach to select the number of principal components to retain from a PCA. We demonstrate the advantages of probabilistic PCA, such as incorporating the error, or noise, arising from dimensionality reduction, which is ignored in regular PCA. We use extensive simulations and an empirical dataset with 35 traits to show the method's performance. We implemented the new approach in the R package "do3PCA" available from the CRAN repository.
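The AIC-based selection step described in this abstract can be sketched generically: compute AIC = 2k − 2 ln L for each fitted model and keep the minimum. The log-likelihoods and parameter counts below are invented for illustration, not values from the paper; only the formula and the argmin rule are standard.

```python
def aic(log_lik, n_params):
    # Akaike Information Criterion: 2k - 2*ln(L); lower is better.
    return 2 * n_params - 2 * log_lik

# Hypothetical fitted log-likelihoods for four trait-evolution models
# (illustrative numbers only).
fits = {
    "Brownian motion": (-110.2, 2),      # (log-likelihood, #parameters)
    "Ornstein-Uhlenbeck": (-106.8, 3),
    "Early Burst": (-109.9, 3),
    "Pagel's lambda": (-108.1, 3),
}
scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 1))  # → Ornstein-Uhlenbeck 219.6
```

Note how the extra parameter of the OU model is only rewarded because its likelihood gain (3.4 log units over Brownian motion) exceeds the +1-parameter penalty.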
- Research Article
- 10.1002/adhm.202504889
- Mar 11, 2026
- Advanced healthcare materials
- Anna Wolfram + 10 more
Cerebral organoids are complex, three-dimensional (3D) dynamic models that recapitulate key features of brain development and disease. These systems serve as bioengineerable platforms with diverse architectures and customizable properties, enabling advances in both basic and translational neuroscience. Despite rapid adoption across neurodevelopment, neurodegeneration, and neuro-oncology, the field remains fragmented, with substantial methodological variability and no standardized framework for model selection. A systematic review of 738 original studies published between 2014 and 2024, drawn from 3631 articles across PubMed, Semantic Scholar, and OpenAlex, reveals that human induced pluripotent stem cell-derived cerebral organoids and neurodevelopmental studies dominate the field. In contrast, applications in neurodegeneration, brain metastases, and non-human systems remain limited, narrowing the translational scope. To address the challenge of navigating this expanding literature, OrganoidMap is introduced: an open-access, interactive web platform for exploring, filtering, and comparing cerebral organoid models across disease areas, cell sources, and methodological features. OrganoidMap enables the identification of appropriate models for specific experimental goals and reveals underexplored research areas. This synthesis establishes a scalable foundation for enhancing transparency, reproducibility, and model selection in organoid-based research, setting a new benchmark for how neuroscience and biomaterials communities organize, share, and advance cerebral organoid science.
- Research Article
- 10.1007/s00362-026-01819-w
- Mar 11, 2026
- Statistical Papers
- Michael Balzer
In urban and regional studies, spatial autoregressive models are widely employed to capture spatial patterns and dependence structure in data. While numerous variable selection techniques based on the likelihood principle with favorable theoretical properties have been proposed, their practical applicability is limited by the computational burden of repeatedly evaluating the logarithm of the Jacobian in the quasi log-likelihood, which scales poorly with the number of observations. In this article, variable selection techniques that leverage a closed-form estimator for the spatial autoregressive parameter are discussed. The closed-form estimator, combined with spatial cross-validation techniques for regularization parameter tuning, enables fast and scalable variable selection utilizing the least absolute shrinkage and selection operator as well as $L_2$-boosting while preserving theoretical properties and ensuring computational efficiency. Furthermore, a dimensionality reduction approach is proposed that ensures feasibility in high-dimensional settings where classical quasi-maximum likelihood or two-stage least squares estimators may fail to yield unique solutions. Monte Carlo experiments confirm proper functionality of the proposed methodology, and its potential application is illustrated by investigating the drivers of life expectancy in German districts.
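As a point of reference for the LASSO component named above (not the paper's closed-form spatial-autoregressive estimator), a plain coordinate-descent LASSO on synthetic data might look like the following; all data and tuning values are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    # Soft-thresholding operator, the proximal map of the L1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate-descent LASSO: minimize
    (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]       # partial residual
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b

# Sparse ground truth: only the first three predictors matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_b = np.zeros(10)
true_b[:3] = [2.0, -1.5, 1.0]
y = X @ true_b + 0.1 * rng.normal(size=200)
b_hat = lasso_cd(X, y, lam=0.1)
print(np.nonzero(np.abs(b_hat) > 0.5)[0])  # recovers the three true predictors
```

The spatial variant discussed in the article additionally has to estimate the autoregressive parameter; the appeal of a closed-form estimator is precisely that this LASSO-style inner loop can then run without re-evaluating a log-Jacobian.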
- Research Article
- 10.1148/ryai.260070
- Mar 11, 2026
- Radiology. Artificial intelligence
- Ricardo A Gonzales + 5 more
The effective integration of artificial intelligence (AI) systems into clinical medicine depends on comprehensive and transparent performance evaluation; however, the lack of standardized and widely accepted metrics poses challenges for reproducibility and model adoption. A comprehensive, machine-interpretable framework is presented to formalize the nomenclature and descriptions of 207 graphical, matrix, and scalar metrics used to measure AI model performance. The metrics taxonomy, developed as part of the Radiology Ontology of AI Datasets, Models and Projects (ROADMAP), provides a logically structured representation that captures the semantics of AI evaluation metrics, supports reasoning over metric classes, and enables automated completeness checks for AI model reporting. For each metric, the taxonomy incorporates a definition and citations to authoritative reference sources; where applicable, the taxonomy also includes synonyms, abbreviations, alternate language forms, mathematical formulae, and numerical bounds. The taxonomy supports evaluation of models operating on structured data, medical images, audio signals, and/or unstructured text. Logical axioms link each metric to one or more of 18 AI model performance criteria, including classification, calibration, image segmentation, and text analysis. By harmonizing terminology and enabling structured queries, ROADMAP's taxonomy of AI performance metrics facilitates model comparison, bias detection, and selection of appropriate evaluation methods across diverse datasets and clinical tasks. © RSNA, 2026 See also accompanying Special Report on ROADMAP ontology.
- Research Article
- 10.1038/s41598-026-43868-z
- Mar 11, 2026
- Scientific reports
- Fuzhen Gu + 1 more
The iron and steel industry belongs to the most electricity-intensive branches of manufacturing, and power expenses often account for a large portion of overall production costs. For this reason, reliable short-term forecasts of plant-level electricity consumption are essential for optimizing production planning, avoiding excessive demand charges, and supporting low-carbon operation. This study compares three representative approaches for hourly power-load prediction in a steel enterprise: eXtreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM). Using one year of operating data from an integrated steel plant, we build a dataset of 8,760 hourly records and design a unified preprocessing procedure, including three-sigma outlier detection, time-indexed linear interpolation, and chronological division into training, validation, and test subsets. All three models are trained within a 24-hour sliding-window framework and assessed with RMSE, MAE, and MAPE. The results indicate that each model is capable of depicting the cyclical variation of the load, whereas BiLSTM provides the most accurate predictions, achieving the lowest errors in both absolute and relative terms. XGBoost demonstrates competitive performance and robust trend following. In contrast, the unidirectional LSTM exhibits larger relative errors, particularly during low-load periods. The findings underline the benefits of bidirectional recurrent structures for short-term electricity-consumption forecasting in steel plants and offer guidance for model selection in industrial energy-management practice.
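The 24-hour sliding-window setup and the three error metrics described above are easy to sketch. In the snippet below, a naive persistence forecast (repeat the last observed hour) stands in for the paper's XGBoost/LSTM/BiLSTM models, and the sinusoidal load series is synthetic.

```python
import numpy as np

def make_windows(series, window=24):
    """Build (X, y) pairs: each 24-hour window predicts the next hour."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    # Mean absolute percentage error; assumes y_true is never zero.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Toy hourly load with a daily cycle (MW, illustrative).
hours = np.arange(100)
load = 50 + 10 * np.sin(2 * np.pi * hours / 24)
X, y = make_windows(load)
y_pred = X[:, -1]   # persistence baseline: forecast = previous hour
print(X.shape, round(rmse(y, y_pred), 2), round(mae(y, y_pred), 2))
```

RMSE always upper-bounds MAE, so a large gap between the two flags occasional large errors, the kind the abstract reports for the unidirectional LSTM in low-load periods.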
- Research Article
- 10.3390/math14060946
- Mar 11, 2026
- Mathematics
- Liang Qiu + 4 more
Music popularity prediction is a fundamental problem in music information retrieval, with important implications for digital content dissemination and creative decision-making on streaming platforms. In this study, music popularity prediction is formulated as a supervised regression problem, and six widely-used tree ensemble models (Random Forest, XGBoost, CatBoost, LightGBM, Extra Trees, and Decision Tree) are systematically evaluated using large-scale Spotify data. Among these models, Random Forest achieves the best predictive performance on this dataset (RMSE = 6.79, MAE = 5.10, and R2 = 0.6658), followed by Extra Trees (R2 = 0.6378) and Decision Tree (R2 = 0.6328). Bayesian hyperparameter optimization based on a Tree-structured Parzen Estimator with an Expected Improvement acquisition function is conducted over 50 trials with 5-fold cross-validation to ensure robust model selection. Shapley value decomposition via SHAP analysis reveals that temporal recency dominates feature importance, far surpassing traditional musical attributes, while acoustic intensity (loudness) exhibits a U-shaped contribution pattern with optimal values at moderate intensity levels. Further SHAP dependence analysis uncovers non-linear relationships, indicating substantial popularity advantages for recent releases and optimal loudness levels around −5 to 0 dB. These findings suggest that streaming popularity is primarily governed by temporal exposure dynamics and production-related characteristics rather than intrinsic musical structure, offering both theoretical insights for music information retrieval research and suggestive empirical patterns that may inform future investigations into digital music ecosystems.
- Research Article
- 10.1097/prs.0000000000013005
- Mar 10, 2026
- Plastic and reconstructive surgery
- Adee J Heiman + 4 more
The burden of congenital limb and musculoskeletal (CLM) anomalies disproportionately affects low- and middle-income countries (LMICs). The objective of this cross-sectional study was to examine the relationship of the global burden of these conditions with other developmental indicators, including financial, geographic, technological, and healthcare workforce metrics. Prevalence and disability-adjusted life year (DALY) rates of pediatric CLM anomalies were extracted from the 2021 Global Burden of Disease database for each country. Multiple economic, geographic, and workforce variables were obtained from the World Bank, International Road Federation, International Telecommunications Union, and World Health Organization. Multivariable analysis with appropriate model selection was used to determine the association of these variables with log-transformed DALYs (logDALYs). After multiple imputation of missing variables, model selection identified three variables for use in regression analysis in addition to prevalence: (1) risk of catastrophic expenditure for surgical care, (2) internet access, and (3) number of doctors. On multivariable regression, increased prevalence, lower rates of internet access, and fewer doctors were independently associated with logDALYs. Patients with CLM conditions require highly specialized, long-term access to surgical, medical, and rehabilitative services. Bolstering physician workforce, increasing services to patients in rural areas, and expanding internet infrastructure in LMICs will decrease barriers to care for patients with CLM conditions.
- Research Article
- 10.1108/hff-09-2025-0715
- Mar 10, 2026
- International Journal of Numerical Methods for Heat & Fluid Flow
- Álvaro Martínez-Sánchez + 1 more
Purpose: Traditional modeling techniques for forecasting turbulence often rely on correlation-based criteria, which may select variables that correlate with the target without truly driving its dynamics. This limits model interpretability, generalization and efficiency. The purpose of this study is to overcome these limitations by introducing an observational causality-based approach to input selection that identifies the variables responsible for the future evolution of a target quantity while disregarding noncausal factors. Design/methodology/approach: The authors' approach is grounded in the synergistic-unique-redundant decomposition (SURD) of causality, which dissects the information that candidate inputs provide about a target variable into unique, redundant and synergistic causal components. These components are directly linked to the theoretical limits of predictive performance, quantified through the information-theoretic notion of irreducible error. To estimate these causal contributions in practice, the authors leverage neural mutual information estimators. The authors demonstrate the methodology by forecasting wall-shear stress using direct numerical simulation (DNS) data of turbulent channel flow. Findings: The analysis reveals that variables with high unique or synergistic causal contributions enable compact forecasting models with strong predictive performance, whereas redundant variables can be excluded without compromising accuracy. Specifically, when predicting future wall-shear stress using two wall-parallel planes separated in the wall-normal direction, the streamwise velocity near the wall provides unique information about the target. In contrast, when both planes are located close to the wall, their information is largely redundant, and either can serve as input without degrading predictive accuracy.
Finally, synergistic interactions emerge between different velocity components, which, when combined, enhance the prediction of future wall-shear stress beyond what each component achieves individually. Originality/value: This work presents a causality-based approach for input selection in turbulence forecasting. The method quantifies the causal contributions of candidate variables to the prediction of a future quantity of interest and connects them to the fundamental limits of predictive accuracy achievable by any model. This enables more interpretable and compact models by reducing input dimensionality without sacrificing performance. Beyond turbulence, the approach provides a general-purpose tool for variable selection in scientific machine learning, flow control and data-driven modeling of complex systems.
- Research Article
- 10.2196/78519
- Mar 9, 2026
- JMIR Medical Informatics
- Shumin Ren + 10 more
Background: Cancer risk prediction models are vital for precision prevention, enabling individualized assessment of cancer susceptibility based on genetic, clinical, environmental, and lifestyle factors. However, the practical use of these models is hindered by fragmented resources, heterogeneous reporting, and the absence of transparent, structured systems for systematic discovery and comparison. Objective: This study aimed to develop a retrieval-augmented, knowledge-guided system that provides accurate recommendations for cancer risk prediction models. Methods: We developed CanRisk-RAG, a recommendation platform underpinned by a precisely constructed knowledge base comprising more than 800 peer-reviewed cancer risk prediction models spanning diverse cancer types, modeling approaches, and predictive variables. The system integrates (1) large language model (LLM)-based semantic tag extraction, (2) embedding vectorization of structured metadata and abstracts, (3) a multifactor ranking algorithm combining semantic similarity with multiple quality indicators, and (4) LLM-generated literature summarization to support rapid user interpretation. Performance was evaluated across 4 types of representative queries. Eight domain experts independently assessed retrieval quality. CanRisk-RAG was benchmarked against PubMed, ChatGPT-4o, ScholarAI, and Gemini 1.5 Flash. Results: On the independent validation set, CanRisk-RAG consistently outperformed all 4 baseline applications, achieving the highest overall relevance (8.30 [SD 0.59]) and reliability (7.62 [SD 0.76]) scores on a 10-point scale (P<.05). It also demonstrated high authenticity, data completeness, and consistency.
Baseline applications frequently returned incomplete, inconsistent, or fabricated results, especially for complex, multifactorial queries, whereas CanRisk-RAG delivered accurate and structured recommendations grounded in validated evidence. Conclusions: CanRisk-RAG presents a transparent, domain-specific, and semantically enriched framework for discovering cancer risk prediction models, addressing several limitations of existing keyword-based search tools and general-purpose LLMs. By integrating structured knowledge, multifactor ranking, and LLM-based reasoning, the system aims to improve the precision, reproducibility, and usability of model selection in cancer risk prediction. While our evaluation demonstrates encouraging performance compared with baseline systems, further validation in broader clinical contexts and real-world applications is warranted. The framework's general design may also be adaptable to other clinical model domains, providing a potential foundation for advancing evidence-based model discovery in precision medicine.
- Research Article
- 10.1007/s41237-025-00287-0
- Mar 9, 2026
- Behaviormetrika
- Ae Kyong Jung + 1 more
Evaluating WAIC and PSIS-LOO for Bayesian diagnostic classification model selection
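For context on the criteria compared in this title: WAIC can be computed from a matrix of pointwise posterior log-likelihoods. A minimal NumPy sketch follows; the toy Gaussian posterior is illustrative, and PSIS-LOO (which additionally requires Pareto-smoothed importance weights) is omitted.

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (S draws x N observations) matrix of pointwise
    log-likelihoods, as produced by a Bayesian sampler.
    Returns WAIC on the deviance scale: -2 * (lppd - p_waic)."""
    s = log_lik.shape[0]
    # Log pointwise predictive density: log-mean-exp over posterior draws.
    lppd = np.sum(np.logaddexp.reduce(log_lik, axis=0) - np.log(s))
    # Effective number of parameters: per-observation variance of log-lik.
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2 * (lppd - p_waic)

# Toy posterior: 2000 draws of a Gaussian mean for 50 observations
# with known unit variance (illustrative only).
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, 50)
mu_draws = rng.normal(y.mean(), 1 / np.sqrt(50), 2000)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2
print(round(waic(log_lik), 1))
```

Lower WAIC indicates better estimated out-of-sample fit; in a diagnostic classification setting the same `log_lik` matrix would come from the fitted DCM's item-response likelihoods.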