Articles published on Benchmarking
7091 Search results
- New
- Research Article
- 10.5038/1936-4660.19.1.1490
- Jan 1, 2026
- Numeracy
- Robert Prince
In South Africa, university completion rates remain low, with only 23% of students finishing within the regulation time (three or four years). These completion rates continue to reflect racial inequalities, with 'White' students significantly more likely to complete degrees in professional fields such as engineering and commerce compared to their 'African' counterparts. To address these challenges, South African higher education institutions, through their umbrella body, introduced the National Benchmark (NB) tests to assess students' academic literacy skills—including quantitative literacy, academic literacy, and mathematics—to identify those most at risk of struggling with the curriculum. The NB tests aim to evaluate students' readiness for higher education using scores and proficiency bands. This study examines the predictive validity of the NB Quantitative Literacy test scores and proficiency bands in science, technology, engineering and mathematics (STEM) programmes in the faculties of commerce, engineering and science. The analysis focuses on completion, dropout, and retention rates within one first-time entering cohort at a South African university. Among 2,493 first-time entering students, 15% dropped out after the first year, 22% left by the regulation time, and 23% exited within two additional years. Graduation rates were 34% within regulation time and 67% within two extra years. Findings highlight the predictive value of the NB Quantitative Literacy assessment, emphasizing its potential role in informing admission and placement decisions, curriculum design, and teaching and learning strategies to enhance student success in STEM programmes.
- New
- Research Article
- 10.1109/tnb.2025.3604284
- Jan 1, 2026
- IEEE transactions on nanobioscience
- Ankit Patil + 5 more
A miniaturized three-electrode interdigitated electrode (IDE) system for electrochemical measurements with enhanced sensitivity and performance is reported here. The system included a reference electrode, a counter electrode, and a working electrode, all configured as interconnected electrodes. The present work focused on optimizing the number of working electrodes and their geometric parameters to achieve peak performance, using potassium ferricyanide as the benchmarking system. This optimization addressed the critical interplay between capacitance, resistance, sensitivity, and aspect ratio. Unlike previous configurations in which the reference electrode was separated from the interdigitated design, the present approach integrates the reference electrode into the interdigitated configuration, greatly increasing sensitivity. Despite using a low-cost conductive material such as carbon PLA (polylactic acid) for 3D-printed (3DP) electrodes, in a three-electrode interdigitated system the current observed at the oxidation peak showed a significant increase of 97-98%, while the reduction peak exhibited an increase of 65-66% compared to the two-electrode interdigitated system. The screen-printed (SP) electrodes used for design validation exhibited minimal variation across cycles in a two-working-electrode interdigitated configuration. This progress highlights the potential of interconnected electrodes in developing sensitive and efficient electrochemical sensors.
- New
- Research Article
- 10.1142/s1758825125501418
- Dec 31, 2025
- International Journal of Applied Mechanics
- Longgang Tian + 2 more
The determination of double-K parameters of concrete components is typically carried out through experimental and analytical methods within the framework of Linear Elastic Fracture Mechanics (LEFM). Testing large-scale concrete components with arbitrary geometry presents significant challenges, which also affect the evaluation of double-K fracture parameters. To address this problem, this paper proposes an alternative numerical approach. The method consists of two main steps and is designed to determine the double-K parameters for large-scale concrete components. Based on known physical properties obtained through material property testing, the numerical results demonstrate good agreement with both analytical solutions and experimental data. In this approach, a solution-based crack propagation path is obtained using an XFEM-cohesive zone model (CZM) analysis. The critical crack opening displacement (w_crit) is employed to separate the physical crack from the fracture process zone, while simultaneously filtering out the nodes participating in crack face regeneration. The resulting physical crack model is then reinserted into the original model to perform a static XFEM crack analysis, yielding the fracture toughness curve and the double-K parameters. Fracture energy and specimen thickness are identified as factors influencing w_crit. Three-Point-Bending (TPB), Compact-Tension (CT), and semi-circular bending (SCB) specimens are employed to validate the accuracy of the proposed approach and investigate the characteristics of w_crit. A tension-shear benchmark test specimen is employed to demonstrate the robustness and functionality of the originally developed crack regeneration technique. The proposed method offers an effective approach to determine the double-K parameters and evaluate the stress intensity factors (SIFs) of arbitrarily shaped cracks. All source code is available upon request from the corresponding author.
- New
- Research Article
- 10.37701/ts.10.2025.03
- Dec 30, 2025
- Випробування та сертифікація
- D Rybachok + 3 more
The ongoing full-scale war in Ukraine has exposed significant challenges in the logistical support of the Armed Forces, particularly regarding the organization of food services in field conditions. This paper addresses the critical necessity of replacing obsolete Soviet-era technical assets, such as the KP-125 and KP-130 trailer kitchens, with advanced modular systems like the MK-500 container kitchen. The authors argue that this modernization requires a fundamental revision of testing protocols to ensure interoperability with NATO forces. The study provides a detailed review of the national regulatory framework, specifically analyzing the requirements of DSTU V 15.210:2023 and DSTU V 15.211:2023, which govern the lifecycle management and testing programs of military equipment. A central component of the research is the systematization of evaluation criteria for operational reliability under combat conditions. The article proposes adopting the US Department of Defense standard MIL-STD-810H as a benchmark for durability testing. It describes specific test procedures in detail, including Method 514.8 for assessing vibration resistance during transportation over rough terrain, Method 516.8 for simulating mechanical shock and transit drops during loading operations, and climatic chamber tests (Methods 501.7 and 502.7) to verify system performance in extreme temperatures ranging from -20°C to +60°C. The authors also highlight the importance of testing for environmental resilience against rain (Method 506.6), humidity (Method 507.6), and dust (Method 510.5) to ensure the hermetic sealing of sensitive electronics and ventilation systems. Furthermore, the study highlights the paramount importance of biological safety and food hygiene in environments with damaged infrastructure. It substantiates the mandatory implementation of Hazard Analysis and Critical Control Points (HACCP) protocols in field settings. 
The paper validates specific critical control points, such as the rigorous monitoring of thermal processing temperatures (requiring internal product temperatures of at least 74°C) and the maintenance of safe storage regimes (keeping cold products below 5°C and hot dishes above 57°C) to prevent foodborne disease outbreaks. The analysis also encompasses ergonomic factors affecting crew efficiency, such as workspace zoning to prevent cross-contamination, and environmental controls including noise reduction, lighting, and microclimate maintenance (temperature 17–23°C, humidity 40–60%). Based on this multi-criteria analysis, the authors recommend the development of a unified, comprehensive "Program and Methodology" for testing food service equipment. This approach ensures that the Armed Forces of Ukraine receive equipment that is not only functional—capable of serving 500 personnel within tight deployment timeframes—but also safe, durable, and fully compatible with modern logistics strategies.
- New
- Research Article
- 10.64898/2025.12.01.25341004
- Dec 30, 2025
- medRxiv
- Xingmeng Zhao + 14 more
Background: For large language models (LLMs) to reach their potential as information technology tools that make medication use safer, clinically relevant benchmarks capable of automated grading and designed specifically to measure the performance of LLMs on medication tasks are required. The purpose of this study was to design a suite of benchmarking tests reflective of Comprehensive Medication Management (CMM; the standard of care for medication optimization) and quantify the baseline performance of the latest LLMs. Methods: We established six benchmarks representing critical stages of the CMM process: drug formulation matching, drug order (sig) generation, drug route matching, drug-drug interaction identification, renal dose identification, and drug-indication matching. For each benchmark, we curated a clinician-annotated dataset comprising 250 standardized input-output pairs including both inpatient and outpatient medications. We evaluated the clinical knowledge retrieval capabilities of three LLMs: GPT-4o-mini, MedGemma-27B, and LLaMA3–70B. We employed a zero-shot prompting strategy, excluding in-context examples, to assess the models’ internal clinical knowledge rather than their few-shot learning potential. To check reliability, each model was run three times using a temperature of 0.7 (a mid-range value of an LLM setting controlling text generation randomness). Performance was assessed using task-specific evaluation metrics including precision (positive predictive value), recall (sensitivity), F1-score, accuracy, and correctness consistency across trials. Results: Across six benchmarks, LLaMA3–70B demonstrated the highest performance in four tasks: drug-formulation matching (F1, 54.0% [95% CI: 50.1–58]), drug-order generation (accuracy, 88.0%), drug-route identification (F1, 74.3% [95% CI: 71–78]), and drug-indication identification (accuracy, 97.6% [95% CI: 95.6–99.2]).
In the drug–drug interaction task, GPT-4o-mini achieved the highest overall accuracy (70.4% [95% CI: 64.8–75.7]). For renal dose–adjustment identification, GPT-4o-mini demonstrated the highest F1 score (83.3% [95% CI: 77.6–88]). Correctness-consistency scores ranged from 8.0% to 97.6% across benchmarks, with no model exhibiting uniformly superior consistency. Conclusions: Model performance varied substantially across medication-related tasks. LLaMA3–70B demonstrated promising baseline performance in tasks involving formulation, ordering, route, and indication. GPT-4o-mini showed potential advantages in drug–drug interaction detection and renal dose adjustment. These findings underscore the need for task-specific evaluation when deploying models for medication-focused clinical decision support.
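The task-specific metrics the study reports (precision as positive predictive value, recall as sensitivity, F1) follow their standard definitions; a minimal sketch of how they are computed from confusion counts. The counts below are hypothetical illustrations, not figures from the paper:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Standard precision, recall, and F1 from true/false positive and
    false negative counts (zero-safe when a denominator is empty)."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # positive predictive value
    recall = tp / (tp + fn) if tp + fn else 0.0      # sensitivity
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return precision, recall, f1

# Hypothetical counts for one benchmark run:
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```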
- New
- Research Article
- 10.1038/s41746-025-02277-8
- Dec 26, 2025
- NPJ digital medicine
- Shirui Wang + 37 more
Large language models (LLMs) hold promise in clinical decision support but face major challenges in safety evaluation and effectiveness validation. We developed the Clinical Safety-Effectiveness Dual-Track Benchmark (CSEDB), a multidimensional framework built on clinical expert consensus, encompassing 30 metrics covering critical areas like critical illness recognition, guideline adherence, and medication safety, with weighted consequence measures. Thirty-two specialist physicians developed and revised 2069 open-ended Q&A items aligned with these criteria, spanning 26 clinical departments to simulate real-world scenarios. Benchmark testing of six LLMs revealed moderate overall performance (average total score 57.2%, safety 54.7%, effectiveness 62.3%), with a significant 13.3% performance drop in high-risk scenarios (p < 0.0001). Domain-specific medical LLMs showed consistent performance advantages over general-purpose models, with relatively higher top scores in safety (0.912) and effectiveness (0.861). The findings of this study not only provide a standardized metric for evaluating the clinical application of medical LLMs, facilitating comparative analyses, risk exposure identification, and improvement directions across different scenarios, but also hold the potential to promote safer and more effective deployment of large language models in healthcare environments.
- Research Article
- 10.1680/jphmg.25.00046
- Dec 18, 2025
- International Journal of Physical Modelling in Geotechnics
- Rasmus Tofte Klinkvort + 20 more
The large-diameter monopile is a commonly used foundation concept for offshore wind turbines. Its geometrical simplicity and reliable performance often make it the most attractive solution. Despite the concept’s high popularity, the current design models can still be optimised. To improve fundamental understanding of modelling effects in centrifuge testing of laterally loaded monopiles in sand, a large coordinated centrifuge-testing programme across nine different centrifuge centres worldwide has been conducted. This paper presents, firstly, the results of local benchmark modelling of a model test series performed in two centrifuges and, secondly, the results of global benchmark testing across the nine centrifuges. The results highlight the reliability of centrifuge testing, as it was possible to model a similar prototype response in both the local and global benchmark tests despite differences in the experimental setups and pile geometries. Furthermore, as examples of the modelling technique, two different cases are presented, one showing the effect of installation and one showing the effect of pile penetration depth. Finally, recommendations are provided to enhance centrifuge testing of monopile response under complex loading.
- Research Article
- 10.3390/s25247684
- Dec 18, 2025
- Sensors (Basel, Switzerland)
- Yongbin Mu + 4 more
Scene text recognition has significant application value in autonomous driving, smart retail, and assistive devices. However, due to challenges such as multi-scale variations, distortions, and complex backgrounds, existing methods such as CRNN, ViT, and PARSeq, while showing good performance, still have room for improvement in feature extraction and semantic modeling capabilities. To address these issues, this paper proposes a novel scene text recognition model named the Encoder–Decoder Interactive Model (EDIM). Based on an encoder–decoder framework, EDIM introduces a Multi-scale Dilated Fusion Attention (MSFA) module in the encoder to enhance multi-scale feature representation. In the decoder, a Sequential Encoder–Decoder Context Fusion (SeqEDCF) mechanism is designed to enable efficient semantic interaction between the encoder and decoder. The effectiveness of the proposed method is validated on six regular and irregular benchmark test sets, as well as various subsets of the Union14M-L dataset. Experimental results demonstrate that EDIM outperforms state-of-the-art (SOTA) methods across multiple metrics, achieving significant performance gains, especially in recognizing irregular and distorted text.
- Research Article
- 10.1093/bioinformatics/btaf664
- Dec 16, 2025
- Bioinformatics (Oxford, England)
- Hu Chen + 5 more
The analysis of RNA isoforms using long-read single-cell RNA sequencing (scRNA-seq) represents a frontier in gene expression research, offering deeper insights beyond traditional gene-level analysis. However, specialized analytical methods tailored for this advanced technology remain scarce, underscoring the urgent need for novel tools to match the rapid pace of its development. Here, we present IsoDiffR, a robust tool designed to identify RNA isoforms with expression patterns that differ from their corresponding genes or major isoforms across cell types, enabling both pairwise and multi-cell-type comparisons. Using IsoDiffR, we conducted benchmark tests using simulated data and analyzed long-read scRNA-seq data from the corneal limbus of Macaca fascicularis and the human frontal cortex, uncovering previously unrecognized cell-type-specific isoforms that were not detectable using conventional approaches. Additionally, we explored the structural and functional properties of these isoforms, from their nucleotide sequences to their corresponding protein isoforms, revealing their potential biological roles. Our findings offer new perspectives on gene expression regulation at the single-cell level and provide a methodological framework for future investigations of isoform-specific functions in diverse biological contexts. The R package is available on https://github.com/Eveqian98/IsoDiffR. Supplementary data are available at Bioinformatics online.
- Research Article
- 10.26689/jera.v9i6.13160
- Dec 16, 2025
- Journal of Electronic Research and Application
- Guanlin Pan + 5 more
In panoramic images, the geometric distortion caused by wide-angle lenses makes it difficult for traditional semantic segmentation methods to segment glass areas accurately. To address the challenges of capturing spatial features and integrating context information, we propose the Panoramic Glass Image Segmentation Network (PGISNet). This network integrates the Matrix Decomposition Base Module (MDBM), the Transparent Perception Consistency Module (TACM), the Context and Texture Compensation Module (CTCM), and the Multi-scale Gated Context Attention Module (MGCA), constructing a progressive feature-processing flow. Experimental results on the PanoGlassV2 benchmark test show that PGISNet achieved 90.03% IoU, 94.76% F-score, and 94.0% PA, significantly outperforming existing methods and verifying its effectiveness and advancement in the panoramic image glass segmentation task.
- Research Article
- 10.54361/ajmas.2584132
- Dec 15, 2025
- AlQalam Journal of Medical and Applied Sciences
- Naima Shamsi
This study investigates how truncation (discretization) error and floating-point round-off jointly influence the practical accuracy of three classical composite quadrature rules: the composite rectangle rule, the composite trapezoidal rule, and the composite Simpson rule. After summarizing the standard theoretical error orders for sufficiently smooth integrands, the methods are implemented in MATLAB using explicit IEEE-754 single-precision (32-bit) and double-precision (64-bit) arithmetic to isolate precision effects. Numerical experiments are performed on benchmark test integrals with known analytical values for four representative integrands, namely sin(x), cos(x), e^x, and log(x), and absolute errors are reported for each method. Across the tested cases, double precision substantially suppresses round-off effects and enables markedly smaller errors, with several results approaching machine-level magnitudes. Simpson’s rule generally provides the highest accuracy in double precision, attaining absolute errors on the order of 10⁻¹³–10⁻¹⁶ for multiple test functions, while single precision typically yields errors in the 10⁻⁸–10⁻⁶ range depending on the integrand and method. The exponential function exhibits larger errors in single precision, consistent with its rapid growth and increased sensitivity to accumulated rounding during composite summation. Overall, the results demonstrate that achieving reliable high-accuracy numerical integration requires not only an appropriate quadrature order but also sufficient arithmetic precision, and that empirical evaluation across multiple test functions remains essential when floating-point effects are non-negligible.
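The precision effect the study describes can be reproduced with a minimal sketch: the composite Simpson rule evaluated in IEEE-754 single and double precision on the benchmark integral of sin(x) over [0, π], whose exact value is 2. This is a NumPy stand-in for the paper's MATLAB setup, and the subinterval count is an arbitrary choice:

```python
import numpy as np

def composite_simpson(f, a, b, n, dtype):
    """Composite Simpson's rule with n (even) subintervals,
    carried out entirely in the requested floating-point precision."""
    x = np.linspace(a, b, n + 1, dtype=dtype)
    h = (np.asarray(b, dtype=dtype) - np.asarray(a, dtype=dtype)) / n
    y = f(x).astype(dtype)
    # Simpson weights: 1 at the ends, 4 at odd interior nodes, 2 at even ones.
    return (h / 3) * (y[0] + y[-1]
                      + 4 * y[1:-1:2].sum(dtype=dtype)
                      + 2 * y[2:-1:2].sum(dtype=dtype))

exact = 2.0  # integral of sin(x) from 0 to pi
err32 = abs(float(composite_simpson(np.sin, 0.0, np.pi, 1000, np.float32)) - exact)
err64 = abs(float(composite_simpson(np.sin, 0.0, np.pi, 1000, np.float64)) - exact)
print(err32, err64)  # the double-precision error is orders of magnitude smaller
```

With the truncation error already tiny at this subinterval count, the remaining gap between the two runs is almost entirely round-off, which is the effect the study isolates.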
- Research Article
- 10.63313/jcsft.9032
- Dec 15, 2025
- Journal of Computer Science and Frontier Technologies
- Lingduo Zhang
The weakly supervised temporal action localization task is to identify action categories and their start and end times in unedited videos. How to achieve feature calibration between different modalities in this task, and how to further optimize action boundaries based on the similarity of common action sequences, remain open problems. To address these issues, we propose a novel network framework: weakly supervised temporal action localization via feature calibration-assisted sequence comparison (FCSC). The core of the FCSC framework is the Multi-Modal Feature Calibration Module (MFCM), which uses global and local contextual information from the primary and auxiliary modalities to enhance the RGB and FLOW features, respectively, achieving deep feature calibration. In addition, the framework introduces an improved distinguishable edit distance metric for sequence similarity optimization (SSO) and a maximum consistent subsequence (MCS) to narrow the gap between the classification and localization tasks. Extensive experiments show that FCSC achieves mAPs of 47.7% and 27.9% on the THUMOS14 and ActivityNet1.2 temporal action localization benchmark test sets, respectively, fully verifying the effectiveness of the model.
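The paper's improved "distinguishable edit distance" builds on the classic edit (Levenshtein) distance between sequences; a minimal sketch of the standard dynamic-programming formulation it modifies (the FCSC-specific changes are not shown here):

```python
def levenshtein(a, b):
    """Classic edit distance: minimum number of insertions, deletions,
    and substitutions turning sequence a into sequence b, computed
    row by row with O(len(b)) memory."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,       # deletion
                            curr[j - 1] + 1,   # insertion
                            prev[j - 1] + cost))  # substitution / match
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```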
- Research Article
- 10.3390/app152413149
- Dec 15, 2025
- Applied Sciences
- Jingfang Shen + 4 more
The adaptive Kriging method is widely used in engineering design for complex black-box problems, yet its accuracy is limited by an imbalance between exploitation and exploration. This paper proposes a KNN-based maximization of the weighted expected prediction error (KMWEPE) method to address this challenge. In each iteration, the most sensitive region is identified by the leave-one-out cross-validation error (LOOCVE) and the distance between sample points. Two different sets of candidate points are generated, respectively, in the most sensitive region and the design space, in order to dynamically balance local exploitation and global exploration. Then, the bias–variance decomposition method is used to convert the expected prediction error of each candidate point into the sum of the bias and the Kriging prediction variance. The bias is replaced by the KNN-based weighted sum of the LOOCVE of the K-nearest-neighbor sample points, and the sum of this bias term and the Kriging prediction variance is used to construct a new acquisition function. Finally, the candidate with the maximum weighted expected prediction error is selected as the new sample point for the next iteration. Six benchmark test functions, two publicly available datasets, and two engineering examples are tested to demonstrate the effectiveness of the proposed KMWEPE method in improving model accuracy. The test results show that, compared to the LHD and MEPE methods, the RMSE mean and standard deviation of the KMWEPE method decreased by an average of 31.6% and 28.8%, respectively.
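The bias estimate described above (replacing the unknown bias at a candidate point with a weighted combination of the LOOCV errors of its K nearest sample points) can be sketched loosely as follows; the linear surrogate, the inverse-distance weighting, and all parameter values below are hypothetical stand-ins for the paper's Kriging model, not the KMWEPE implementation itself:

```python
import numpy as np

def loocv_errors(X, y, fit):
    """LOOCVE: for each sample, refit the surrogate without it and
    record the absolute prediction error at the held-out point."""
    n = len(X)
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        model = fit(X[mask], y[mask])
        errs[i] = abs(model(X[i]) - y[i])
    return errs

def knn_weighted_bias(x, X, errs, k=3):
    """Estimate the bias term at candidate x as an inverse-distance-weighted
    average of the LOOCV errors of its k nearest sample points."""
    d = np.linalg.norm(X - x, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-12)
    return float(np.sum(w * errs[idx]) / np.sum(w))

# Demo: a linear surrogate (hypothetical stand-in for Kriging) on y = x^2,
# where the LOOCV errors expose the model's bias away from the data trend.
def linear_fit(X, y):
    coef = np.polyfit(X[:, 0], y, 1)
    return lambda x: np.polyval(coef, x[0])

X = np.linspace(-1, 1, 8).reshape(-1, 1)
y = X[:, 0] ** 2
errs = loocv_errors(X, y, linear_fit)
bias = knn_weighted_bias(np.array([0.5]), X, errs)
print(bias)  # non-negative estimate of the local model error
```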
- Research Article
- 10.1021/acs.jctc.5c01784
- Dec 12, 2025
- Journal of chemical theory and computation
- Peipei Zhang + 2 more
Transition-state localization is critical for elucidating chemical reaction mechanisms but remains one of the most computationally demanding challenges in theoretical chemistry. Here, we introduce a novel method, reaction directional analysis-dimer (RDA-D), which integrates reaction directional analysis (RDA) with the dimer method to achieve efficient and reliable transition state searching. Reaction directional analysis generates high-quality quasi-transition-state structures directly from only the initial and final state geometries, combining dynamic interpolation, structural optimization, and directional analysis. These quasi-transition-state structures then serve as starting points for refinement via the dimer method. Benchmark tests on a diverse set of gas-phase and catalytic reactions on surfaces demonstrate that RDA-D is, on average, 5.83 times faster than the Nudged Elastic Band (NEB) method in CPU time and reduces the number of gradient evaluations by a factor of 4.74. Moreover, reaction directional analysis eliminates the need for predefined reaction coordinates or chemically intuitive initial guesses, providing a robust, scalable, and automation-friendly framework for transition-state localization.
- Research Article
- 10.1519/jsc.0000000000005310
- Dec 12, 2025
- Journal of strength and conditioning research
- Zachary J Mcclean + 5 more
McClean, ZJ, da Silva Torres, R, Herzog, W, Pasanen, K, Lun, V, and Jordan, MJ. The influence of sport representation and attitudes toward strength training on neuromuscular performance profiles in university athletes: Part I-Male athletes. J Strength Cond Res XX(X): 000-000, 2025-Strength, power, and plyometric testing are essential to evaluate neuromuscular performance in athletes. However, this approach creates datasets with numerous outcome measures that can lead to challenges for interpretation and establishing relevant performance benchmarks for preseason testing and performance-readiness after injury. The idea that athlete performance profiles exist within a larger population has been suggested, but limited research has explored this concept or suggested methodologies for delineating relevant profiles. Exploring the existence of neuromuscular performance profiles in university athletes while accounting for the influence of the sport environment and psychosocial factors, such as attitudes toward strength training, may support more athlete-specific neuromuscular benchmarks. Healthy male university athletes (n = 272) from 5 sports completed a comprehensive neuromuscular performance testing battery and a questionnaire that included assessment of attitudes toward strength training. Unsupervised machine learning applied to the body weight-normalized neuromuscular performance dataset, along with Fisher's exact tests, was used to examine differences in attitudes toward strength training across clusters (alpha = 0.05). Five profiles were identified, including a high strength/high power/braking-dominant jump strategy cluster with a large ice hockey representation and a high strength/high power/fast jump strategy cluster consisting mostly of field-sport athletes. 
Differences in attitudes toward training were noted across profiles (p < 0.05); for instance, athletes in a low-strength/low-power profile tended to prefer training in a more private training environment (p = 0.023). These results may help inform neuromuscular performance benchmarks in male university athletes, while the psychosocial characteristics of these profiles may provide insight into increasing strength training engagement in this population.
- Research Article
- 10.1088/1361-6382/ae255e
- Dec 11, 2025
- Classical and Quantum Gravity
- Carlos Palenzuela + 14 more
We present MHDuet, an open source evolution code for general relativistic magnetohydrodynamics with neutrino transport. The code solves the full set of Einstein equations coupled to a relativistic, magnetized fluid with an M1 neutrino radiation scheme using advanced techniques, including adaptive mesh refinement and large eddy simulation techniques, to achieve high accuracy. The Simflowny platform generates the code from a high-level specification of the computational system, producing code that runs with either the SAMRAI or AMReX infrastructure. The choice of AMReX enables compilation and execution on GPUs, running an order of magnitude faster than on CPUs at the node level. We validate the code against benchmark tests, reproducing previous results obtained with the SAMRAI infrastructure, and demonstrate its capabilities with simulations of neutron stars employing realistic tabulated equations of state. Resolution studies clearly demonstrate convergence faster than second order in the grid spacing. Scaling tests reveal excellent strong and weak scaling performance when running on GPUs. The goal of the code is to provide a powerful tool for studying the dynamics of compact objects within multi-messenger astrophysics.
- Research Article
- 10.3390/electronics14244880
- Dec 11, 2025
- Electronics
- Zhengying Cai + 3 more
Recharging and battery swapping are of great significance for extending the driving range of autonomous vehicles (AVs). However, if an AV cannot recharge or swap batteries in a timely manner, the consequences are more serious than for a traditional human-driven vehicle, as there is a lack of human assistance in an AV. To address this challenge, this study proposes the joint routing optimization of AVs under recharging and battery-swapping modes. Firstly, a multi-objective model is defined for the joint routing optimization problem of AVs, which minimizes the total distance, idling time, and charging waiting time of AVs while meeting all user demands. The user demand is described as a directed arc consisting of a departure node and a destination at random locations and times, and the AVs need to plan their routes to sequentially access all user demand arcs and recharge or swap batteries in a timely manner. Secondly, an improved artificial plant community (APC) algorithm is proposed to solve the NP-hard problem, including a recharging scheme and a hybrid scheme comprising recharging and swapping. In the seeding operation, random seeds are generated to enhance global search capabilities, and optimal solution learning is added in the fruiting operation to improve local search capabilities. In the growing operation, population optimization is strengthened to improve convergence performance. Thirdly, a benchmark test set was developed based on a real scenario in Wuhan, China. Compared to some baseline algorithms, the results show that the proposed APC algorithm exhibits better performance in solving the NP-hard problem.
- Research Article
- 10.3390/sym17122120
- Dec 9, 2025
- Symmetry
- Yijie Wang + 3 more
To address the inherent limitations of the standard Sine Cosine Algorithm (SCA) in multi-threshold image segmentation, this paper proposes an enhanced algorithm named the Reinforcement Learning–Thermal Conduction–Sine Cosine Algorithm (RLTC-SCA), with symmetry as a core guiding principle. Symmetry, a fundamental property in nature and image processing, refers to the invariance or regularity of grayscale distributions, texture patterns, and structural features across image regions; this characteristic is widely exploited to improve segmentation accuracy by leveraging consistent spatial or intensity relationships. In multi-threshold segmentation, symmetry manifests in the balanced distribution of optimal thresholds within the grayscale space, as well as the symmetric response of segmentation metrics (e.g., PSNR, SSIM) to threshold adjustments. To evaluate the optimization performance of RLTC-SCA, comprehensive numerical experiments were conducted on the CEC2020 and CEC2022 benchmark test suites. The proposed algorithm was compared with several mainstream metaheuristic algorithms. An ablation study was designed to analyze the individual contribution and synergistic effects of the three enhancement strategies. The convergence behavior was characterized using indicators such as average fitness value, search trajectory, and convergence curve. Moreover, statistical stability was assessed using the Wilcoxon rank-sum test (at a significance level of p = 0.05) and the Friedman test. Experimental results demonstrate that RLTC-SCA outperforms all comparison algorithms in terms of average fitness, convergence speed, and stability, ranking first on both benchmark test suites. Furthermore, RLTC-SCA was applied to multi-threshold image segmentation tasks, where the Otsu method was adopted as the objective function. Segmentation performance was evaluated on multiple benchmark images under four threshold levels (2, 4, 6, and 8) using PSNR, FSIM, and SSIM as evaluation metrics. 
The results indicate that RLTC-SCA can efficiently obtain optimal segmentation thresholds, with PSNR, FSIM, and SSIM values consistently higher than those of competing algorithms—demonstrating superior segmentation accuracy and robustness. This study provides a reliable solution for improving the efficiency and precision of multi-threshold image segmentation and enriches the application of intelligent optimization algorithms in the field of image processing.
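For context, the baseline Sine Cosine Algorithm that RLTC-SCA enhances moves each candidate toward the best solution found so far with a randomly switched sine or cosine step whose amplitude decays over iterations. A minimal sketch under standard SCA settings follows; the reinforcement-learning, thermal-conduction, and Otsu components of RLTC-SCA are not shown, and the population size, iteration count, and test function are arbitrary choices:

```python
import numpy as np

def sca_minimize(f, dim, bounds, pop=30, iters=200, seed=0):
    """Baseline Sine Cosine Algorithm: minimize f over a box [lo, hi]^dim."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(pop, dim))
    fit = np.apply_along_axis(f, 1, X)
    best = X[fit.argmin()].copy()
    for t in range(iters):
        r1 = 2 - 2 * t / iters                     # amplitude decays linearly to 0
        r2 = rng.uniform(0, 2 * np.pi, (pop, dim))
        r3 = rng.uniform(0, 2, (pop, dim))
        r4 = rng.uniform(size=(pop, dim))          # sine/cosine switch
        step = np.where(r4 < 0.5, np.sin(r2), np.cos(r2)) * r1
        # Move relative to the distance from the (randomly scaled) destination.
        X = np.clip(X + step * np.abs(r3 * best - X), lo, hi)
        fit = np.apply_along_axis(f, 1, X)
        if fit.min() < f(best):
            best = X[fit.argmin()].copy()
    return best, f(best)

best, val = sca_minimize(lambda x: np.sum(x**2), dim=5, bounds=(-10, 10))
print(val)  # shrinks toward the optimum at 0 as the amplitude decays
```

The decaying amplitude r1 is what drives the exploration-to-exploitation transition whose limitations the proposed RLTC-SCA strategies are designed to address.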
- Research Article
- 10.1080/14763141.2025.2593325
- Dec 8, 2025
- Sports Biomechanics
- Roné Thompson + 3 more
Benchmark tests in competitive cycling identify talent, individualise training, and monitor performance. However, varying protocols often produce conflicting results, reducing comparability. Isometric tests are prevalent, but their reliability and correlation with performance are underexplored. This study aimed to determine the test–retest reliability of benchmark test metrics in elite track sprint cyclists and their relationship to a performance outcome. Nineteen elite track sprint cyclists (12 males, 7 females) completed seven benchmark tests across two days: modified sit-and-reach; on-bike rolling seated maximum 6-s sprints; 3-s bilateral on-bike isometrics at 90° crank angle; 3-s prone bench pull isometrics; 3-s lumbar extension isometrics; 3-s seated off-bike isometrics; and modified plank endurance. For the performance outcome, a third session within 7 days assessed peak power using an inertial load cycle ergometer. All tests showed excellent measurement consistency (ICC3,1 ≥ 0.92), with low systematic bias (p ≥ 0.063), though confidence intervals varied due to the modest sample size. High test–retest reliability was supported by low typical errors (CV 2.0–5.5%; 9.6% for endurance). Nine benchmark metrics, including bilateral isometric measures, showed moderate to excellent correlation with peak power output (r = 0.52–0.94, p ≤ 0.023); six remained statistically significant after Bonferroni correction (p ≤ 0.005). All benchmark metrics were reliable, with six strongly and statistically significantly associated with performance.
- Research Article
- 10.48084/etasr.13230
- Dec 8, 2025
- Engineering, Technology & Applied Science Research
- Viet Hung Tran
This study proposes an improved version of the Starfish Optimization Algorithm (SFOA) that integrates strategies from the Grey Wolf Optimizer (GWO) to address entrapment in local minima and enhance exploitation capabilities. In benchmark tests on two asymmetrical steel frame structures, the proposed Improved SFOA (ISFOA) demonstrated superior performance compared to the original SFOA, Particle Swarm Optimization (PSO), GWO, and the Stellar Oscillation Optimizer (SOO). The algorithm successfully optimized the benchmark steel frames, achieving the lightest structural designs among the tested algorithms. Specifically, for the four-story structure with a 132-member steel space frame, ISFOA obtained designs lighter by 34%, 10%, 7%, and 11% than the best solutions achieved by PSO, GWO, SOO, and SFOA, respectively. Similarly, for the four-story, 428-member steel frame, ISFOA produced designs lighter by 42%, 17%, 9%, and 12% than those of PSO, GWO, SOO, and SFOA, respectively. The ISFOA designs complied with displacement and geometric constraints according to the LRFD-AISC standard.