Articles published on Standard Uncertainty
- Research Article
- 10.1186/s12958-025-01478-w
- Nov 1, 2025
- Reproductive Biology and Endocrinology : RB&E
- Julia Lastinger + 8 more
Background: Chronic endometritis (CE) is frequently diagnosed in women with repeated implantation failure (RIF) and recurrent pregnancy loss (RPL), yet the lack of standardized diagnostic criteria and uncertainty about the timing of assessment and optimal treatment leave open questions regarding its clinical relevance. This study aims to identify clinical risk factors that may guide targeted CE testing and to evaluate an ideal time point in the diagnostic pathway to offer CE assessment in women with RIF and RPL.
Methods: In this retrospective cohort study, 392 women with RIF (no pregnancy after two or more transfers of good-quality blastocysts) and 119 women with RPL (two or more subsequent miscarriages) who underwent endometrial biopsy with CD138 immunohistochemistry between 2016 and 2024 were analyzed. Odds ratios for presumed CE risk factors were calculated, and CE prevalence and reproductive outcomes were assessed.
Results: Women in the RPL group had a higher prevalence of CE compared to the RIF group (39.5% vs. 25.0%, p = 0.004). A history of cesarean delivery was associated with increased CE risk in the RPL group (OR 2.5, 95% CI 1.14–7.84). CE prevalence did not increase with the number of failed embryo transfers in RIF (32.2% after two, 21.3% after three, 24.7% after ≥ 4 transfers; p = 0.349) or miscarriages in RPL (44.2% after two, 51.4% after three, 57.1% after ≥ 4 miscarriages; p = 0.518). When RIF patients treated for CE were compared with those with a normal biopsy after both two and three previous embryo transfers, we found no differences in pregnancy outcomes.
Conclusions: Our data did not confirm a significant increase in CE prevalence with an increasing number of failed embryo transfers or miscarriages. No relevant differences in the reproductive outcomes of RIF patients with normal biopsies compared with treated CE were found.
While prior cesarean delivery may identify a subgroup of RPL patients who could benefit from targeted screening, the overall utility of routine CE testing and treatment remains limited. Standardized diagnostic criteria and further prospective studies are needed to clarify the role of CE in reproductive outcomes, with cautious consideration of uncritical antibiotic treatment.
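A 95% confidence interval like the one quoted for the cesarean-delivery odds ratio is conventionally obtained from the standard error of the log odds ratio (Woolf's method). A minimal sketch with made-up 2×2 counts, not the study's data:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) 95% CI from a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Illustrative counts only:
or_, lo, hi = odds_ratio_ci(20, 27, 27, 91)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

An interval whose lower bound exceeds 1 indicates a statistically significant association at the 5% level.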
- Research Article
- 10.1016/j.apradiso.2025.112012
- Nov 1, 2025
- Applied radiation and isotopes : including data, instrumentation and methods for use in agriculture, industry and medicine
- W M Van Wyngaardt + 6 more
Standardisation of iodine-123 and participation in the comparison BIPM.RI(II)-K4.I-123.
- Research Article
- 10.1007/s10953-025-01519-3
- Oct 16, 2025
- Journal of Solution Chemistry
- W Earle Waghorne + 1 more
A strategy for determining consensus solubilities with measurement uncertainties at common temperatures from data reported at varying temperatures is presented. The approach involves fitting the individual data sets to an appropriate equation of solubility (or log solubility) as a function of temperature. This allows estimation of the standard uncertainty of each data point and, if required, interpolated values of the solubility, with uncertainties, at the temperatures of interest. Between-laboratory uncertainty is assessed by the DerSimonian–Laird procedure. The strategy is applicable to any measured quantity where its variation with experimental variables can be represented by a robust equation.
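The DerSimonian–Laird procedure mentioned in this abstract has a simple closed form; a sketch, assuming per-laboratory estimates with known within-laboratory variances (the values are illustrative):

```python
def dersimonian_laird(y, v):
    """Consensus value and standard uncertainty from per-lab estimates
    y[i] with within-lab variances v[i] (DerSimonian-Laird)."""
    k = len(y)
    w = [1.0 / vi for vi in v]
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    Q = sum(wi * (yi - yw) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)          # between-lab variance
    wstar = [1.0 / (vi + tau2) for vi in v]
    consensus = sum(wi * yi for wi, yi in zip(wstar, y)) / sum(wstar)
    u = (1.0 / sum(wstar)) ** 0.5               # standard uncertainty
    return consensus, u, tau2

# Three hypothetical labs reporting a solubility (mol/L) with variances:
print(dersimonian_laird([1.00, 1.02, 0.98], [0.01, 0.01, 0.01]))
```

When the labs agree within their stated variances, the between-lab variance estimate collapses to zero and the result reduces to an inverse-variance weighted mean.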
- Research Article
- 10.1186/s12985-025-02954-w
- Oct 15, 2025
- Virology Journal
- Manyu Li + 12 more
Background: Viral antigen rapid detection tests (Ag-RDTs) and PCR-based nucleic acid amplification tests (NAATs) are essential for diagnosing infections like SARS-CoV-2. However, unlike nucleic acids, which can be quantified precisely in copies, viral antigens lack a standardized unit of measurement, hindering precise analytical evaluation of Ag-RDTs and comparison with NAATs.
Methods: We established a universal national standard for SARS-CoV-2 antigen and nucleic acid based on an Omicron BA.1 strain, inactivated using β-propiolactone (BPL). Its concentration was assigned via multi-laboratory digital PCR (dPCR). Clinical samples were tested with Ag-RDTs and NAATs. We also compared the impact of heat and BPL inactivation on detection using dPCR, qPCR, and sequencing. The standard was then used to evaluate the limits of detection (LoDs) of commercial Ag-RDTs and NAATs using a common unitage.
Results: Clinical sample results showed that antigen positivity was correlated with higher nucleic acid titers. Both BPL inactivation and heat inactivation maintained comparable nucleic acid titers, but BPL inactivation better preserved antigen activity. The national standard concentration was assigned as 1.04 × 10⁸ Unit/mL (standard uncertainty: 3.48 × 10⁶ Unit/mL). Using this standard, NAATs exhibited lower LoDs than Ag-RDTs, though some Ag-RDT sensitivities approached NAAT levels. Most commercial assays met or exceeded their claimed LoDs, with all claimed NAAT LoDs falling within 2-fold of measured values. The standard enabled direct cross-format LoD comparison.
Conclusion: We established and validated a BPL-inactivated universal national standard specifically designed to overcome the difficulties in quantifying antigen and to enable direct sensitivity comparison between SARS-CoV-2 Ag-RDTs and NAATs.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12985-025-02954-w.
- Research Article
- 10.24027/2306-7039.3.2025.340615
- Oct 3, 2025
- Ukrainian Metrological Journal
- Valeriy Ashchepkov + 2 more
The paper addresses the challenges associated with applying machine learning models to detect outliers in metrological datasets. While such models can identify complex deviations in the structure of a sample without relying on prior statistical assumptions, they do not provide normatively justified criteria for assessing the reliability of their decisions. Specifically, such models lack interpretable confidence indicators, metrological traceability, and formalised thresholds to determine whether an outlier is genuine. One proposed solution involves assessing the impact of eliminating anomalous values detected by the Isolation Forest model on the Type A standard measurement uncertainty, when the initial sample size is preserved through repeated measurements. This approach was validated using real-life measurements of liquid flow performed with Coriolis flowmeters of various diameters. The results empirically demonstrated the effectiveness of the method in cases where the elimination of distortion-inducing values led to a significant reduction in measurement variability. However, several limitations were also identified, including the sensitivity of models to small sample sizes, the impracticality of performing repeated measurements in many real-life scenarios, and the lack of an objective threshold to determine the “significance” of uncertainty reduction. These findings highlight the need for further study of the formalization of confidence criteria in anomaly detection within the metrological domain, particularly in the context of compliance with international standards such as ISO/IEC 17025. Despite these limitations, the application of machine learning models opens new opportunities for automating the analysis of metrological data and highlights the need to develop harmonized approaches for integrating such solutions into the regulatory framework.
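The quantity at the heart of the proposed check, the Type A standard uncertainty of the mean, is simple to compute before and after outlier elimination. A sketch with made-up flow readings, using a median-based cut as a stand-in for the Isolation Forest flags (the paper's actual model):

```python
import statistics

def type_a_uncertainty(readings):
    """Type A standard uncertainty of the mean: s / sqrt(n)."""
    n = len(readings)
    return statistics.stdev(readings) / n ** 0.5

flow = [10.02, 9.98, 10.01, 9.99, 10.00, 10.03, 9.97, 11.50]  # last value anomalous
u_all = type_a_uncertainty(flow)

# Stand-in for the model's anomaly flags: drop points far from the median
med = statistics.median(flow)
mad = statistics.median(abs(x - med) for x in flow)
kept = [x for x in flow if abs(x - med) <= 5 * 1.4826 * mad]
u_kept = type_a_uncertainty(kept)
print(u_all, u_kept)  # the uncertainty drops once the anomaly is removed
```

The paper's criterion compares exactly these two numbers, with the caveat (noted in the abstract) that no objective threshold yet defines how large the reduction must be to count as significant.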
- Research Article
- 10.1088/1681-7575/ae012f
- Oct 1, 2025
- Metrologia
- Haiyang Zhang + 4 more
The virial coefficients of cryogenic gases, especially helium-4 and helium-3, are playing an ever more critical role in the establishment of primary reference standards for temperature after the redefinition of the kelvin in the SI. Thus, the reliability of the values and uncertainties of these coefficients, especially the second, third, and even fourth density virial coefficients (B, C, and D), has become more significant. To check the accuracy of these coefficients for helium-4 from ab initio calculations, the refractive-index gas thermometry (RIGT) method was developed, allowing for the simultaneous determination of thermodynamic temperatures and density virial coefficients. Using this technique, highly accurate experimental values of B, C, and D for helium-4, as well as T − T90 values, were obtained for the range 5 K–25 K. Direct comparisons with the ab initio calculated density virial coefficients for helium-4 were conducted, revealing excellent agreement. Furthermore, good agreement of the thermodynamic temperatures T between absolute RIGT and our previous single-pressure RIGT (Gao et al 2021 Metrologia 58 059501) was achieved at temperatures from 5 K to 25 K, with differences within each standard uncertainty. This further strengthens our confidence in the comparisons made in this work. It is foreseeable that the rigorously verified ab initio calculations of the density virial coefficients for helium-4 will continue to be used to improve the measurement accuracy of helium-based primary reference standards for temperature and pressure.
- Research Article
- 10.1007/s00134-025-08100-y
- Oct 1, 2025
- Intensive care medicine
- Alessandra Agnese Grossi + 5 more
Donation after circulatory death (DCD) represents a valuable opportunity to expand the organ donor pool. However, its implementation in intensive care units (ICUs) remains ethically and emotionally complex. ICU healthcare professionals (HCPs) play a pivotal role in this process, yet their attitudes, knowledge, and perceived challenges are not fully understood. This systematic review aimed to explore ICU HCPs' attitudes (as defined by Rosenberg and Hovland) toward controlled DCD (cDCD). We conducted a systematic review of studies published until March 2025 in four databases. Eligible studies included original research reporting ICU-specific data on HCPs' attitudes toward DCD. Study quality was assessed using the Mixed Methods Appraisal Tool. A structured narrative synthesis was performed. Twenty-five studies involving 3,878 HCPs were included. Overall, support for DCD was evident, though it remained lower than for donation after brain death. Ethical concerns focused on potential conflicts of interest between the withdrawal of life-sustaining treatment and the pursuit of organ donation, the timing of withdrawal, the urgency of organ retrieval, and the challenge of balancing compassionate end-of-life care with procedural imperatives. Common barriers included the lack of standardized protocols, insufficient training, and uncertainty surrounding death determination. While ICU HCPs generally support DCD, significant ethical tensions and systemic barriers persist. Institutional efforts should focus on implementing clear protocols, promoting interprofessional education, and providing emotional support to ensure ethical integrity and staff well-being. Future research should explore differences in attitudes between uncontrolled DCD (uDCD) and cDCD and work toward the development of validated tools to assess professional attitudes.
- Research Article
- 10.1016/j.apradiso.2025.112240
- Oct 1, 2025
- Applied radiation and isotopes : including data, instrumentation and methods for use in agriculture, industry and medicine
- J.T Cessna + 6 more
Primary and secondary activity standards for 89Zr.
- Research Article
- 10.1002/jssc.70308
- Oct 1, 2025
- Journal of separation science
- Wafaa El-Ghaly + 9 more
The quality of quantitative results in bioanalysis requires not only a validated analytical method but also a rigorous estimation of measurement uncertainty. This study examines the challenges associated with implementing two distinct approaches in equine anti-doping control for assessing the uncertainty of an ultra-high-performance liquid chromatography-high resolution mass spectrometry quantitative method for caffeine and lidocaine in horse urine. The bottom-up approach, based on the ISO Guide to the Expression of Uncertainty in Measurement (ISO GUM), was compared to the top-down approach using β-content, γ-confidence tolerance intervals (β,γ-CCTI) via F-test. The key limitation of the ISO GUM method was accurately quantifying the various uncertainty components; it gives standardized uncertainty estimates but requires detailed assumptions and modeling of error sources. The direct application of the GUM method requires correcting the matrix effect beforehand to provide reliable results. In parallel, the chemometric β,γ-CCTI approach offers more flexible and realistic estimations. Four combinations of β and γ were investigated to assess their influence on uncertainty interval width (β = 66.7% and 80%; γ = 90% and 95%), and the method was evaluated under repeatability and intermediate precision conditions through advanced computation that adjusts for matrix effects and proves more straightforward for capturing the variability inherent in experimental data. The top-down approach is a reliable alternative for routine use and, particularly, for ensuring compliance with regulatory requirements, given that a known proportion β of future results will fall within predefined acceptance limits.
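The ISO GUM bottom-up route described here ultimately combines the standardized uncertainty contributions in quadrature. A generic sketch (the budget entries are hypothetical, not the paper's):

```python
import math

def combined_standard_uncertainty(components):
    """GUM bottom-up: root-sum-of-squares of the contributions
    u_i = |c_i| * u(x_i), i.e. sensitivity coefficient times the
    standard uncertainty of each input quantity."""
    return math.sqrt(sum((c * u) ** 2 for c, u in components))

# Hypothetical budget for a concentration measurement (ng/mL):
budget = [
    (1.0, 0.8),   # repeatability
    (1.0, 0.5),   # calibration-curve fit
    (1.0, 0.3),   # matrix-effect correction
]
u_c = combined_standard_uncertainty(budget)
U = 2 * u_c  # expanded uncertainty, coverage factor k = 2
print(round(u_c, 2), round(U, 2))
```

The abstract's point is that assigning realistic values to each `(c, u)` pair, especially the matrix-effect term, is the hard part, which is what motivates the top-down alternative.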
- Research Article
- 10.1088/1681-7575/ae10c1
- Oct 1, 2025
- Metrologia
- Harim Lee + 8 more
Probe current stability in scanning electron microscopy (SEM) is a critical factor in the realization of high-precision imaging and accurate nanoscale metrology, including critical dimension measurements and depth estimations. While prior research has predominantly focused on noise sources and long-term drift, systematic quantifications of sensitivity coefficients and measurement uncertainties across various electron gun configurations remain insufficient. Without a clear understanding of how structural parameters such as the tip radius and extractor gap affect probe current stability, optimizing electron sources for advanced metrology continues to be a challenge. Moreover, the lack of experimental validation hinders reliable uncertainty analyses. In this study, we evaluate both short-term and long-term probe current stability for thermal field emission and cold field emission sources using Allan deviation and autocorrelation function analyses. Sensitivity coefficients with respect to the extraction voltage are systematically simulated over a range of tip radii and extractor gap distances, revealing direct correlations between geometric design parameters and current stability. Experimental measurements are combined with simulation results to quantify the combined standard uncertainty and expanded uncertainty of the probe current for various electron gun configurations. These results establish a clear correlation between emission structure parameters and probe current variations, providing a practical basis for the optimization of electron-optical systems in SEM applications for improved stability and metrology performance.
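The Allan deviation used in this study for probe current stability has a compact non-overlapping form; a sketch on synthetic white-noise current samples, where the deviation should fall as the averaging window grows:

```python
def allan_deviation(samples, m):
    """Non-overlapping Allan deviation for an averaging window of m samples."""
    # average the series into M contiguous bins of length m
    M = len(samples) // m
    bins = [sum(samples[i * m:(i + 1) * m]) / m for i in range(M)]
    # Allan variance: half the mean squared difference of adjacent bin averages
    avar = sum((bins[i + 1] - bins[i]) ** 2 for i in range(M - 1)) / (2 * (M - 1))
    return avar ** 0.5

# Synthetic probe current: 1.0 (arbitrary units) plus white noise
import random
random.seed(0)
current = [1.0 + random.gauss(0, 0.01) for _ in range(4096)]
print(allan_deviation(current, 1), allan_deviation(current, 16))
```

For purely white noise the Allan deviation scales roughly as 1/sqrt(m); drift or flicker noise in a real electron source makes it flatten or rise at long averaging times, which is what makes it a useful stability diagnostic.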
- Research Article
- 10.24027/2306-7039.3.2025.340414
- Sep 30, 2025
- Ukrainian Metrological Journal
- Oleg Novoselov
The paper considers the evaluation of the Type B component of standard measurement uncertainty resulting from a correction due to the finite resolution of the readings of measuring equipment during calibration. The problem of establishing the compliance of measuring equipment with its technical specification based on the results of its calibration is stated. It is emphasized that this procedure must account for the measurement uncertainty. Consequently, the metrological characteristics of the measuring equipment after calibration may not meet the requirements of the technical specification. The calibration and measurement capabilities of accredited laboratories when calibrating a smooth micrometer MK25 for compliance with the technical specification are analysed. It is shown that the Type B standard uncertainty component resulting from a correction due to the resolution of the readings of analogue measuring instruments cannot be dominant in the uncertainty budget. It is proposed to establish a target uncertainty during calibration for the analysis of the calibration and measurement capabilities of measuring equipment in the laboratory. At the same time, the reliability of the evaluation of measurement uncertainty during the calibration of measuring equipment depends on the correct choice of appropriate procedures. Therefore, at the legislative level, namely in the Law of Ukraine “On Metrology and Metrological Activities”, it is established that measuring equipment is calibrated, and the results registered, in accordance with national standards harmonized with relevant international and European standards and with documents approved by international and regional metrology organizations.
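The Type B resolution component discussed here is conventionally modeled as a rectangular distribution of half-width equal to half the resolution, giving u = resolution / (2√3). A minimal sketch:

```python
import math

def u_resolution(resolution):
    """Type B standard uncertainty from finite readout resolution:
    a rectangular distribution of half-width resolution/2 gives
    u = resolution / (2 * sqrt(3))."""
    return resolution / (2 * math.sqrt(3))

# e.g. a micrometer read to 0.001 mm
print(u_resolution(0.001) * 1000, "um")
```

For a 0.001 mm resolution this contributes roughly 0.29 µm, which, as the paper argues, should not dominate the budget of a properly characterized analogue instrument.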
- Research Article
- 10.1175/wcas-d-25-0039.1
- Sep 30, 2025
- Weather, Climate, and Society
- Liam Thompson + 4 more
Uncertainty is inherent to all sciences and can be studied from several different perspectives. However, best practices for climate scientists communicating uncertainty in climate projections are unclear. As anthropogenic greenhouse gas emissions continue to rise, the impacts of human activity on the climate system have become more apparent. This makes the communication of uncertainty in climate projections critical to decision-makers. Further, the public often equates science with certainty. Yet, it is critical to understand that climate projections are not the future and that projections themselves contain several sources of uncertainty. As such, this review makes four primary recommendations to guide future research and considerations when communicating uncertainty. The goal is to provide a central place for climate scientists to have crucial conversations on how to communicate this uncertainty to facilitate decision-making. First, a standardized uncertainty communication framework, specific to climate projections, should be developed and implemented. Second, research is needed to determine how to communicate which climate projections best represent current reality. This is critical given recent funding uncertainties that could impact the ability to make more certain climate projections. Third, there is a lack of research on how different decision-makers perceive uncertainty, which could point to a need to develop industry-specific uncertainty communication practices when developing a larger, universal uncertainty communication framework. Finally, we recommend investigating whether too much emphasis is placed on the potential impacts of climate change (negative framing of uncertainty) rather than on actions the public can take to reduce the projected warming (positive framing).
- Research Article
- 10.3390/e27100999
- Sep 25, 2025
- Entropy
- Jasper A Vrugt + 1 more
A fundamental limitation of maximum likelihood and Bayesian methods under model misspecification is that the asymptotic covariance matrix of the pseudo-true parameter vector θ* is not the inverse of the Fisher information, but rather the sandwich covariance matrix A(θ*)⁻¹B(θ*)A(θ*)⁻¹, where A(θ*) and B(θ*) are the sensitivity and variability matrices, respectively, evaluated at θ* for the training data record. This paper makes three contributions. First, we review existing approaches to robust posterior sampling, including the open-faced sandwich adjustment and magnitude- and curvature-adjusted Markov chain Monte Carlo (MCMC) simulation. Second, we introduce a new sandwich-adjusted MCMC method. Unlike existing approaches that rely on arbitrary matrix square roots, eigendecompositions or a single scaling factor applied uniformly across the parameter space, our method employs a parameter-dependent learning rate that enables direction-specific tempering of the likelihood. This allows the sampler to capture directional asymmetries in the sandwich distribution, particularly under model misspecification or in small-sample regimes, and yields credible regions that remain valid when standard Bayesian inference underestimates uncertainty. Third, we propose information-theoretic diagnostics for quantifying model misspecification, including a strictly proper divergence score and scalar summaries based on the Frobenius norm, Earth mover's distance, and the Herfindahl index. These principled diagnostics complement residual-based metrics for model evaluation by directly assessing the degree of misalignment between the sensitivity and variability matrices A(θ*) and B(θ*). Applications to two parametric distributions and a rainfall-runoff case study with the Xinanjiang watershed model show that conventional Bayesian methods systematically underestimate uncertainty, while the proposed method yields asymptotically valid and robust uncertainty estimates.
Together, these findings advocate for sandwich-based adjustments in Bayesian practice and workflows.
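The sandwich correction the abstract describes can be illustrated in one dimension, writing A for the sensitivity (mean negative Hessian of the log-likelihood) and B for the variability (mean squared score). The toy model below wrongly assumes unit variance, so the naive posterior variance for the mean is too small by roughly the true variance:

```python
import random, statistics

random.seed(1)
n = 2000
y = [random.gauss(0.0, 2.0) for _ in range(n)]  # true sd = 2, model assumes sd = 1

mu_hat = statistics.fmean(y)                 # MLE of the mean under the (wrong) model
A = 1.0                                      # sensitivity: -E[d2 log-lik] per observation
B = sum((yi - mu_hat) ** 2 for yi in y) / n  # variability: mean squared score

var_naive = 1.0 / (n * A)          # inverse Fisher information (too small here)
var_sandwich = B / (n * A * A)     # A^-1 B A^-1 / n

print(var_sandwich / var_naive)    # ~4: the naive variance is about 4x too small
```

Under a correctly specified model A equals B and the ratio tends to 1; the diagnostics proposed in the paper quantify exactly this kind of A/B misalignment.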
- Research Article
- 10.1002/mp.70022
- Sep 25, 2025
- Medical Physics
- Seongmoon Jung + 11 more
Background: Ultra-high dose rate (UHDR) radiotherapy, or FLASH RT, has shown potential to spare normal tissues while maintaining tumor control. However, accurate dosimetry at UHDR remains challenging, as conventional ionization chambers suffer from recombination effects. Although radiochromic films and alanine dosimeters have both been investigated independently for FLASH dosimetry, their separate use hinders robust validation and direct comparison of their measurements.
Purpose: This study aims to develop and evaluate a unified dosimeter containing both alanine and radiochromic film for electron and proton FLASH beam dosimetry. The design allows for simultaneous, co-located irradiation of both dosimeter types, enabling a direct comparison between them. This configuration eliminates confounding factors such as positional offsets, alignment errors, and beam fluctuations, thereby facilitating the validation of measurements and enhancing confidence in FLASH dosimetry.
Methods: The unified alanine and EBT-XD/HD-V2 film dosimeter was designed with the same outer dimensions as the Advanced Markus chamber (PTW-Freiburg), allowing compatibility with commercial QA phantoms. Alanine and film dosimeters were calibrated under conventional electron and proton beams, traceable to absorbed dose to water from Co-60 gamma rays. The unified dosimeter was used to measure dose from a 9 MeV electron FLASH beam (Varian Clinac iX) and a 230 MeV proton FLASH beam (IBA machine), with alanine and film irradiated simultaneously at the same location.
Results: The alanine dosimeter measured the dose per pulse, instantaneous dose rate, and mean dose rate at a source-to-surface distance of 100 cm for the electron FLASH beam as 0.99 ± 0.02 Gy/pulse, 2.48 × 10⁵ Gy/s, and 357 Gy/s, respectively. The EBT-XD film showed good agreement (within a 2.0% relative difference) in the 10–30 Gy range, whereas the HD-V2 indicated a larger difference (up to 5.9%) compared to the alanine dosimeter. The mean dose rate for the proton FLASH beam, measured by the alanine dosimeter, was 115.4 ± 1.1 Gy/s. The EBT-XD showed a 4.3% relative difference with the alanine dosimeter in the 10–30 Gy range.
Conclusions: The unified alanine and film dosimeters enabled simultaneous irradiation of the alanine and the films, with combined relative standard uncertainties of 2.4% (k = 1) for the alanine dosimeter and 3.5% (k = 1) for the EBT-XD films at the electron FLASH beam. For the proton FLASH beam, these uncertainties were 3.2% (k = 1) for both the alanine dosimeter and the EBT-XD films. Until dosimetry guidelines for the FLASH RT community are established by a working group such as AAPM TG-359, the dosimetry protocol proposed in this study can serve as a promising approach for FLASH RT facilities worldwide.
- Research Article
- 10.1007/s10765-025-03608-3
- Sep 18, 2025
- International Journal of Thermophysics
- Karim S Al-Barghouti + 3 more
The speeds of sound of ternary refrigerant mixtures, namely, R-444A (difluoromethane (R-32)/1,1-difluoroethane (R-152a)/trans-1,3,3,3-tetrafluoropropene (R-1234ze(E)) with respective mass fractions of 0.1194/0.0519/0.8287), R-457B (R-32/2,3,3,3-tetrafluoropropene (R-1234yf)/R-152a with respective mass fractions of 0.3489/0.5495/0.1016), and R-407C (R-32/pentafluoroethane (R-125)/R-152a with respective mass fractions of 0.5178/0.2480/0.2342), were measured using a dual-path pulse-echo technique at temperatures ranging between 230 K and 345 K and pressures between 0.14 MPa and 30 MPa. The standard uncertainties in temperature and pressure were 5 mK and 0.014 MPa, respectively. The average combined expanded uncertainty for all speed of sound data was 0.07%. Greater uncertainties were encountered as the system approached the critical regions, where the speed of sound is more sensitive to changes in pressure. The experimental speed of sound data were used to assess the predictive capabilities of default REFPROP v10.0 mixture models with binary interaction parameters fit using mainly vapor–liquid equilibria and/or density data. We quantify the improvements for ternary mixture predictions when using updated binary interaction parameters that included speed of sound data in the fitting procedure. Reductions of 1.64%, 1.50%, and 0.11% in the average absolute deviations for R-444A, R-457B, and an R-125/1234yf/152a (0.3521/0.5465/0.1014 mass composition) mixture, respectively, are obtained with the updated binary interaction parameters. Further improvements to the mixture models could be made by refitting both the pure-component equations of state and the interaction parameters of certain hydrofluorocarbon binary pairs.
- Research Article
- 10.1088/1361-6498/ae0597
- Sep 1, 2025
- Journal of Radiological Protection
- Tomoya Tsuji + 4 more
In response to the new operational quantities proposed in ICRU Report 95, we calculated conversion coefficients for monoenergetic photon calibration fields, specifically the ²⁴¹Am γ-ray calibration field and the fluorescence x-ray calibration field, both of which are listed in the annex of the ISO 4037 standard series. These coefficients were derived using the measured photon fluence spectrum. Additionally, correction factors for air density were determined for the low-energy fluorescence x-ray calibration field. Both the conversion coefficients and the air density correction factors were found to vary within the standard uncertainty specified by the ISO 4037 series.
- Research Article
- 10.1002/cepa.3328
- Sep 1, 2025
- ce/papers
- Elisabeth Stierschneider + 3 more
The product qualification of bonded fasteners is regulated in a European Assessment Document, which includes a comprehensive test program to evaluate the sensitivity of bonded fasteners to different installation and environmental conditions. Sustained load testing as part of this program is used for the long-term displacement forecast to assess the creep behaviour of the anchors over time. The prescribed standard procedure for the assessment of sustained load tests is the Findley power-law methodology. The measured displacements are extrapolated to a working life of 50 years and compared with the displacement at loss of adhesion as the limiting value. As a result of this power-law methodology, the inherent measurement uncertainty of the displacements is also extrapolated over time. The first step to quantify this influence is the determination of the measurement uncertainty for the considered sustained load testing task and the used equipment for a specific sustained load data set. Based on Monte Carlo simulations with the measured displacement and the combined standard uncertainty of the testing task as input parameters, a high number of displacement curves is generated. By analysing the scatter of the calculated displacement extrapolations for 50 years, the influence of the measurement uncertainty is quantified for the considered data set.
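The Monte Carlo scheme described above, perturbing measured displacements by their standard uncertainty, refitting the power law, and extrapolating to 50 years, can be sketched as follows (all data and the noise level are made up, and the paper's Findley fit includes an additive term omitted in this pure power-law simplification):

```python
import math, random, statistics

def fit_power_law(t, s):
    """Least-squares fit of s = a * t**b via log-log linear regression."""
    X = [math.log(ti) for ti in t]
    Y = [math.log(si) for si in s]
    xbar, ybar = statistics.fmean(X), statistics.fmean(Y)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
        sum((x - xbar) ** 2 for x in X)
    a = math.exp(ybar - b * xbar)
    return a, b

random.seed(42)
t_hours = [1, 10, 100, 1000]          # sustained-load test durations
s_meas = [0.10, 0.14, 0.20, 0.28]     # displacements in mm (made-up data)
u_disp = 0.005                        # standard uncertainty of each reading, mm
t_50y = 50 * 365.25 * 24              # 50 years in hours

# Monte Carlo: perturb each reading, refit, extrapolate
extrapolations = []
for _ in range(2000):
    s_pert = [s + random.gauss(0, u_disp) for s in s_meas]
    a, b = fit_power_law(t_hours, s_pert)
    extrapolations.append(a * t_50y ** b)

print(statistics.fmean(extrapolations), statistics.stdev(extrapolations))
```

The scatter of the extrapolated values is markedly wider than the raw reading uncertainty, illustrating the paper's point that the power-law extrapolation amplifies measurement uncertainty over time.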
- Research Article
- 10.1016/j.measurement.2025.117589
- Sep 1, 2025
- Measurement
- Sergey N Grigoriev + 4 more
Analysis of standard uncertainty using the Monte Carlo method for arc measurement on a coordinate measuring machine
- Research Article
- 10.1088/1361-6501/adf7ca
- Aug 26, 2025
- Measurement Science and Technology
- Yunyi Chen + 2 more
Effective metrology of collector grating structures is crucial for ensuring high spectral purity in extreme ultraviolet lithography (EUVL) light sources. In this paper, we propose a physics-informed unsupervised neural network for parametric analysis and reconstruction via scatterometry evaluation (PUNN-PARSE) of EUVL collector gratings. PUNN-PARSE combines scalar diffraction theory constraints with unsupervised neural networks to directly reconstruct key dimensional parameters from reflectance spectra, without the need for off-line model training or extensive dataset preparation. Under a signal-to-noise ratio of 400:1, the proposed method achieves remarkably low relative errors of 0.218% for the height and 0.620% for the top angle of deformed trapezoidal gratings, which is sufficient for the measurement requirements of EUV collector gratings. Moreover, the relative standard uncertainties are as low as 0.00376% and 0.0178% for the two parameters, respectively, indicating high data consistency and confidence in the reconstruction results. In addition, PUNN-PARSE is about 10 times faster than conventional methods for inline reconstruction. In summary, PUNN-PARSE not only improves the accuracy and speed of reconstruction but also maintains physical interpretability, making it promising for the inline measurement of gratings' dimensional parameters.
- Research Article
- 10.1101/2025.04.15.25325868
- Aug 23, 2025
- medRxiv
- Shailesh Alluri + 16 more
Objective: To explore new strategies to make the document selection process more transparent, reproducible, and effective for the active learning process. The ultimate goal is to leverage active learning in identifying keyphrases to facilitate ontology development and construction, to streamline the process, and to help with long-term maintenance.
Methods: The active learning pipeline used a BiLSTM-CRF model and over 2,900 abstracts retrieved from PubMed relevant to clinical decision support systems. We started the model training with synthetic labeled abstracts, then used different strategies to select domain experts' annotated abstracts (gold standards). Random sampling was used as the baseline. Recall and F1 (beta = 1, 5, and 10) scores are used as measures to compare the performance of the active learning pipeline across strategies.
Results: We tested four novel document-level uncertainty aggregation strategies (KPSum, KPAvg, DOCSum, and DOCAvg) that operate over standard token-level uncertainty scores such as Maximum Token Probability (MTP), Token Entropy (TE), and Margin. All strategies show significant improvement in early active learning cycles (θ0 to θ2) for recall and F1. The systematic evaluations show that KPSum (actual order) yields consistent improvement in both recall and F1 and outperforms random sampling. The document order (actual versus reverse) does not appear to play a critical role across strategies in model learning and performance in our datasets, although in some strategies the actual order shows slightly more effective results. The weighted F1 (beta = 5 and 10) scores provided results complementary to raw recall and F1 (beta = 1).
Conclusion: While prior work on uncertainty sampling typically focuses on token-level uncertainty metrics within generic NER tasks, our work advances this line of research by introducing a higher-level abstraction: document-level uncertainty aggregation.
With a human-in-the-loop Active Learning pipeline, it can effectively prioritize high-impact documents, improve early-cycle recall, and reduce annotation effort. Our results show promise in automating part of ontology construction and maintenance work, i.e., monitoring and screening new publications to identify candidate keyphrases. However, future work needs to improve the model performance to make it usable in real-world operations.
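Document-level aggregation over token-level uncertainty, as in the DOCSum/DOCAvg strategies described above, can be sketched with Token Entropy (TE) as the base score; the aggregation rules below are our reading of the strategy names, not the paper's exact definitions:

```python
import math

def token_entropy(probs):
    """Token-level uncertainty as Shannon entropy of the label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def doc_sum(token_prob_dists):
    """DOCSum-style aggregation: sum of token entropies over the document."""
    return sum(token_entropy(p) for p in token_prob_dists)

def doc_avg(token_prob_dists):
    """DOCAvg-style aggregation: average token entropy over the document."""
    return doc_sum(token_prob_dists) / len(token_prob_dists)

# Two toy "documents" of per-token label distributions; the second is more uncertain
doc1 = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]
doc2 = [[0.4, 0.3, 0.3], [0.5, 0.25, 0.25]]
print(doc_avg(doc1), doc_avg(doc2))  # doc2 scores higher, so it is selected first
```

The KPSum/KPAvg variants would restrict the same sums and averages to tokens inside candidate keyphrases rather than the whole document.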