Articles published on Percentage Of Agreement
- New
- Research Article
- 10.1136/bmjhci-2025-101780
- Mar 10, 2026
- BMJ health & care informatics
- Boyang Qu + 9 more
The objective was to evaluate the ability of large language models (LLMs) to simulate multidisciplinary team (MDT) decision-making in colorectal cancer, a malignancy that often requires complex treatment planning. We retrospectively analysed 1423 colorectal cancer cases discussed at MDT meetings at Peking University Cancer Hospital between January 2023 and December 2024. Three LLMs (OpenAI o3-mini-2025-01-31, DeepSeek-R1 671b and Qwen qwq-plus-2025-03-05) were tested for their ability to replicate MDT recommendations using a standardised treatment categorisation framework. Each case was processed three times per model; only cases with consistent outputs across all three runs were included. Concordance between AI-generated decisions and expert MDT consensus was assessed using agreement percentages and Cohen's kappa. O3 demonstrated the highest intramodel stability, with an agreement rate of 81.0% (Fleiss' kappa=0.794), yielding 1153 cases with consistent outputs. Concordance with MDT consensus was comparable across the three models, ranging from 62.5% to 65.4%. Multivariable analysis of O3 outputs identified treatment-naïve status, non-metastatic disease and colon tumour location as independent predictors of higher concordance with experts. LLMs showed fair overall agreement with expert MDT decisions, with stronger performance in standardised and less complex clinical scenarios. Areas of higher concordance included treatment-naïve non-metastatic colon cancer, treated non-metastatic rectal cancer and treated non-metastatic colon cancer. LLMs can partially replicate expert MDT recommendations in colorectal cancer. Their integration into clinical workflows should aim to complement, rather than replace, human expertise.
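The workflow this abstract describes (keep only cases whose repeated runs agree, then compare them with a reference using percentage agreement and Cohen's kappa) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the data layout and the toy labels are all hypothetical, and only the two formulas are standard.

```python
from collections import Counter


def consistent_cases(runs):
    """Keep cases whose label is identical across every repeated run.

    `runs` maps a case id to the labels from repeated runs
    (a hypothetical structure, assumed for illustration).
    """
    return {case: labels[0] for case, labels in runs.items()
            if labels and len(set(labels)) == 1}


def percent_agreement(a, b):
    """Share of positions where two label sequences match, in percent."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b) / 100.0
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the product of each rater's marginal frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return float("nan")  # both raters used a single identical label
    return (p_o - p_e) / (1.0 - p_e)


# Toy data (hypothetical): three runs per case, plus a reference decision.
runs = {
    "case1": ["surgery", "surgery", "surgery"],
    "case2": ["chemo", "chemo", "surgery"],  # inconsistent, so dropped
    "case3": ["chemo", "chemo", "chemo"],
    "case4": ["surgery", "surgery", "surgery"],
    "case5": ["chemo", "chemo", "chemo"],
}
reference = {"case1": "surgery", "case2": "chemo", "case3": "chemo",
             "case4": "chemo", "case5": "chemo"}

stable = consistent_cases(runs)
ids = sorted(stable)
model = [stable[i] for i in ids]
expert = [reference[i] for i in ids]
print(percent_agreement(model, expert), cohens_kappa(model, expert))
```

As an arithmetic check on the reported figures, 1153 consistent cases out of 1423 is 1153/1423 ≈ 81.0%, which is consistent with the stated intramodel agreement rate.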
- New
- Research Article
- 10.1016/j.injury.2026.113056
- Mar 1, 2026
- Injury
- Andrea Audisio + 8 more
Pilot validation study for a large image database of proximal femur fracture anteroposterior radiographs: Searching for the ground truth.
- New
- Research Article
- 10.1177/19322968261424270
- Feb 28, 2026
- Journal of diabetes science and technology
- Asta Risak Johansen + 4 more
Type 1 diabetes mellitus (T1D) requires precise carbohydrate estimation to manage blood glucose and prevent chronic and acute complications due to hyperglycemia or hypoglycemia. This study evaluates the accuracy of ChatGPT in estimating carbohydrate content in images of meals, compared with manual carbohydrate counting, which is considered the gold standard. Carbohydrate content of 60 fruits and vegetables (F&V) and 60 composite meals was manually counted as the reference standard. Images (n = 240), with and without a size reference, were uploaded to ChatGPT-4o with a standardized prompt in separate sessions. ChatGPT's estimates were then compared with the manual counts to assess accuracy. The performance of ChatGPT-4o compared with the manual calculation was assessed primarily using mean absolute error, percentage of agreement (PoA), and Bland-Altman analysis. ChatGPT-4o achieved a PoA of 93.3% for F&V estimates, increasing to 95% with a size reference, while composite meal estimates yielded a PoA of 46.7%, reducing to 43.3% with a size reference, based on a ±10 g carbohydrate limit. Bland-Altman analysis showed a slight tendency toward bias in ChatGPT-4o's estimates of both F&V and composite meals with a size reference. ChatGPT-4o's estimates for F&V and composite meals without a size reference exhibited a systematic bias, with both overestimation and underestimation of the carbohydrate content. This study suggests that adolescents living with T1D should employ ChatGPT-4o for carbohydrate estimation with caution. ChatGPT-4o showed inaccuracies in its application to composite meals, increasing the risk of inaccurate insulin administration and potentially causing postprandial hyperglycemia or hypoglycemia.
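A tolerance-based percentage of agreement and a Bland-Altman summary, as named in this abstract, could be computed roughly as below. This is a sketch under stated assumptions: the function names and the gram values are hypothetical, the ±10 g limit is treated as inclusive (the abstract does not say), and the limits of agreement use the conventional bias ± 1.96 SD.

```python
from statistics import mean, stdev


def poa_within_limit(estimates, reference, limit=10.0):
    """Percentage of estimates within +/- `limit` of the reference value."""
    if len(estimates) != len(reference) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(abs(e - r) <= limit for e, r in zip(estimates, reference))
    return 100.0 * hits / len(estimates)


def bland_altman(estimates, reference):
    """Return (bias, lower limit, upper limit) of agreement."""
    diffs = [e - r for e, r in zip(estimates, reference)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of differences
    return bias, bias - spread, bias + spread


# Hypothetical carbohydrate values in grams, for illustration only.
estimated_g = [30.0, 45.0, 62.0, 20.0]
reference_g = [25.0, 60.0, 60.0, 31.0]

poa = poa_within_limit(estimated_g, reference_g)
bias, lower, upper = bland_altman(estimated_g, reference_g)
print(poa, bias, lower, upper)
```

For scale, a PoA of 93.3% on 60 items is consistent with 56 estimates falling inside the limit, and 46.7% with 28.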
- New
- Research Article
- 10.1007/s10554-026-03617-9
- Feb 18, 2026
- The international journal of cardiovascular imaging
- Aleksandra Tuleja + 11 more
Standardized angiographic endpoints for evaluating treatment success after thrombectomy in acute limb ischemia (ALI) are lacking, limiting comparability across trials and clinical practice. We developed the Thrombectomy in Limb Ischemia (TILI) score, a novel angiographic classification system to systematically assess the technical efficacy of thrombectomy in ALI of embolic origin. The TILI Score was designed through consensus by a Swiss-wide research group of specialists in angiology, vascular surgery, and interventional radiology. It comprises two components: (1) lesion recanalisation (Grades 0-3), and (2) peripheral embolisation (Grades a-c, ±p), the latter assessed only if recanalisation is Grade 2 or 3. For the pilot validation of inter-reader reproducibility, 10 expert readers were asked to grade 10 representative post-thrombectomy angiograms after standardized training. Agreement was quantified using percentage agreement, Gwet's agreement coefficient (AC2; quadratic and ordinal weights), and intra-class correlation coefficients (ICC). Nine readers completed the assessment. For the full composite scale, agreement reached 93.6% (95% CI: 91.2-96.1), with substantial agreement reliability (Gwet's AC2 = 0.74 and 0.72) and an ICC of 0.756 (95% CI: 0.449-0.886). For the main recanalisation grade (0-3), agreement was higher: 95.1% (95% CI: 92.1-98.2), with almost perfect agreement reliability (AC2 of 0.875 and 0.862). The TILI Score is the first structured and reproducible tool to classify the technical success of thrombectomy in ALI in patients without preexisting occlusive disease. It demonstrated substantial to almost perfect interobserver agreement among experts and may serve as a standardized endpoint for future thrombectomy trials. Broader clinical validation is necessary to define outcome-relevant thresholds.
- New
- Research Article
- 10.1158/1557-3265.sabcs25-ps4-11-12
- Feb 17, 2026
- Clinical Cancer Research
- A Longobardi + 8 more
Abstract Background: Balancing the clinical benefit and tolerability of adjuvant systemic therapies in patients with early breast cancer (eBC) remains challenging. Frameworks such as the American Society of Clinical Oncology Value Framework (ASCO VF) v2.0 and the European Society for Medical Oncology Magnitude of Clinical Benefit Scale (ESMO-MCBS v2.0) have been developed to help clinicians interpret trial results by weighing potential clinical benefit and toxicity. The European Society for Medical Oncology has recently revised ESMO-MCBS to version 2.0, which refines benefit thresholds and introduces explicit annotations for acute (AT) and persistent (PT) toxicity. This analysis applied these frameworks to evaluate the benefit-risk profiles of therapeutic strategies for high-risk eBC. Methods: Twelve phase II-III trials (KATHERINE, APHINITY, ExteNET, PEONY, NOAH, NeoSphere, KEYNOTE-522, OlympiA, CREATE-X, monarchE, NATALEE, and GIM2) were evaluated using ESMO-MCBS v2.0 and the ASCO Value Framework. Data were extracted from the latest available publications. Key endpoints included invasive disease-free survival (iDFS), overall survival (OS), pathological complete response (pCR), treatment-related adverse events (TRAEs), and quality of life (QoL). According to ESMO-MCBS v2.0, standardized AT annotations for ≥30% G≥3 AEs, or ≥10% premature treatment discontinuation or hospitalization, and PT annotations for ≥20% persistent grade ≥3 toxicity, were retrieved. Concordance between ESMO-MCBS and ASCO net health benefit (NHB) scores was assessed using Cohen’s Kappa coefficient. Results: The trials KATHERINE, NOAH, PEONY, KEYNOTE-522, CREATE-X, OlympiA, monarchE, and NATALEE achieved the highest ESMO-MCBS scores (Grade A) owing to improvements in iDFS and quality of life. In contrast, NeoSphere, APHINITY, and ExteNET were rated Grade C due to more modest absolute benefit.
Notably, KATHERINE, KEYNOTE-522, OlympiA, CREATE-X, ExteNET, NATALEE, NOAH, and PEONY were annotated for acute toxicity (AT) under ESMO-MCBS v2.0, reflecting substantial rates of severe adverse events or treatment discontinuation. Among ASCO net health benefit (NHB) scores, NOAH (74), OlympiA (72), and CREATE-X (52) recorded the highest values, whereas NATALEE (12), ExteNET (5), and APHINITY (1) had the lowest scores. The APHINITY trial also showed the lowest toxicity score (-20), primarily due to the incidence of grade 3-4 diarrhoea (9.8% with pertuzumab vs. 3.7% with placebo). Overall, concordance between ESMO-MCBS v2.0 and ASCO VF was limited (Cohen’s Kappa -0.143; percentage agreement 33.3%). Conclusions: The revised ESMO-MCBS v2.0 provides a more rigorous assessment of clinical benefit by refining thresholds for absolute gains and systematically evaluating acute and persistent toxicities. NOAH and OlympiA achieved the most favourable benefit-risk balance across both tools. APHINITY and NATALEE recorded the lowest ASCO VF NHB scores, whereas APHINITY, NeoSphere and ExteNET were graded C under the ESMO-MCBS. These results emphasise differences in methodology and support the combined application of both frameworks to guide evidence-based choices in early breast cancer care. Citation Format: A. Longobardi, V. Molinaro, V. Cantile, R. Buonaiuto, A. Caltavituro, G. Crimaldi, M. Giuliano, G. Arpino, C. De Angelis. Impact of the ESMO-MCBS v2.0 on Benefit-Risk Assessment of Emerging Therapies in Early Breast Cancer: A Comparison with the ASCO Value Framework [abstract]. In: Proceedings of the San Antonio Breast Cancer Symposium 2025; 2025 Dec 9-12; San Antonio, TX. Philadelphia (PA): AACR; Clin Cancer Res 2026;32(4 Suppl):Abstract nr PS4-11-12.
- New
- Research Article
- 10.1158/1557-3265.sabcs25-ps4-01-09
- Feb 17, 2026
- Clinical Cancer Research
- M Bakre + 6 more
Abstract Objective: ∼70% of HR+/HER2- early breast cancers have a low risk of recurrence. Hence, prognostication in these patients to assess chemotherapy benefit is of great value. Online tools such as the Nottingham prognostic index (NPI) and PREDICT are often used as they are quick and free. However, use of genomic prognostic tests such as Oncotype DX (ODX) and Mammaprint (MP) is increasing. CanAssist Breast (CAB) is a proteomics-based prognostic test that uses an AI-based algorithm to segregate EBC patients as ‘low or high’ risk for recurrence. CAB is validated in global studies and has been used clinically on ∼10,000 patients to date in the Indian subcontinent, UAE, Turkey, Iran, and Saudi Arabia. Comparative analysis of prognostic tests is important to evaluate the relative performance of different prognostic tests in a population and assess how a new test performs in relation to established, validated ones. We have compared CAB with NPI, PREDICT, ODX, and MP, and here we showcase the results of those comparative studies. Methods: A patient cohort of 1474 from Europe, India and the US was used to compare CAB with NPI and PREDICT. NPI risk groups were categorized into three prognostic groups: good (GPG-NPI index ≤ 3.4), moderate (MPG 3.41-5.4), and poor (PPG > 5.4). Patients with chemotherapy benefit of < 2% were classified as low risk and ≥ 2% as high risk by PREDICT. CAB uses a cut-off of 15.5 to stratify patients into low risk (≤15.5) and high risk (>15.5) categories. Agreement between CAB and NPI/PREDICT risk groups was assessed by the kappa coefficient. Retrospective comparison of risk stratification by CAB with ODX and MP was done with 109 (US and India) and 43 (EU) patients, and prospectively with a total of 116 Turkish patients, 58 patients in each group. Accuracy/negative predictive value was calculated using MedCalc. Concordance of CAB with ODX or MP was calculated using the overall percentage agreement.
Results: Risk proportions generated by all tests were: CAB low:high 74:26; NPI good:moderate:poor prognostic group 38:55:7; PREDICT low:high 63:37. 65% of NPI-MPG patients were called low risk by CAB. Among PREDICT high risk patients, CAB segregated 51% as low risk, thus preventing over-treatment in these patients. Overall, there was fair agreement between CAB and NPI [κ=0.31 (0.278-0.346)] / PREDICT [κ=0.398 (0.35-0.446)], with a concordance of 97% / 88% between CAB and NPI/PREDICT low risk categories. In cohorts with mostly T1N0 patients, NPI and PREDICT segregated more as low risk compared to CAB, suggesting that T1N0 patients with aggressive biology are missed by online tools but not by CAB. Comparison of CAB with ODX retrospectively (n=109) and prospectively (n=58) showed similar low:high risk proportions of 83:17 and 79:21 for CAB; 90:10 and 83:17 for ODX. An overall concordance of 75% and 65%, and low risk concordance of 82% and 77%, were observed between the two tests in the retrospective and prospective studies. Retrospective (n=43) and prospective (n=58) comparison of CAB with MP showed similar low:high risk proportions of 65:35 and 66:34 for CAB; 56:46 and 52:48 for MP. Overall concordance between the two tests was 75% in the retrospective study and 62% in the prospective study, while low risk concordance was 83% and 77%, respectively. Conclusion: CAB provided unbiased risk stratification across cohorts of various geographies with minimal impact from clinical parameters, whether compared with online tools or tumor-based prognostic tests. CAB is useful for all EBC patients and specifically in NPI-MPG and PREDICT high risk patients for making accurate decisions on chemotherapy use. CAB shows good concordance with ODX and MP in low risk categories. This, coupled with high accuracy comparable with ODX, shows that CAB is an excellent, cost-effective, and quick alternative.
Citation Format: M. Bakre, T. Durgekar, S. BA, P. Shrivastava, G. Basaran, G. Ozge, T. Korkmaz. How does proteomics-based prognostic testing compare to genomic tests and online clinical risk prediction tools in early breast cancer patients? [abstract]. In: Proceedings of the San Antonio Breast Cancer Symposium 2025; 2025 Dec 9-12; San Antonio, TX. Philadelphia (PA): AACR; Clin Cancer Res 2026;32(4 Suppl):Abstract nr PS4-01-09.
- New
- Research Article
- 10.1158/1557-3265.sabcs25-ps4-05-06
- Feb 17, 2026
- Clinical Cancer Research
- H Rugo + 8 more
Abstract Introduction: Genomic alterations in PIK3CA, AKT1 and PTEN are biomarkers for the AKT inhibitor Capivasertib (Capi), an approved treatment for metastatic BC (mBC) patients who progressed on ≥1 endocrine therapy or recurred during/within 12 months of adjuvant therapy. Several NGS assays include these genes in their panels; however, it is often unclear which types of alterations are evaluated, particularly PTENloss, and how the results compare across platforms. This study aimed to retrospectively compare two commercially available tissue NGS tests for the detection of short variant mutations (mut) in PIK3CA, AKT1, and PTEN, as well as homozygous PTEN copy loss (PTENloss) and PTEN rearrangements (re) in patients with BC: FoundationOne/FoundationOne®CDx (F1CDx®) and Caris NGS tests. Methods: This study included patients with BC who underwent F1CDx tissue comprehensive genomic profiling and had Caris tissue NGS testing. Data was obtained from the U.S.-wide de-identified Flatiron Health and Foundation Medicine real-world clinicogenomic breast database (CGDB), from ∼280 U.S. cancer clinics (∼800 sites of care) between 01/2011 and 12/2024. A comparison between F1CDx on-label Capi alterations and Caris NGS, abstracted from electronic health records (EHR)1, was conducted for patients with both tests performed on tumor tissue specimens collected on the same day. Caris NGS specific mutations are not available, so we could not assess if they were on-label for Capi. Positive percentage agreement (PPA) was calculated using F1CDx as a reference. Caris IHC PTEN protein loss results abstracted from EHR were also assessed when available. Results: A total of 74, 85 and 80 patients had PIK3CA, AKT1, and PTEN NGS results available from both assays, respectively, and 68 patients also had a Caris IHC PTEN result available. 
For PIK3CAmut, we observed an agreement of 98.6% (73/74), with 53 (71.6%) negative and 20 (27%) positive for both tests, and 1 patient F1CDx-/Caris+ (100% PPA). For AKT1mut, the agreement was 100% with all cases negative for both tests. For PTEN, a 95% (76/80) agreement was observed, with 75 (93.7%) being negative for both tests, 1 (1.25%) with both positive, and 4 (5%) F1CDx+/Caris- (20% PPA). No PTENloss or PTENre was reported by Caris NGS, while 3 patients (3.7%) were positive for PTENloss and 1 patient (1.25%) for PTENre by FMI (0% PPA). Of 8 patients with a PTEN alteration not reported by Caris NGS test and detected on F1CDx, 6 had an IHC PTEN protein loss reported by Caris. Additionally, 4 patients were identified to have PTEN protein loss by Caris’s IHC PTEN results, but no PIK3CA/AKT1/PTEN alterations were identified by either NGS test and two functional copies of PTEN were detected on F1CDx. Conclusions: 29% (8/28) of BC patients with on-label Capi alterations detected with F1CDx were not reported by Caris NGS tissue testing. 89% (8/9) of PTEN alterations were not reported by Caris NGS tissue testing, including all PTENloss and PTENre. 75% (6/8) of Caris IHC PTEN results reported PTEN protein loss (not a current Capi indication). Further investigation of variations in reporting between NGS assays in both tissue and plasma, and of their ability to identify on-label alterations, is needed to understand true differences between testing results. Footnote 1: All data analyzed in this study related to the Caris tissue test is based on reported data only. Citation Format: H. Rugo, A. Schrock, J. Lee, R. Graf, M. Gearing, A. Heilmann, A. Gasco, N. Vasan, J. Quintanilha. Comparative Analysis of PIK3CA, AKT1, and PTEN Reporting Across Commercial NGS Tests in Breast Cancer (BC) [abstract]. In: Proceedings of the San Antonio Breast Cancer Symposium 2025; 2025 Dec 9-12; San Antonio, TX. Philadelphia (PA): AACR; Clin Cancer Res 2026;32(4 Suppl):Abstract nr PS4-05-06.
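The overall agreement and positive percentage agreement (PPA) figures in this abstract follow from a 2x2 table with F1CDx as the reference. The sketch below reproduces that arithmetic from the counts stated in the text; the function and argument names are hypothetical, and the cell for "reference negative, comparator positive" is inferred as zero for PTEN because the three stated cells already sum to 80.

```python
import math


def paired_assay_agreement(both_pos, ref_only, comp_only, both_neg):
    """Overall agreement and PPA (in percent) for a comparator vs a reference.

    ref_only  = reference positive, comparator negative
    comp_only = reference negative, comparator positive
    """
    total = both_pos + ref_only + comp_only + both_neg
    if total == 0:
        raise ValueError("empty table")
    overall = 100.0 * (both_pos + both_neg) / total
    ref_pos = both_pos + ref_only
    # PPA is undefined when the reference has no positive calls.
    ppa = 100.0 * both_pos / ref_pos if ref_pos else float("nan")
    return overall, ppa


# Counts taken from the abstract (F1CDx as reference, Caris as comparator).
pik3ca = paired_assay_agreement(both_pos=20, ref_only=0, comp_only=1, both_neg=53)
akt1 = paired_assay_agreement(both_pos=0, ref_only=0, comp_only=0, both_neg=85)
pten = paired_assay_agreement(both_pos=1, ref_only=4, comp_only=0, both_neg=75)

for name, (overall, ppa) in [("PIK3CA", pik3ca), ("AKT1", akt1), ("PTEN", pten)]:
    print(f"{name}: overall {overall:.1f}%, PPA {ppa:.1f}%")
```

The PTEN row shows why the two metrics need to be read together: overall agreement is 95% largely because most samples are negative on both assays, while only 1 of the 5 reference-positive cases is matched. For AKT1 there are no reference-positive cases, so PPA cannot be computed at all.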
- New
- Research Article
- 10.1177/19417381251398562
- Feb 17, 2026
- Sports health
- Mostafa Ziaei + 3 more
Chronic ankle instability (CAI) can induce contralateral limb deficits, influencing interlimb asymmetry during athletic tasks. Understanding the magnitude, direction, and individual thresholds of these asymmetries is critical for effective rehabilitation and performance monitoring. CAI-induced contralateral limb deficits significantly influence the magnitude and direction of interlimb asymmetry in jumping and change-of-direction-speed (CODS) tasks. Cross-sectional study. Level 3. Male elite soccer players with (n = 32) and without (n = 38) CAI performed single-leg hop (SLH), single-leg triple hop, modified-505 (Mod505), and 90°-changes-of-direction tests. Paired-sample t tests revealed small-to-moderate differences between dominant and nondominant limbs in both groups (P < 0.05), moderate-to-large differences between injured and contralateral uninjured limbs (P < 0.05), large differences between injured and matched limbs of healthy players (P < 0.05), and small nonsignificant differences between contralateral uninjured and matched limbs of healthy players (P > 0.05). Independent-sample t tests revealed that asymmetries were significantly higher in all tests (P < 0.05) except for SLH (P > 0.05) in players with CAI. The kappa coefficient showed substantial-to-perfect agreement for players with CAI (κ = 0.71-1.00), and moderate-to-substantial agreement for healthy players (κ = 0.51-0.73), indicating that asymmetries favored the same limb. Agreement percentages for similar identifications of asymmetry patterns based on individual thresholds derived from intralimb variability revealed that injured players adopted similar patterns in CODS (81.25%), while healthy players adopted similar patterns between SLH and Mod505 (76.32%). CAI-induced contralateral limb deficits influenced the magnitude and direction of asymmetry, potentially underestimating asymmetry.
Asymmetry consistently favors the same limb due to injury and functional similarities; thresholds derived from intralimb variability identify real asymmetry. These findings highlight the importance of considering contralateral limb deficits when interpreting interlimb asymmetries in players with CAI. Rehabilitation programs should address these deficits to optimize performance and reduce injury risk.
- New
- Research Article
- 10.3174/ajnr.a9232
- Feb 14, 2026
- AJNR. American journal of neuroradiology
- Pranjal Rai + 10 more
To evaluate the feasibility and technical performance of integrating a Delay Alternating with Nutation for Tailored Excitation (DANTE) preparation into a deep learning-accelerated, post-contrast T1-SPACE sequence for intracranial vessel wall imaging (IC-VWI). In this retrospective, single-center study, 35 patients (22 women; mean age, 57.9 ± 17.1 years) underwent IC-VWI using post-contrast DL-T1-SPACE with (T1-SPACEDL+DANTE) and without (T1-SPACEDL) a DANTE preparation. Two neuroradiologists independently scored lumen and wall visualization across the arterial segments on a 4-point Likert scale (1: worst to 4: best) and graded venous flow artifacts along the middle cerebral artery (MCA), peri-mesencephalic veins (PMV), deep cerebral veins (DCV), and cortical veins (CV). Intersequence comparisons used cumulative-link mixed-effects models (CLMMs); segments were additionally pooled and analyzed as proximal versus distal. Venous flow artifact scores were compared with paired Wilcoxon tests between sequences and percentage agreement between readers. Exploratory Bland-Altman analysis was also performed for both readers. A total of 556 vessel-segment pairs were analyzed. In CLMM analysis, T1-SPACEDL+DANTE improved lumen scores versus T1-SPACEDL (pooled OR 40.02; 95% CI 24.06-66.57; FDR p<0.001) but reduced wall scores (pooled OR 0.11; 95% CI 0.08-0.14; FDR p<0.001). By anatomic group, lumen ORs were 26.03 (proximal) and 91.93 (distal), and wall ORs were 0.12 (proximal) and 0.04 (distal) (all FDR p<0.001). Venous flow artifacts improved across all analyzed subsites (p<0.001). ±1-point inter-reader concordance was near perfect across analyses. Bland-Altman plots showed negative lumen bias (favoring T1-SPACEDL+DANTE) and positive wall bias (favoring T1-SPACEDL) without consistent proportional bias. 
Adding DANTE preparation to deep-learning accelerated IC-VWI was associated with fewer flow-related artifacts and a clearer depiction of the vessel lumen, which may support a more accurate assessment of intracranial vasculopathies and aneurysms. Potential gains were accompanied by a modest wall-visualization penalty, which is not unexpected with a flow-suppression pulse.
- New
- Research Article
- 10.1093/jalm/jfag005
- Feb 13, 2026
- The journal of applied laboratory medicine
- Rebecca A Lillis + 3 more
The clinical performance of the VITROS® Immunodiagnostic Products Syphilis Assay was evaluated by comparison with composite results obtained with widely used lipoidal antigen (nontreponemal) and T. pallidum (treponemal) tests. Serum samples were tested from patients presenting for syphilis screening, and in relevant subpopulations including pregnant women, people living with HIV, known serologically positive for syphilis, and medically diagnosed with syphilis. Samples originated from 1710 and 113 patients in the United States and South America, respectively. Results were also compared for VITROS vs the Roche Elecsys® Syphilis immunoassay alone. Positive percentage agreement was ≥98.81% for VITROS and the comparator composite results within and across populations, with a 95% Wilson Score confidence interval of 98.01%-99.94% across the entire intended use population. Negative agreement was ≥90.63%, with 95% Wilson Score confidence interval of 96.93%-98.67% for the entire population. Method comparison between VITROS and Elecsys assays found 99.09% total agreement and a Cohen's kappa coefficient of 0.97. In separate analyses, nonreactivity was observed for VITROS in 197 of 201 (98%) apparently healthy individuals, and positive reactivity was observed in 151/151 (100%) serum samples preselected from patients with medically diagnosed syphilis, indicating high clinical sensitivity of the VITROS Syphilis assay. In addition, specimens preselected as serologically positive showed 100% reactivity with VITROS. These findings support strong clinical performance of the VITROS Syphilis assay for aiding in diagnosis of syphilis, and excellent concordance with the well-established Elecsys Syphilis test.
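The Wilson score interval named in this abstract can be computed from a count and a sample size. The sketch below is illustrative only: the function name is hypothetical, and because the abstract does not give the counts behind its published intervals, it is applied here to the two proportions whose counts are stated (197 of 201 and 151 of 151).

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


# Counts stated in the abstract.
healthy_lo, healthy_hi = wilson_interval(197, 201)    # nonreactive, healthy
diagnosed_lo, diagnosed_hi = wilson_interval(151, 151)  # reactive, diagnosed
print(healthy_lo, healthy_hi, diagnosed_lo, diagnosed_hi)
```

Unlike the simple normal-approximation interval, the Wilson interval stays informative when the observed proportion is 100%: the lower bound for 151/151 is below 1 rather than collapsing to a zero-width interval.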
- Research Article
- 10.1080/08164622.2026.2616029
- Feb 11, 2026
- Clinical and Experimental Optometry
- Vijay Kumar Yelagondula + 5 more
ABSTRACT Clinical relevance: Retinoscopy is an important ophthalmic technique for objectively assessing refractive errors and detecting ocular abnormalities, especially in patients unable to undergo subjective refraction. The development and standardisation of retinoscopy rubrics foster a more equitable and consistent learning experience for trainees, ultimately leading to improved patient care. Background: Retinoscopy is an essential skill in ophthalmic examination. Despite the use of various rubrics in eye care, a standardised rubric for retinoscopy is still unavailable. This study aims to develop and validate a comprehensive rubric for retinoscopy, addressing the lack of standardised assessment tools in optometric and ophthalmic education. Methods: The authors developed a retinoscopy rubric, which was assessed for face and content validity by an international group of optometry educators. Cohen’s alpha was calculated to determine interobserver reliability. Results: The final rubric consists of 19 individual items for performance criteria. The revised rubric demonstrated face validity, with an overall percentage of agreement of 95.71%. The content validity index (S-CVI/Ave) was 0.99. Most of the individual rubric items showed moderate to strong interobserver reliability. Conclusion: The retinoscopy rubric, a validated evaluation tool for objective refraction, demonstrates strong validity and reliability. It holds significant potential to enhance ophthalmic education by providing a consistent and effective framework for assessing retinoscopy skills.
- Research Article
- 10.1007/s00428-026-04398-1
- Feb 11, 2026
- Virchows Archiv : an international journal of pathology
- Sunil S Badve + 8 more
Accurate human epidermal growth factor receptor 2 (HER2) immunohistochemistry (IHC) scoring is crucial to identify patients for HER2-directed therapy; however, validated HER2 scoring guidelines are lacking for non-breast/gastric solid tumors. We investigated concordance in IHC scores from independent pathologists using three different scoring algorithms in non-gastric/breast solid tumors. Whole-slide scans of HER2-stained tumor samples from DESTINY-PanTumor02 (NCT04482309) and a commercial pan-tumor sample set were scored by three independent, board-certified pathologists using American Society of Clinical Oncology (ASCO)/College of American Pathologists (CAP) scoring algorithms for gastric (reference) and breast cancers, and a clinical trial-based algorithm for endometrial cancers (evaluated for endometrial tumors only). The pathologists evaluated 488 samples from multiple solid tumor types. Mean positive percentage agreement (PPA; across pathologists) between the breast and gastric algorithms was higher for samples scored as IHC 3+ or IHC 0 compared with IHC 2+ or IHC 1+. Inter-pathologist PPA for each algorithm was greatest in samples scored as IHC 3+ and IHC 0. The majority of inter-pathologist pairwise comparisons had Cohen's κ coefficient values > 0.4 when using the gastric or breast algorithm to determine IHC scores, indicating at least moderate agreement between pathologists; Cohen's κ coefficient values were generally lower (range 0.17-0.43) for the endometrial algorithm. ASCO/CAP scoring algorithms for gastric and breast cancer were comparable for identifying HER2 IHC 3+ tumors; lower concordance was observed for IHC 2+/1+ tumors. These findings highlight a real-world issue of inter-pathologist variability and emphasize a need for greater awareness of best scoring practices.
- Research Article
- 10.1371/journal.pone.0342471
- Feb 6, 2026
- PloS one
- Hemant Mahajan + 9 more
Cardiovascular diseases (CVDs) represent a growing public-health challenge in India, where nearly one in four deaths is CVD-related. Accurate risk stratification underpins targeted prevention, yet laboratory-dependent tools are often impractical in resource-limited settings. The World Health Organization (WHO) and GLOBORISK initiatives both offer non-laboratory-based 10-year CVD risk algorithms alongside their laboratory-based counterparts. We aimed to compare laboratory- and non-laboratory-based WHO and GLOBORISK CVD risk scores, assess their concordance, and examine relationships with sub-clinical atherosclerosis in a rural Indian cohort. We conducted a cross-sectional analysis of 2,465 adults (1,184 men, 1,281 women) aged 40-74 years from the third wave (2010-12) of the Andhra Pradesh Children and Parents Study (APCAPS). Participants with prior CVD were excluded. Ten-year CVD risk was calculated using sex-specific WHO (South Asia) and India-calibrated GLOBORISK models, both laboratory-based (age, sex, smoking, systolic blood pressure, diabetes, total cholesterol) and non-laboratory-based (age, sex, smoking, systolic blood pressure, BMI) algorithms. Categorical agreement was quantified via percentage agreement and quadratic weighted kappa (κ); continuous agreement by Bland-Altman analysis. We also evaluated linear associations between each risk score (categorical and continuous) and three sub-clinical atherosclerosis markers: carotid intima-media thickness (CIMT), pulse-wave velocity (PWV), and augmentation index (AIx), through sex-stratified multi-level linear regression with random intercept at the household level, adjusting for multiple testing (p < 0.01). Median WHO-CVD-risk was 6.0% (IQR 4% - 9%) in men and 3.0% (2% - 4%) in women for both lab and non-lab models; median GLOBORISK-CVD-risk was 12.0% (9% - 16%) for lab-model vs. 15.0% (10% - 16%) for non-lab-model in men and 5.0% (3% - 9%) for lab-model vs. 5.0% (3% - 9%) for non-lab-model in women. 
Categorical agreement was substantial to almost perfect: WHO κ = 0.82 (overall), GLOBORISK κ = 0.72. Bland-Altman analyses demonstrated mean differences <1% between lab- and non-lab-based scores, though non-lab models underestimated risk by 4.2% in diabetics and 1.2% in participants with total cholesterol ≥200 mg/dL. Both risk scores showed positive, dose-response relationships with CIMT, PWV, and AIx (p-trend<0.001), with each SD increase in CVD-scores associated with clinically meaningful increases in all three markers of sub-clinical atherosclerosis. Non-laboratory-based WHO and GLOBORISK CVD risk scores exhibit high overall agreement with laboratory-based models and correlate strongly with subclinical atherosclerosis in rural India. However, modest underestimation in high-risk subgroups (diabetics, hypercholesterolemia) warrants cautious interpretation. These findings support the feasibility of non-lab risk assessment in resource-constrained settings, while underscoring the need for prospective validation against hard cardiovascular outcomes prior to large-scale implementation.
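Quadratic weighted kappa, the categorical agreement statistic used in this abstract, penalises disagreements by the squared distance between ordered categories. The sketch below is a plain-Python illustration under stated assumptions: the function name is hypothetical, categories are assumed to be coded as integers 0..k-1, and the six paired ratings are invented for the example rather than taken from the study.

```python
def quadratic_weighted_kappa(a, b, n_categories):
    """Quadratic weighted kappa for ordinal ratings coded 0..n_categories-1."""
    k = n_categories
    if len(a) != len(b) or not a or k < 2:
        raise ValueError("need equal-length, non-empty ratings and k >= 2")
    n = len(a)
    # Cross-tabulate the two sets of ratings.
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    row = [sum(r) for r in observed]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_obs = 0.0
    weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal, 1 at extremes
            weighted_obs += w * observed[i][j]
            weighted_exp += w * row[i] * col[j] / n
    if weighted_exp == 0.0:
        return float("nan")  # all ratings fall in a single category
    return 1.0 - weighted_obs / weighted_exp


# Hypothetical risk categories (0 = low, 1 = moderate, 2 = high).
lab_based = [0, 0, 1, 1, 2, 2]
non_lab = [0, 1, 1, 1, 2, 1]
print(quadratic_weighted_kappa(lab_based, non_lab, 3))
```

In this toy example both disagreements are between adjacent categories, so the weighted statistic (2/3) treats them more leniently than an unweighted kappa would.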
- Research Article
- 10.47191/etj/v11i02.03
- Feb 6, 2026
- Engineering and Technology Journal
- Moslema Jahan
This study looked into how learning mathematics in a sporting setting affected Year 10 students' views toward the subject. Data was gathered using a closed-ended, self-reported questionnaire with Likert-type statements. Each statement was analysed by comparing the percentage of pupils' agreement or disagreement before and after instruction. In this study, students' attitudes comprised their engagement, confidence, and mathematical awareness. According to this study, students' confidence, awareness of the value of arithmetic, and engagement all rise when they learn in a sporting setting. The implications for educators, teachers, and researchers are also considered in this work.
- Research Article
- 10.1177/08445621251414530
- Jan 29, 2026
- The Canadian journal of nursing research = Revue canadienne de recherche en sciences infirmieres
- Marlo Salum + 2 more
Background and Purpose: Modified Delphi methods are increasingly used to develop healthcare pathways with input from people with lived experience (PWLE) and clinicians/others. However, guidance on consensus analysis in this context remains limited. We examined consensus outcomes across different scoring methods and criteria when participants were treated as a single combined group (Objective 1) versus two distinct groups (Objective 2). Methods: We conducted a secondary analysis of Round 1 data from a project involving PWLE (N = 8) and clinicians/others (N = 51). To assess agreement on 68 Delphi statements, we applied three methods for scoring percentage agreement that differed in how the middle response on a three-point Likert scale ("approve", "not sure either way", "do not approve") was treated. Method 1 excluded the middle response; methods 2 and 3 grouped the middle response with "do not approve" and "approve", respectively. We compared consensus rates (% of items reaching consensus) using percentage agreement cutoffs of ≥70%, ≥80%, and ≥90% of participants. Results: Consensus results varied by participant grouping, treatment of the middle response category, and cutoff criteria. Results from the combined group of PWLE and clinicians/others provided a simplified overview of consensus outcomes. Treating the participants as separate groups provided more nuanced results. Conclusion: The way data are analysed can change the results from which conclusions are drawn and practice is informed. Investigators should consider the alignment of each approach with the goals of their Delphi study.
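The three scoring methods described above can be sketched in Python as follows. This is a minimal illustration that assumes "percentage agreement" means the share of responses counted as approving; the function names and the example responses are hypothetical and do not come from the study.

```python
APPROVE = "approve"
MIDDLE = "not sure either way"
REJECT = "do not approve"


def percent_agreement(responses, method):
    """Percentage agreement for one Delphi statement under scoring method 1, 2 or 3."""
    approve = responses.count(APPROVE)
    middle = responses.count(MIDDLE)
    reject = responses.count(REJECT)
    if method == 1:
        # Method 1: middle responses are excluded from the denominator.
        numer, denom = approve, approve + reject
    elif method == 2:
        # Method 2: middle responses are grouped with "do not approve".
        numer, denom = approve, approve + middle + reject
    elif method == 3:
        # Method 3: middle responses are grouped with "approve".
        numer, denom = approve + middle, approve + middle + reject
    else:
        raise ValueError("method must be 1, 2 or 3")
    return 100.0 * numer / denom if denom else float("nan")


def reaches_consensus(responses, method, cutoff):
    """True if percentage agreement meets the cutoff (for example 70, 80 or 90)."""
    return percent_agreement(responses, method) >= cutoff


if __name__ == "__main__":
    # Hypothetical statement: 7 approve, 2 not sure, 1 do not approve.
    item = [APPROVE] * 7 + [MIDDLE] * 2 + [REJECT]
    for method in (1, 2, 3):
        pct = percent_agreement(item, method)
        flags = [reaches_consensus(item, method, c) for c in (70, 80, 90)]
        print(method, pct, flags)
```

For this hypothetical item the same ten responses give 87.5%, 70.0% and 90.0% under methods 1, 2 and 3, so whether the item reaches consensus depends on both the scoring method and the cutoff.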
- Research Article
- 10.1111/ocr.70088
- Jan 28, 2026
- Orthodontics & craniofacial research
- Victor França Didier + 8 more
To evaluate the reliability of occlusal contact registration by intraoral scanning in comparison with registration using carbon paper. Occlusal registration was obtained at the beginning of orthodontic treatment in 35 patients (23 men and 12 women) aged 15 to 30 years. All patients were scanned with iTero Element (Align Technology, CA, USA); occlusal records were also made with carbon paper (AccuFilm - 8 μm, USA) and recorded in occlusograms. To verify the agreement between the two methods, percentages of agreement and disagreement and kappa statistics were applied. There was poor agreement between occlusal contacts recorded by intraoral scanning and those obtained with carbon paper in most of the sample (kappa values 0.07 to 0.20). Most contacts were registered in the posterior region. Contacts registered as intense by the iTero appeared to correspond to the contacts marked with carbon paper. The occlusal records obtained by scanning and by carbon paper showed poor agreement, but combining both methods is indicated for correct registration of the occlusion.
- Research Article
- 10.3390/ijerph23020141
- Jan 23, 2026
- International journal of environmental research and public health
- Nancy E Oriol + 10 more
This report describes the development and deployment of the Public Health Quality Tool (PHQTool), an online resource designed to help mobile health clinics (MHCs) assess and improve the quality of their public health services. MHCs provide essential clinical and public health services to underserved populations but have historically lacked tools to assess and improve the quality of their work. To address this gap, the PHQTool was developed as an online, evidence-based, self-assessment resource for MHCs, hosted on the Mobile Health Map (MHMap) platform. This report documents the collaborative development process of the PHQTool and presents preliminary evaluation findings related to usability and relevance among mobile health clinics. Drawing from national public health frameworks and Honore et al.'s established public health quality aims, the PHQTool focuses on six aims most relevant to mobile care: Equitable, Health Promoting, Proactive, Transparent, Effective, and Efficient. Selection of the six quality aims was guided by explicit criteria developed through pilot testing and stakeholder feedback. The six aims were those that could be directly implemented through mobile clinic practices and were feasible to assess within diverse mobile clinic contexts. The remaining three aims ("population-centered," "risk-reducing," and "vigilant") were determined to be less directly actionable at the program level or required system-wide or data infrastructure beyond the scope of individual mobile clinics. Development included expert consultation, pilot testing, and iterative refinement informed by user feedback. The tool allows clinics to evaluate practices, identify improvement goals, and track progress over time. Since implementation, 82 MHCs representing diverse organizational types have used the PHQTool, reporting high usability and identifying common improvement areas such as outreach, efficiency, and equity-driven service delivery. 
Across pilot and post-pilot implementation phases, a majority of respondents agreed or strongly agreed that the tool was user-friendly, relevant to their work, and appropriately scoped for mobile clinic practice. Usability and acceptance were assessed using descriptive statistics, including percentage agreement across Likert-scale items as well as qualitative feedback collected during structured debriefs. Reported findings reflect self-reported perceptions of feasibility, clarity, and relevance rather than inferential statistical comparisons. The PHQTool facilitates systematic quality assessment within the mobile clinic sector and supports consistent documentation of public health efforts. By providing a standardized, accessible framework for evaluation, it contributes to broader efforts to strengthen evidence-based quality improvement and promote accountability in MHCs.
- Research Article
- 10.3390/jcm15020913
- Jan 22, 2026
- Journal of clinical medicine
- Patrycja Szczepańska-Ciszewska + 6 more
Background/Objectives: Cellulite is a common aesthetic condition in women, traditionally assessed using visual inspection and palpation-based scales that are inherently subjective. Therefore, image-based methods that may support standardized severity grading are of growing interest. To evaluate infrared thermography as an imaging-based method for grading cellulite severity and to perform methodological validation of a newly developed thermographic classification scale by comparing it with clinical palpation and anthropometric parameters. Methods: This retrospective, non-interventional study analyzed anonymized clinical and thermographic data from 81 women with clinically assessed cellulite. Cellulite severity was evaluated using the Nürnberger-Müller palpation scale and a newly developed five-point thermographic scale based on skin surface temperature differentials and histogram pattern analysis. The associations between the assessment methods were evaluated using ordinal statistical measures, and agreement was assessed using weighted Cohen's kappa statistics. Results: Thermographic grading demonstrated high agreement with palpation-based assessment, with a percentage agreement of 93.8% and an almost perfect agreement based on weighted Cohen's κ. A strong ordinal association was observed between the methods. Thermography consistently classified a subset of cases as one grade higher compared with palpation. No statistically significant associations were observed between thermographic grade and body mass index or waist-to-hip ratio. Conclusions: Infrared thermography enables image-based grading of cellulite severity and shows a strong concordance with established palpation scales. The proposed thermographic classification provides preliminary methodological validation of an imaging-based grading approach. 
Further multicenter studies involving multiple assessors and diverse populations are required to assess reproducibility, specificity, and potential clinical applicability.
- Research Article
- 10.3390/diagnostics16020338
- Jan 21, 2026
- Diagnostics
- S M Mazidur Rahman + 10 more
Background/Objectives: Stool-based GeneXpert testing has become a useful approach for diagnosing pediatric pulmonary tuberculosis (PTB). This study compared two stool-processing methods, centrifugation-based processing (CBP) and simple one-step (SOS), for detecting PTB in children using Xpert MTB/RIF Ultra (Ultra). Methods: Children with presumptive PTB were screened cross-sectionally, and stool samples were collected and tested with Ultra using the CBP method from March 2022 to December 2024 across seven divisions of Bangladesh. A subset of stool samples (n = 281) that tested positive (n = 191) or negative (n = 90) by the CBP method was re-tested by Ultra using the SOS method on the same sample. The results of Ultra with SOS-processed stool were compared with those of the CBP method to evaluate overall agreement and detection efficiency across different bacterial burdens. Results: The SOS method detected 97 of 191 CBP-positive samples, resulting in a positive percentage agreement of 50.8% (95% CI: 43.5–58.1). All 90 Ultra-negative stool samples were also negative by the SOS method, yielding a negative percentage agreement of 100% (95% CI: 96.0–100.0). Overall agreement between the methods was 66.6% (Kappa: 0.398). The SOS method detected 100% of high- (4/4) and medium- (7/7), 97.3% (36/37) of low-, and 83.3% (35/42) of very-low-bacterial-burden samples, but only 14.9% (15/101) of the trace-detected samples that were identified by the CBP method. Conclusions: Stool testing with Ultra using the SOS processing method missed a significant number of samples in the most prevalent form of child TB, the ‘trace-detected’ category identified by the CBP method. For increased detection of childhood TB nationwide, the national program should prioritize the use of Ultra on stool samples processed by the CBP method.
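The agreement statistics in this abstract can be recomputed from the 2x2 counts it reports (97 samples positive by both methods, 94 positive by CBP only, 0 positive by SOS only, 90 negative by both). The Python sketch below treats CBP as the comparator; the function name and the dictionary keys are hypothetical, and confidence intervals are not computed because the abstract does not state which interval method was used.

```python
def agreement_2x2(both_pos, ref_pos_only, test_pos_only, both_neg):
    """Positive/negative/overall percentage agreement and Cohen's kappa.

    ref_pos_only: positive by the comparator only; test_pos_only: positive
    by the index method only.
    """
    n = both_pos + ref_pos_only + test_pos_only + both_neg
    ref_pos = both_pos + ref_pos_only
    ref_neg = test_pos_only + both_neg
    test_pos = both_pos + test_pos_only
    test_neg = ref_pos_only + both_neg

    ppa = 100.0 * both_pos / ref_pos
    npa = 100.0 * both_neg / ref_neg
    p_observed = (both_pos + both_neg) / n
    # Agreement expected by chance from the marginal totals.
    p_expected = (ref_pos * test_pos + ref_neg * test_neg) / (n * n)
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return {"ppa": ppa, "npa": npa, "overall": 100.0 * p_observed, "kappa": kappa}


if __name__ == "__main__":
    # Counts taken from the abstract: CBP is the comparator, SOS the index method.
    stats = agreement_2x2(both_pos=97, ref_pos_only=94, test_pos_only=0, both_neg=90)
    for name, value in stats.items():
        print(name, round(value, 3))
```

With these counts the sketch reproduces the reported positive percentage agreement (50.8%), negative percentage agreement (100%) and kappa (0.398). The overall agreement works out to 187/281, which rounds to 66.5%, marginally below the 66.6% given in the abstract.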
- Research Article
- 10.1308/rcsann.2025.0094
- Jan 20, 2026
- Annals of the Royal College of Surgeons of England
- K Cole + 12 more
Multicentre clinical research collaboratives collect large, generalisable data sets. However, data are often collected by trainees who may lack clinical or academic experience, raising concerns about data quality and potential reporting bias. Validation practices in such studies are variable. This study outlines the methods, feasibility, and outcomes of internal data validation using the CONGRESS database. The multicentre CONGRESS data set of early oesophagogastric cancer was assessed. A random 20% sample of patients was selected to meet a >15% target validation size. Patient, disease and outcome data were re-abstracted from medical records and entered into a validation data set, which was compared with the original database. Cohen's kappa coefficient (κ) and Pearson's correlation (r) were calculated to express the strength of agreement between categorical and continuous variables, respectively. In total, 302 patients (18.1%) from the original CONGRESS database were included in the validation data set and 3,320 data points were compared between data sets (6,640 total). The percentage of exact agreement for variables ranged from 82.5% to 98.7% (median 92.3%, interquartile range 86.3%-95.7%). Nine variables (1,645 of 2,946, 55.8% data points) showed 'almost perfect' agreement (κ or r > 0.8), and five (1,301 of 2,946, 44.2%) showed substantial agreement (κ > 0.6). None showed weak or poor agreement. This study proposes a reproducible framework and benchmarks for validating large collaborative clinical data sets, using the national CONGRESS data set as an example. This approach offers a standard for ensuring reliable, high-quality research outcomes across multicentre databases.