Virtual reality volumetric rendering versus cross-sectional imaging for pancreatic cancer resectability assessment: a pilot randomized controlled reader study.

Abstract

Current imaging assessment for pancreatic cancer resectability demonstrates problematic inter-observer variability, with only fair-to-moderate agreement among experienced raters. Virtual reality (VR) technology offers stereoscopic three-dimensional visualization that may improve diagnostic accuracy and agreement. However, optimal visualization strategies for clinical adoption remain unclear. Ten hepatopancreatobiliary surgeons from two high-volume centers were randomized 1:1 to assess twelve contrast-enhanced CT cases using either VR volumetric rendering or cross-sectional imaging (CSI). Primary outcomes included inter-rater agreement, diagnostic accuracy against an expert reference standard, assessment time, and surgeon confidence. Statistical analysis employed Fleiss’ κ for inter-rater agreement and two-sided Mann–Whitney U tests on surgeon-level summary measures for between-group comparisons. CSI displayed on 2D screens achieved substantial inter-rater agreement for resectability assessment (κ = 0.609), while VR demonstrated only slight agreement (κ = 0.127). Diagnostic accuracy was higher with CSI (84.7% vs. 79.7%), with the most pronounced difference in resectability determination (83.3% vs. 58.3%, p = 0.033). VR users reported significantly lower confidence (4.85 ± 1.15 vs. 6.32 ± 0.77, p = 0.028). Assessment times were comparable between groups (median 313.5 s vs. 327.5 s, p = 1.00). In this preliminary investigation, our VR visualization strategy demonstrated lower diagnostic accuracy and inter-rater agreement than CSI. However, prior studies suggest that VR systems employing alternative, hybrid visualization approaches may improve inter-rater agreement, indicating that the visualization strategy, rather than VR technology per se, is the primary determinant of utility. Trial registration: DRKS00033932 (German Clinical Trials Register), registered prospectively.
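
The agreement results above are Fleiss’ κ values described with the conventional verbal bands (κ = 0.609 "substantial", κ = 0.127 "slight"). The sketch below shows how Fleiss’ κ is computed from a cases × categories matrix of rating counts and how a value maps onto the Landis and Koch bands. It is a minimal illustration, not the study’s analysis code; the function names, the two-category coding and the toy ratings are hypothetical, and the study’s per-case ratings are not reproduced here.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from a subjects x categories matrix of rating counts.

    counts[i][j] is the number of raters who assigned subject i to category j;
    every subject must have been rated by the same number of raters.
    """
    if not counts:
        raise ValueError("need at least one subject")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    n_subjects, n_categories = len(counts), len(counts[0])

    # Observed agreement: mean share of agreeing rater pairs per subject.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    # Chance agreement from the pooled category proportions.
    total = n_subjects * n_raters
    p_e = sum((sum(row[j] for row in counts) / total) ** 2 for j in range(n_categories))
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1.0 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal agreement band following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    bands = ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial"))
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical toy data: 4 cases, 3 raters, categories [resectable, not resectable].
ratings = [[3, 0], [0, 3], [2, 1], [1, 2]]
k = fleiss_kappa(ratings)
print(round(k, 3), landis_koch(k))  # prints: 0.333 fair
```

Fleiss’ κ treats raters as interchangeable, which suits a design where different surgeons read the same cases; with only five readers per arm, as here, the estimate will be imprecise.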

Similar Papers
  • Research Article
  • Cited by 17
  • 10.1097/jsm.0000000000000997
Inter-rater Reliability of the Classification of the J-Sign Is Inadequate Among Experts.
  • Dec 7, 2021
  • Clinical Journal of Sport Medicine
  • Laurie A Hiemstra + 3 more

The purpose of this study was to determine the inter-rater and intra-rater reliability of the symmetry, classification, and underlying pathoanatomy associated with the J-sign in patients with recurrent lateral patellofemoral instability. Design: blinded inter-rater reliability study. Setting: not applicable. Participants: thirty patellofemoral joint experts. Thirty clinicians independently assessed 30 video recordings of patients with recurrent lateral patellofemoral instability performing the J-sign test. Raters documented J-sign symmetry and graded it according to the quadrant and Donell classifications. Raters indicated the most significant underlying pathoanatomy and the presence of sagittal plane maltracking. Intra-rater reliability was assessed by 4 raters repeating the assessments. Mean pairwise simple and/or weighted Cohen's kappa values were calculated to measure inter-rater and intra-rater reliability, together with percent agreement. J-sign symmetry demonstrated fair inter-rater reliability (k = 0.26), whereas intra-rater reliability was moderate (k = 0.48). Inter-rater reliability for the quadrant and Donell classifications indicated moderate agreement, k = 0.51 and k = 0.49, respectively, whereas intra-rater reliability was k = 0.79 and k = 0.72, indicating substantial agreement. Inter-rater reliability of the foremost underlying pathoanatomy produced only slight agreement (k = 0.20); however, intra-rater reliability was substantial (k = 0.68). Sagittal plane maltracking demonstrated slight inter-rater agreement (k = 0.23) but substantial intra-rater agreement (k = 0.64). The symmetry, classification, and underlying pathoanatomy of the J-sign demonstrated fair to moderate inter-rater reliability and moderate to substantial intra-rater reliability among expert reviewers using video recordings of patients with recurrent lateral patellofemoral instability.
These findings suggest that individual raters have a consistent standard for assessing the J-sign, but that these standards are not reliable between assessors. Level of evidence: III.
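
The study above summarizes agreement as a mean of pairwise Cohen's kappa values. A minimal sketch of that idea for unweighted kappa follows; the weighted variant used for ordinal grades is not shown. The function names and the three raters' labels are hypothetical and do not come from the study.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if not a or len(a) != len(b):
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # raw percent agreement
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of the two raters' marginal proportions.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_o - p_e) / (1.0 - p_e)


def mean_pairwise_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Average Cohen's kappa over all rater pairs; ratings[r] is rater r's labels."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical J-sign symmetry labels from three raters for four recordings.
r1 = ["symmetric", "symmetric", "asymmetric", "asymmetric"]
r2 = ["symmetric", "asymmetric", "asymmetric", "asymmetric"]
r3 = list(r1)
print(round(cohen_kappa(r1, r2), 3), round(mean_pairwise_kappa([r1, r2, r3]), 3))
# prints: 0.5 0.667
```

Averaging pairwise kappas is one way to extend a two-rater statistic to thirty raters. Fleiss' kappa is a common alternative, and the two generally give different values on the same data.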

  • Research Article
  • Cited by 2
  • 10.1007/s00068-024-02673-x
Dual-energy CT in diagnosing sacral fractures: assessment of diagnostic accuracy and intra- and inter-rater reliabilities.
  • Jan 24, 2025
  • European journal of trauma and emergency surgery : official publication of the European Trauma Society
  • Takahiro Oda + 4 more

Evaluating sacral fractures is crucial in fragility fractures of the pelvis. Dual-energy CT (DECT) is considered useful for diagnosing fractures that are unclear on single-energy CT (SECT). This study aims to investigate the effectiveness of DECT in diagnosing sacral fractures. Thirty cases with suspected sacral fractures underwent SECT, DECT, and MRI. The exams were evaluated by two groups: three inexperienced surgeons (Group I) and three experienced surgeons (Group E). Diagnoses were made initially using SECT (pre-DECT) and then reassessed including DECT (post-DECT). This process was repeated twice. The presence of fractures was determined based on MRI. Sensitivity, specificity, inter-rater and intra-rater reliability, and diagnostic accuracy were calculated. Diagnostic accuracy was statistically compared between the two groups. Sensitivity was 0.73 in pre-DECT and 0.9 in post-DECT, while specificity was 0.83 in pre-DECT and 0.91 in post-DECT. Sensitivity significantly improved with the addition of DECT (McNemar test: p < 0.001). Intra-rater reliability (Fleiss' kappa coefficient) was 0.44 in pre-DECT and 0.76 in post-DECT. Inter-rater reliability (Cohen's kappa coefficient) was 0.6 in pre-DECT and 0.81 in post-DECT. Diagnostic accuracy was significantly lower in Group I than in Group E in pre-DECT (P = 0.019, 0.048), but there was no significant difference between the two groups in post-DECT. Combined use of DECT with SECT improved the detection rate of sacral fractures and enhanced intra-rater and inter-rater reliability. High diagnostic accuracy was achieved regardless of the observer's experience. These results indicate that DECT is a useful imaging modality for diagnosing sacral fractures.
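
The sensitivity gain above was tested with McNemar's test, which compares paired readings (the same case read pre-DECT and post-DECT) using only the discordant pairs. Below is a minimal sketch of the exact binomial form of the test; the abstract does not say whether the authors used this form or the chi-squared approximation, so treat it as one possible variant. The function name and the counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b and c are the discordant cells of the paired 2x2 table, e.g.
    b = fractures detected only after adding DECT, c = detected only on SECT.
    Under H0 (no difference) b ~ Binomial(b + c, 0.5).
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence either way
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical discordant counts: 9 readings improved by DECT, 1 worsened.
print(mcnemar_exact(9, 1))  # prints: 0.021484375
```

Concordant pairs (correct or incorrect under both protocols) do not enter the statistic, which is why the test suits before/after readings of the same cases.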

  • Conference Article
  • Cited by 1
  • 10.1117/12.19977
Strategies for scientific visualization: analysis and comparison of current techniques
  • Aug 1, 1990
  • Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE
  • John A Berton, Jr

John Andrew Berton, Jr., The Ohio Supercomputer Graphics Project, 1224 Kinnear Road, Columbus, Ohio 43212-1154
ABSTRACT
This presentation consists of an overview of scientific images and animation produced recently at various research centers. Strategies of visualization are discussed with respect to these images and the data they represent. This discussion focuses not only on software issues, such as interactivity and data handling, but also on visual and cognitive issues associated with visualization. Presented images include computational fluid dynamics simulations, meteorological and atmospheric simulations and recordings, astrophysical simulations, and field recording of natural data. Specific visualization techniques under discussion include color contour mapping in two and three dimensions, surface and isosurface mapping, volume rendering, and glyph and particle representation. In addition to explanations of techniques and interpretations of data provided by these techniques, different strategies within each technique are explored. Comparisons are also made between different strategies relative to identical or similar databases. In the process of these explanations and comparisons, some general ideas about visualization are revealed. These points are emphasized as they relate to a specific database, ranges of similar databases, and scientific data in general. These points are of interest to scientists working in visualization, as they indicate efficient and effective routes to the better understanding of large databases.
1. INTRODUCTION
There are as many ways of visualizing scientific data as there are different data sets to visualize. Scientists must select from the range of options a visualization strategy which fits the nature of their data and the nature of their exploration of that data. A strategy for visualization is not just the selection of a method; it is the orchestration of the hardware and software tools upon which the method relies. Additionally, the successful execution of the strategy is dependent on an understanding of the strengths and weaknesses of the strategy vis-à-vis the data under analysis, combined with an informed application of the elements which define the method. In the following pages, some of the more common visualization strategies are discussed, with special attention to how different strategies interpret the same database, and how the control of individual elements can enhance or detract from the success of the visualization strategy.
2. CATEGORIZING STRATEGIES FOR VISUALIZATION
The nature of how a database is visualized is dependent on many factors, including the size of the data and the ability of the computer system in use to handle data. Visualization strategies can be divided into a few concise groups which define how the scientist interacts with his data. Marshall et al. define these three groups as post-processing, tracking, and steering. The first of these, post-processing, is the most common. In this method, data is calculated before the visualization process begins. Tracking is slightly more interactive, in that the scientist does some visualization as the calculations proceed, allowing the monitor to terminate the process when the visualization goals have been reached. Steering is fully interactive graphical response and control of both the concurrent calculation and the visualization of that data. These three categories show a progression from non-interactive to highly interactive with respect to the

  • Book Chapter
  • Cited by 26
  • 10.1007/978-1-84882-909-1_8
Value Visualization Strategies for PSS Development
  • Jan 1, 2009
  • Christian Kowalkowski + 1 more

The concept of value visualization is concerned with the way that firms communicate and demonstrate the value of their Product-Service Systems (PSS), both internally and externally. In this chapter, a visualization strategy framework for PSS development is proposed. It is particularly tailored for industrial companies that are strategically shifting from selling products to becoming providers offering PSS. Value visualization strategies have traditionally focused on external sales activities. However, companies need a broader approach to visualization in all PSS development phases, one that also includes different visualization techniques. Furthermore, different visualization strategies are needed for each particular development stage of the PSS. Companies need to be able to make use of several different visualization strategies, depending on the actual content of the Product-Service System and its position in the development process. Whereas the product development process is rather heavy at the back, successful PSS development projects with high levels of service need to be heavy at the front (that is, they need to not only develop the system but also ensure its rollout). Examples are provided from eight market-leading companies in different industries, each of which is undertaking a strategic shift from identifying itself as a product seller toward becoming a provider offering PSS. To conclude, value visualization has become vital for winning new contracts and retaining existing ones. It is therefore a strategic resource that managers need to pay attention to, and continuously develop, in order to compete with PSS offerings.

  • Research Article
  • 10.58421/gehu.v4i4.786
Fostering Students’ Reading Comprehension through Visualization Strategy
  • Nov 6, 2025
  • Journal of General Education and Humanities
  • Eviyanti Eviyanti + 3 more

Comprehending explicitly stated information in reading passages remains a persistent challenge for many junior high school students, often hindering their ability to understand texts at the literal level, which forms the foundation for higher-order comprehension skills. Therefore, this study aims to determine whether the visualization strategy significantly improves literal reading comprehension among EFL students. The research was conducted at SMP Negeri 20 Palu using a quasi-experimental design. The population consisted of 105 eighth-grade students from classes VIII A, VIII B, VIII C, and VIII D, with class VIII C assigned as the experimental group and class VIII B as the control group through random sampling. Pre-tests and post-tests were administered to collect data. The findings revealed that the experimental group achieved a higher mean score (78) than the control group (63) in the post-test, and the statistical analysis showed a significance value of p < 0.05, indicating that the visualization strategy effectively enhanced students’ reading comprehension, particularly in understanding literal information in descriptive texts. These findings suggest that visualization strategies can be integrated into EFL instruction to improve students’ engagement, comprehension, and overall learning experience.

  • Book Chapter
  • Cited by 5
  • 10.1007/978-3-642-10643-9_6
Comparing Effects of Different Cinematic Visualization Strategies on Viewer Comprehension
  • Jan 1, 2009
  • Arnav Jhala + 1 more

Computational storytelling systems have mainly focused on the construction and evaluation of textual discourse for communicating stories. Few intelligent camera systems have been built in 3D environments for effective visual communication of stories. The evaluation of the effectiveness of these systems, if any, has focused mainly on the run-time performance of the camera placement algorithms. The purpose of this paper is to present a systematic cognitive-based evaluation methodology to compare the effects of different cinematic visualization strategies on viewer comprehension of stories. In particular, an evaluation of automatically generated visualizations from Darshak, a cinematic planning system, against different hand-generated visualization strategies is presented. The methodology used in the empirical evaluation is based on QUEST, a cognitive framework for question-answering in the context of stories that provides validated predictors for measuring story coherence in readers. Data collected from viewers, who watched the same story rendered with three different visualization strategies, are compared with QUEST’s predictor metrics. Initial data analysis establishes a significant effect of the choice of visualization strategy on story comprehension. It further shows a significant effect of the visualization strategy selected by Darshak on viewers’ measured story coherence.

  • Abstract
  • Cited by 7
  • 10.1016/s0016-5085(14)62474-4
Mo1884 Inter-Rater and Inter-Device Agreement for the Diagnosis of Primary Esophageal Motility Disorders Based on Chicago Classification Between Solid-State and Water-Perfused HRM System -A Prospective, Randomized, Double Blind, Crossover Study
  • May 1, 2014
  • Gastroenterology
  • Giovanni Capovilla + 5 more

  • Research Article
  • Cited by 1
  • 10.1016/j.hpb.2019.10.2212
How high is a high-volume pancreatic surgery centre?
  • Jan 1, 2019
  • HPB
  • Frederick Huynh + 3 more

  • Research Article
  • 10.1093/ced/llag043
The reliability and smallest detectable difference of the SALT score in alopecia areata.
  • Jan 28, 2026
  • Clinical and experimental dermatology
  • Elise Van Caelenberg + 10 more

The Severity of Alopecia Tool (SALT) score is widely used. However, its inter- and intrarater reliability and smallest detectable difference (SDD) have not been rigorously evaluated. To determine the interrater and intrarater reliability of the SALT score and to quantify its SDD across different levels of disease severity. We assessed the diagnostic accuracy of the SALT score for identifying key clinical thresholds (e.g., SALT50). Ten patients were scored live by three experienced raters, and 53 photographic cases were scored by 11 observers. After a two-week interval, 9 raters performed test-retest scoring. Interrater and intrarater reliability were excellent, with ICCs of 0.949 and 0.939, respectively. SDD was lowest in minimal disease (6.42%) and highest in extensive disease (up to 29.84%). SALT50 was identified with 97% accuracy, although accuracy dropped to 72% when only patients between SALT40-60 were considered. Intrarater reliability was consistently higher than interrater reliability, particularly in patients with a high disease burden. The SALT score demonstrates high inter- and intrarater reliability, particularly at the extremes of disease severity. However, moderate variability in mid-range scores affects classification near clinical trial thresholds. Rater training and integration of standardized scoring aids may improve its performance.
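
The abstract above does not state how the SDD was derived. A common formulation, shown here as an assumption rather than as the authors' method, computes the standard error of measurement as SEM = SD × √(1 − ICC) and then SDD = 1.96 × √2 × SEM. The function name and the inputs are hypothetical. Because the study reports SDD by severity level, SD and ICC would presumably be estimated within each stratum.

```python
from math import sqrt


def smallest_detectable_difference(sd: float, icc: float, z: float = 1.96) -> float:
    """SDD = z * sqrt(2) * SEM, where SEM = SD * sqrt(1 - ICC).

    sd:  between-subject standard deviation of the score (same units as SALT).
    icc: reliability coefficient of the scoring procedure.
    z:   1.96 gives a 95% threshold for the difference between two measurements.
    """
    if sd < 0 or not 0.0 <= icc <= 1.0:
        raise ValueError("sd must be >= 0 and ICC must lie in [0, 1]")
    sem = sd * sqrt(1.0 - icc)  # standard error of measurement
    return z * sqrt(2.0) * sem  # sqrt(2): both measurements carry error


# Hypothetical inputs: SD of 20 SALT points and an ICC of 0.95.
print(round(smallest_detectable_difference(20.0, 0.95), 2))  # prints: 12.4
```

Under this formula, a change between two SALT scores smaller than the SDD cannot be distinguished from measurement error at the chosen confidence level.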

  • Research Article
  • Cited by 47
  • 10.21037/atm.2019.01.65
Intra- and interobserver reliability of the Spinal Instability Neoplastic Score system for instability in spine metastases: a systematic review and meta-analysis.
  • May 1, 2019
  • Annals of Translational Medicine
  • Zach Pennington + 5 more

Mechanical instability is one of the two main indications for surgical intervention in patients with metastatic spine disease. Since its publication in 2010, the Spinal Instability Neoplastic Score (SINS) has been the most commonly used means of assessing mechanical instability. To prove clinically valuable though, diagnostic tests must demonstrate consistency across measures and across observers. Here we report a systematic review and meta-analysis of all prior reports of intraobserver and interobserver reliability of the SINS score. To identify articles, we queried the PubMed, CINAHL, EMBASE, Cochrane, and Web of Science databases for all full-text English articles reporting interobserver or intraobserver reliability for the SINS score, category, or a domain of the SINS score. Articles reporting confidence intervals for these metrics were then subjected to meta-analysis to identify pooled estimates of reliability. Of 167 unique studies identified, seven met inclusion criteria and were subjected to qualitative review and meta-analysis. Intraobserver reliability for SINS score was found to be near perfect [estimate =0.815; 90% CI (0.661-0.969)] and interobserver reliability was substantial [0.673; (0.227-1.12)]. Intraobserver and interobserver reliability among spine surgeons was significantly better than reliability across all observers (both P<0.0001). Qualitative analysis suggested that increased surgeon experience may be associated with greater intraobserver and interobserver reliability among spine surgeons. On the whole, meta-analysis of the available literature suggests SINS to have good intraobserver and interobserver reliability, giving it the potential to be a valuable guide to the management of patients with spinal metastases. Further research is required to demonstrate that SINS score correlates with the clinical decision to stabilize.

  • Research Article
  • Cited by 81
  • 10.3310/hta23610
Imaging tests for the detection of osteomyelitis: a systematic review.
  • Oct 1, 2019
  • Health Technology Assessment
  • Alexis Llewellyn + 5 more

Osteomyelitis is an infection of the bone. Medical imaging tests, such as radiography, ultrasound, magnetic resonance imaging (MRI), single-photon emission computed tomography (SPECT) and positron emission tomography (PET), are often used to diagnose osteomyelitis. To systematically review the evidence on the diagnostic accuracy, inter-rater reliability and implementation of imaging tests to diagnose osteomyelitis. We conducted a systematic review of imaging tests to diagnose osteomyelitis. We searched MEDLINE and other databases from inception to July 2018. Risk of bias was assessed with QUADAS-2 [quality assessment of diagnostic accuracy studies (version 2)]. Diagnostic accuracy was assessed using bivariate regression models. Imaging tests were compared. Subgroup analyses were performed based on the location and nature of the suspected osteomyelitis. Studies of children, inter-rater reliability and implementation outcomes were synthesised narratively. Eighty-one studies were included (diagnostic accuracy: 77 studies; inter-rater reliability: 11 studies; implementation: one study; some studies were included in two reviews). One-quarter of diagnostic accuracy studies were rated as being at a high risk of bias. In adults, MRI had high diagnostic accuracy [95.6% sensitivity, 95% confidence interval (CI) 92.4% to 97.5%; 80.7% specificity, 95% CI 70.8% to 87.8%]. PET also had high accuracy (85.1% sensitivity, 95% CI 71.5% to 92.9%; 92.8% specificity, 95% CI 83.0% to 97.1%), as did SPECT (95.1% sensitivity, 95% CI 87.8% to 98.1%; 82.0% specificity, 95% CI 61.5% to 92.8%). There was similar diagnostic performance with MRI, PET and SPECT. 
Scintigraphy (83.6% sensitivity, 95% CI 71.8% to 91.1%; 70.6% specificity, 95% CI 57.7% to 80.8%), computed tomography (69.7% sensitivity, 95% CI 40.1% to 88.7%; 90.2% specificity, 95% CI 57.6% to 98.4%) and radiography (70.4% sensitivity, 95% CI 61.6% to 77.8%; 81.5% specificity, 95% CI 69.6% to 89.5%) all had generally inferior diagnostic accuracy. Technetium-99m hexamethylpropyleneamine oxime white blood cell scintigraphy (87.3% sensitivity, 95% CI 75.1% to 94.0%; 94.7% specificity, 95% CI 84.9% to 98.3%) had higher diagnostic accuracy, similar to that of PET or MRI. There was no evidence that diagnostic accuracy varied by scan location or cause of osteomyelitis, although data on many scan locations were limited. Diagnostic accuracy in diabetic foot patients was similar to the overall results. Only three studies in children were identified; results were too limited to draw any conclusions. Eleven studies evaluated inter-rater reliability. MRI had acceptable inter-rater reliability. We found only one study on test implementation and no evidence on patient preferences or cost-effectiveness of imaging tests for osteomyelitis. Most studies included < 50 participants and were poorly reported. There was limited evidence for children, ultrasonography and on clinical factors other than diagnostic accuracy. Osteomyelitis is reliably diagnosed by MRI, PET and SPECT. No clear reason to prefer one test over the other in terms of diagnostic accuracy was identified. The wider availability of MRI machines, and the fact that MRI does not expose patients to harmful ionising radiation, may mean that MRI is preferable in most cases. Diagnostic accuracy does not appear to vary with the potential cause of osteomyelitis or with the body part scanned. Considerable uncertainty remains over the diagnostic accuracy of imaging tests in children. Studies of diagnostic accuracy in children, particularly using MRI and ultrasound, are needed. This study is registered as PROSPERO CRD42017068511.
This project was funded by the National Institute for Health Research Health Technology Assessment programme and will be published in full in Health Technology Assessment; Vol. 23, No. 61. See the NIHR Journals Library website for further project information.

  • Research Article
  • Cited by 59
  • 10.1177/1094670504268449
An Investigation of Visualization and Documentation Strategies in Services Advertising
  • Nov 1, 2004
  • Journal of Service Research
  • Donna J Hill + 3 more

This study examines two advertising strategies (documentation and visualization) on several measures of ad effectiveness across two different types of service offerings (utilitarian and hedonic). For both service types, a visualization strategy was found to have a positive effect on informativeness, perceived quality, and likelihood of use (but had no effect on uniqueness). As expected, the documentation strategy had a positive effect on all of the dependent measures within the hedonic service environment but had no effect in the utilitarian setting. This is consistent with dual processing theories and suggests that individuals are more attentive to various forms of documentary information (e.g., statistics, figures, charts, testimonials, comparisons) when the service offering provides hedonic value.

  • Research Article
  • Cited by 26
  • 10.31557/apjcp.2019.20.4.1283
Interrater Reliability of Various Thyroid Imaging Reporting and Data System (TIRADS) Classifications for Differentiating Benign from Malignant Thyroid Nodules
  • Jan 1, 2019
  • Asian Pacific Journal of Cancer Prevention : APJCP
  • Warinthorn Phuttharak + 3 more

Background: Thyroid ultrasound (US) is used as the first-line diagnostic tool in the management of thyroid disease but is operator dependent. There have been few reports evaluating interrater variability in US assessment. Therefore, we evaluated interrater reliability in US assessment of thyroid nodules and estimated its diagnostic accuracy for various TIRADS systems. Methods: This retrospective study included 24 malignant nodules and 84 benign nodules from January 2015 to October 2017. Two blinded observers independently reviewed stored US images using TIRADS. All analyses followed the guidelines proposed by the ACR-TR, Siriraj-TR and EU-TR systems. Interrater reliability was calculated using Cohen’s kappa statistics. Diagnostic accuracy was also calculated. Results: Interobserver agreement was substantial for composition (K=0.616); echogenicity and echogenic foci showed fair agreement (K=0.327 and 0.288, respectively); margin showed slight agreement (K=0.143). For the final assessment, interrater reliability showed moderate agreement for the ACR-TIRADS system (K=0.500), fair agreement for the EU-TIRADS system (K=0.209) and slight agreement for the Siriraj-TIRADS system (K=0.114). For the ACR-TIRADS system, the two observers' sensitivities were 75% and 79.2%, specificities were 58.3% and 56%, positive predictive values (PPV) were 34% and 33.9%, and negative predictive values (NPV) were 89.1% and 90.4%. For the Siriraj-TIRADS system, sensitivities were 41.7% and 25%, specificities were 84.5% and 89.3%, PPVs were 43.5% and 40%, and NPVs were 83.5% and 80.6%. For the EU-TIRADS system, sensitivities were 45.8% and 66.7%, specificities were 79.8% and 72.6%, PPVs were 39.3% and 41%, and NPVs were 83.8% and 88.4%.
Conclusion: ACR-TIRADS had the highest interobserver agreement and tended to have the highest sensitivity and negative predictive value for the diagnosis of malignant thyroid nodules. Siriraj-TIRADS had higher specificity and accuracy, but lower interobserver agreement.
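
The four accuracy measures reported above all come from one 2×2 table of reader calls against the reference diagnosis. The sketch below shows the arithmetic. The counts are hypothetical and are not reported in the abstract: they were back-calculated so that, with 24 malignant and 84 benign nodules, the output matches one observer's published ACR-TIRADS percentages. The class name is also illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Reader calls vs. reference standard (malignant = positive)."""

    tp: int  # malignant nodules called malignant
    fp: int  # benign nodules called malignant
    fn: int  # malignant nodules called benign
    tn: int  # benign nodules called benign

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Hypothetical counts (24 malignant, 84 benign nodules), back-calculated
# from one observer's reported ACR-TIRADS percentages.
obs = TwoByTwo(tp=18, fp=35, fn=6, tn=49)
for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {getattr(obs, name):.1%}")
# prints: sensitivity: 75.0%, specificity: 58.3%, ppv: 34.0%, npv: 89.1% (one per line)
```

Sensitivity and specificity are conditioned on the true diagnosis. PPV and NPV, by contrast, depend on the share of malignant nodules in the sample (24 of 108 here), so they do not transfer directly to populations with a different prevalence.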

  • Research Article
  • Cited by 1
  • 10.1186/s41077-024-00329-9
Beyond reliability: assessing rater competence when using a behavioural marker system
  • Dec 31, 2024
  • Advances in Simulation
  • Samantha Eve Smith + 4 more

Background: Behavioural marker systems are used across several healthcare disciplines to assess behavioural (non-technical) skills, but rater training is variable, and inter-rater reliability is generally poor. Inter-rater reliability provides data about the tool, but not about the competence of individual raters. This study aimed to test the inter-rater reliability of a new behavioural marker system (PhaBS, pharmacists’ behavioural skills) with clinically experienced faculty raters and near-peer raters. It also aimed to assess rater competence when using PhaBS after brief familiarisation, by assessing completeness, agreement with an expert rater, ability to rank performance, stringency or leniency, and avoidance of the halo effect. Methods: Clinically experienced faculty raters and near-peer raters attended a 30-min PhaBS familiarisation session. This was immediately followed by a marking session in which they rated a trainee pharmacist’s behavioural skills in three scripted immersive acute care simulated scenarios, demonstrating good, mediocre, and poor performances respectively. Inter-rater reliability in each group was calculated using the two-way random, absolute agreement, single-measures intra-class correlation coefficient (ICC). Differences in individual rater competence in each domain were compared using Pearson’s chi-squared test. Results: The ICC for experienced faculty raters was good at 0.60 (0.48–0.72) and for near-peer raters was poor at 0.38 (0.27–0.54). Of experienced faculty raters, 5/9 were competent in all domains versus 2/13 near-peer raters (difference not statistically significant). There was no statistically significant difference between the abilities of clinically experienced versus near-peer raters in agreement with an expert rater, ability to rank performance, stringency or leniency, or avoidance of the halo effect.
The only statistically significant difference between groups was the ability to complete the assessment (9/9 experienced faculty raters versus 6/13 near-peer raters, p = 0.0077). Conclusions: Experienced faculty have acceptable inter-rater reliability when using PhaBS, consistent with other behavioural marker systems; however, not all raters are competent. Competence measures for other assessments can be helpfully applied to behavioural marker systems. When using behavioural marker systems for assessment, educators must start using such rater competence frameworks. This is important to ensure fair and accurate assessments for learners, to provide educators with information about rater training programmes, and to provide individual raters with meaningful feedback.
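
The reliability statistic named above, the two-way random, absolute-agreement, single-measures ICC (often written ICC(2,1), or ICC(A,1)), can be computed from a two-way ANOVA decomposition of a subjects × raters matrix. Below is a minimal point-estimate sketch that assumes complete data; it omits the F-distribution confidence interval reported in the abstract. The function name and the toy score matrix are hypothetical.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is rater j's score for subject i (no missing cells).
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete matrix with >= 2 subjects and >= 2 raters")

    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into subjects, raters and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: systematic rater differences (ms_cols) count as error.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical toy matrix: 6 performances scored by 4 raters.
scores = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8], [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(scores), 2))  # prints: 0.29
```

The absolute-agreement form penalizes raters who are consistently more lenient or stringent than others. That fits the study's interest in stringency and leniency better than a consistency-only ICC, which ignores such systematic offsets.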

  • Research Article
  • 10.1371/journal.pone.0311805
An exploratory study of the impact of CT slice thickness and inter-rater variability on anatomical accuracy of malunited distal radius models and surgical guides for corrective osteotomy.
  • Oct 10, 2024
  • PloS one
  • Emilia Gryska + 5 more

High-resolution CT images are essential in clinical practice to accurately replicate patient anatomy for 3D virtual surgical planning and designing patient-specific surgical guides. These technologies are commonly used in corrective osteotomy of the distal radius. This study evaluated how the virtual radius models and the surgical guides' surface that is in contact with the bone vary between experienced raters. Further, the discrepancies from the reference radius of surgical guides and radius models created from CT images with slice thicknesses larger than the reference standard of 0.625 mm were assessed. Maximum overlap with the radius model was measured for guides, and absolute average distance error was measured for radius models. The agreement between the lower-resolution guide surfaces and the raters' guide surfaces was evaluated. The average inter-rater guide surface overlap was -0.11 mm [95% CI: -0.13 to -0.09]. The surface of surgical guides designed on CT images with a 1 mm slice thickness deviated from the reference radius within the inter-rater range (0.03 mm). For slice thicknesses of 1.25 mm and 1.5 mm, the average guide surface overlap was 0.12 mm and 0.15 mm, respectively. The average inter-rater radius surface variability was 0.03 mm [95% CI: 0.025 to 0.035]. The discrepancy from the reference of all radius models created from CT images with a slice thickness larger than the reference slice thickness was notably larger than the inter-rater variability but, excluding one case, did not exceed 0.2 mm. The results suggest that 1 mm CT images are suitable for surgical guide design. While 1.25 mm slices are commonly used for virtual planning in hand and forearm surgery, slices larger than 1 mm may approach the limit of clinical acceptability. Discrepancies in radius models were below 1 mm, likely below clinical relevance.
