Validity theory has evolved dramatically over the past few decades. The most prominent theory in recent years is the argument-based validity framework proposed by Kane (1992, 2004, 2006). To evaluate test score interpretations and uses under Kane's framework, test developers first need to provide interpretive arguments and then validity arguments by supplying sound warrants for the following four inferences: (a) scoring, from observation to an observed score; (b) generalization, from the observed score to the universe score; (c) extrapolation, from the universe score to a target score; and (d) decision, from the target score to test use.

In the field of language testing, a number of studies have investigated the validity of test score interpretations and uses, especially for tests considered high stakes, such as the TOEFL (Chapelle, 2008; Chapelle, Enright, & Jamieson, 2010). However, few studies have validated in-house placement test score interpretations and uses, and no study has evaluated the validity of such low-stakes tests using Kane's framework. Regardless of whether a test is high or low stakes, test developers are responsible for validating their test score interpretations and uses in order to attest to their validity. This study uses Kane's (2006) argument-based validity framework to evaluate the validity of in-house placement test score interpretations and uses. The research questions are as follows: (a) to what extent do examinees answer placement items correctly, and do high-scoring examinees answer more items correctly; (b) to what extent are placement items consistently sampled from the domain and sufficient in number to reduce measurement error; (c) to what extent does the difficulty of placement items match the objectives of a reading course; and (d) to what extent do placement decisions, which place examinees at the proper level of the course, have an impact on washback in the course?

An in-house placement test consisting of a 40-item grammar section, a 40-item vocabulary section, and a 10-item reading section was developed and administered to 428 first-year private-university students in April 2010. All items were multiple choice so that the answer sheets could be scored easily with a reader. Based on their test scores, about 60 high-scoring students were placed into one of two advanced reading classes and about 50 low-scoring students into one of two basic reading classes; the remaining students were placed into one of several intermediate classes. A 55-item grammar achievement test was administered twice (once as a pretest and again as a posttest) to the two basic and two intermediate classes. In addition, a 51-item class evaluation survey was administered to investigate students' participation in the reading classes and to gauge their satisfaction with the classes and study support.

Warrants for the validity argument of the scoring inference were based on the results of the item analysis. The warrant for the validity argument of the generalization inference was based on the composite generalizability coefficient of .92. The warrant for the validity argument of the extrapolation inference was based on a FACETS analysis showing that the difficulty estimates of the learning levels were in the expected order. The warrant for the validity argument of the decision inference was based on the basic-level students' score gains on the achievement test and their positive reactions to the class evaluation survey.
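To illustrate the kind of evidence underlying the scoring and generalization warrants, the following Python sketch computes classical item statistics on a 0/1 scored response matrix: item facility (proportion correct), corrected point-biserial discrimination, and Cronbach's alpha as a simple internal-consistency estimate. This is a minimal sketch using simulated data of roughly the test's overall size (428 examinees, 90 items); it is not the study's own code, the function name and data are hypothetical, and the study's reported coefficient of .92 came from a generalizability-theory analysis rather than from alpha.

```python
import numpy as np

def item_analysis(responses: np.ndarray):
    """Classical item statistics for a 0/1 scored response matrix.

    responses: shape (n_examinees, n_items), 1 = correct, 0 = incorrect.
    Returns item facility, corrected point-biserial discrimination,
    and Cronbach's alpha for the whole form.
    """
    n_persons, n_items = responses.shape
    total = responses.sum(axis=1)

    # Item facility: proportion of examinees answering each item correctly.
    facility = responses.mean(axis=0)

    # Corrected point-biserial: correlation of each item with the total
    # score excluding that item, so the item is not correlated with itself.
    discrimination = np.empty(n_items)
    for j in range(n_items):
        rest = total - responses[:, j]
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]

    # Cronbach's alpha as a rough internal-consistency estimate
    # (a stand-in here for the study's composite generalizability coefficient).
    item_var = responses.var(axis=0, ddof=1).sum()
    total_var = total.var(ddof=1)
    alpha = n_items / (n_items - 1) * (1 - item_var / total_var)

    return facility, discrimination, alpha

# Hypothetical usage with simulated responses generated from a simple
# logistic (Rasch-like) model; the numbers mirror the test's size only.
rng = np.random.default_rng(0)
ability = rng.normal(size=(428, 1))
difficulty = rng.normal(size=(1, 90))
prob = 1 / (1 + np.exp(-(ability - difficulty)))
scores = (rng.random((428, 90)) < prob).astype(int)
fac, disc, alpha = item_analysis(scores)
print(f"mean facility={fac.mean():.2f}, mean discrimination={disc.mean():.2f}, alpha={alpha:.2f}")
```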
All the validity arguments presented in this study support the validity of the placement test score interpretations and uses. However, to further improve that validity, it is necessary to investigate the washback effects of the placement test in the reading classes and to revise the test so that the grammar, vocabulary, and reading sections contain 30 items each.