Conceptual review of outcome metrics and measures used in clinical evaluation of artificial intelligence in radiology.

Seong Ho Park,Kyunghwa Han,June-Goo Lee

doi:10.1007/s11547-024-01886-9

https://doi.org/10.1007/s11547-024-01886-9

Journal: La Radiologia medica
Publication Date: Sep 3, 2024
Citations: 1

Abstract

Artificial intelligence (AI) has numerous applications in radiology. Clinical research studies to evaluate the AI models are also diverse. Consequently, diverse outcome metrics and measures are employed in the clinical evaluation of AI, presenting a challenge for clinical radiologists. This review aims to provide conceptually intuitive explanations of the outcome metrics and measures that are most frequently used in clinical research, specifically tailored for clinicians. While we briefly discuss performance metrics for AI models in binary classification, detection, or segmentation tasks, our primary focus is on less frequently addressed topics in published literature. These include metrics and measures for evaluating multiclass classification; those for evaluating generative AI models, such as models used in image generation or modification and large language models; and outcome measures beyond performance metrics, including patient-centered outcome measures. Our explanations aim to guide clinicians in the appropriate use of these metrics and measures.

Full Text