Abstract

Artificial intelligence (AI) is increasingly being used in bone-age (BA) assessment because manual BA assessment is complicated and time-consuming. We aimed to evaluate the clinical performance of a commercially available deep learning (DL)-based BA assessment software using real-world data. From November 2018 to February 2019, 474 children (35 boys and 439 girls; age 4–17 years) were enrolled. We compared the BA estimated by the DL software (DL-BA) with the BA estimated independently by three reviewers using the traditional Greulich–Pyle atlas (R1, musculoskeletal radiologist; R2, radiology resident; R3, pediatric endocrinologist), and then compared each estimate with the child's chronological age (CA). A paired t-test, Pearson's correlation coefficient, Bland–Altman plots, mean absolute error (MAE) and root mean square error (RMSE) were used for the statistical analysis. The intraclass correlation coefficient (ICC) was used to assess inter-rater variation. There were significant differences between DL-BA and each reviewer's BA (P < 0.025), but the estimates were strongly correlated with one another (r = 0.983, P < 0.025). The RMSE (MAE) values between DL-BA and the BAs of R1, R2 and R3 were 10.09 (7.21), 10.76 (7.88) and 13.06 (10.06) months, respectively. Compared with the CA, the RMSE (MAE) values were 13.54 (11.06), 15.18 (12.11), 16.19 (12.78) and 19.53 (17.71) months for DL-BA and the BAs of R1, R2 and R3, respectively. Bland–Altman plots revealed a general tendency of both the software and the reviewers to overestimate BA. The ICC values among the three reviewers were 0.97, 0.85 and 0.86, and the overall ICC was 0.93. Compared with the chronological age, the BA estimated by the DL-based software showed statistically similar or even better performance than the reviewers' estimates in a real-world clinical setting.
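
The agreement statistics named in the abstract can be illustrated with a short, stdlib-only Python sketch. It is not the authors' analysis code: the function names are hypothetical, BAs are assumed to be in months, the Bland–Altman limits use the conventional bias ± 1.96 SD, and the ICC form shown, ICC(2,1) (two-way random effects, absolute agreement, single rater), is an assumption because the abstract does not state which ICC model was used.

```python
import math


def agreement_metrics(a, b):
    """Agreement between two paired lists of bone ages in months (e.g. DL-BA vs. one reviewer)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("expected two paired lists of equal length (n >= 2)")
    diffs = [x - y for x, y in zip(a, b)]
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Bland-Altman: mean difference (bias) and 95% limits of agreement (bias +/- 1.96 SD)
    bias = sum(diffs) / n
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    # Pearson correlation coefficient between the two sets of estimates
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    r = cov / math.sqrt(var_a * var_b)
    return {"mae": mae, "rmse": rmse, "bias": bias, "limits": limits, "r": r}


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater (assumed model).

    `ratings` is a list of rows, one per child, each holding the k raters' BA values.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between-children
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between-raters
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

For example, calling `agreement_metrics(dl_ba, r1_ba)` on hypothetical per-child lists would give the DL-BA vs. R1 MAE, RMSE, bias and correlation, while `icc_2_1` would take one row per child holding the three reviewers' BAs. Because ICC(2,1) measures absolute agreement, a constant offset between raters lowers it even when their rankings match.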

Highlights

  • Artificial intelligence (AI) is increasingly being used in bone-age (BA) assessment because manual BA assessment is complicated and time-consuming

  • For a deep learning-based automatic software system to be used in clinical settings, a carefully designed external validation study is needed, using datasets of newly recruited patients or patients from other institutions who exhibit characteristics similar to those of patients in a real-world setting[11]

  • In the paired t-test between deep learning (DL)-estimated BA (DL-BA) and Reviewer 1 (R1)-estimated BA (R1-BA), the P value was less than 0.025, indicating a significant difference between them
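
The paired t-test behind this highlight asks whether the mean per-child difference between DL-BA and R1-BA is zero. A minimal stdlib-only sketch of the test statistic follows; the function name is hypothetical, and the two-sided P value (compared here with the 0.025 threshold) would come from the t distribution with n − 1 degrees of freedom, for example via `scipy.stats.ttest_rel`, which this sketch does not call.

```python
import math


def paired_t(a, b):
    """Paired t-statistic and degrees of freedom for matched BA lists a and b."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("expected two paired lists of equal length (n >= 2)")
    diffs = [x - y for x, y in zip(a, b)]  # per-child difference, e.g. DL-BA minus R1-BA
    mean_d = sum(diffs) / n
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    if sd_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))  # mean difference divided by its standard error
    return t, n - 1
```

Because the standard error shrinks as √n, a small but consistent offset between two raters can reach P < 0.025 in a large cohort even when their estimates are highly correlated.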

Summary

Introduction

Artificial intelligence (AI) is increasingly being used in bone-age (BA) assessment because manual BA assessment is complicated and time-consuming. In the Tanner–Whitehouse (TW) method, each bone of the left hand and wrist is given a score by comparison with a standard set of bones at different stages of maturation, and the total score is used to determine the BA. As both the TW method and the Greulich–Pyle atlas method are rather time-consuming and their results tend to vary with the clinician's experience, there have been optimization issues regarding their use in BA assessment. For a deep learning-based automatic software system to be used in clinical settings, a carefully designed external validation study is needed, using datasets of newly recruited patients or patients from other institutions who exhibit characteristics similar to those of patients in a real-world setting[11].
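
The TW scoring procedure described above can be sketched in two steps: sum the scores of the maturity stages assigned to each bone, then convert the total to a BA with a lookup table. Every bone name, stage score and conversion value below is a hypothetical placeholder chosen for illustration; the published TW tables differ by sex and include many more bones, and linear interpolation between table entries is an assumption of this sketch.

```python
from bisect import bisect_right

# HYPOTHETICAL placeholder values for illustration only; these are NOT
# the published Tanner-Whitehouse stage scores or conversion tables.
STAGE_SCORES = {
    "radius": {"A": 0, "B": 10, "C": 20, "D": 35},
    "ulna": {"A": 0, "B": 8, "C": 16, "D": 30},
    "metacarpal_1": {"A": 0, "B": 5, "C": 12, "D": 25},
}
# Hypothetical (total maturity score, bone age in years) pairs, sorted by score
SCORE_TO_BA = [(0, 2.0), (30, 6.0), (60, 10.0), (90, 14.0)]


def total_score(stages):
    """Sum the per-bone scores for the maturity stage assigned to each bone."""
    return sum(STAGE_SCORES[bone][stage] for bone, stage in stages.items())


def score_to_bone_age(score):
    """Linearly interpolate BA from the total score, clamped to the table's range."""
    scores = [s for s, _ in SCORE_TO_BA]
    if score <= scores[0]:
        return SCORE_TO_BA[0][1]
    if score >= scores[-1]:
        return SCORE_TO_BA[-1][1]
    i = bisect_right(scores, score)  # first table entry with a score above `score`
    (s0, a0), (s1, a1) = SCORE_TO_BA[i - 1], SCORE_TO_BA[i]
    return a0 + (a1 - a0) * (score - s0) / (s1 - s0)
```

With these placeholder tables, stages C, B and B for the three bones give a total of 33, which maps to a BA of about 6.4 years.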
