To date, AI-supported programs approved for medical use in Europe that determine bone age (BA) according to Greulich and Pyle (G&P) have almost exclusively been validated individually. The current study therefore aimed to compare the performance of three programs, BoneXpert, PANDA, and BoneView, in a single Central European population. For this retrospective study, hand radiographs of 306 children aged 1–18 years, stratified by gender and age, were included. A subgroup was formed from the age range that accounts for 90% of examinations in clinical practice. The G&P BA was estimated by three human experts, serving as ground truth, and by the three AI-supported programs. The mean absolute deviation, the root mean squared error (RMSE), and the dropout rate of each AI program were calculated. All programs correlated strongly with the ground truth (R² ≥ 0.98). In the total group, BoneXpert had a lower RMSE than BoneView and PANDA (0.62 vs. 0.65 and 0.75 years), with dropout rates of 2.3%, 20.3%, and 0%, respectively. In the subgroup, the RMSE values differed less (0.66 vs. 0.68 and 0.65 years; at most 4% dropouts). The standard deviation among the AI readers was lower than that among the human readers (0.54 vs. 0.62 years, p < 0.01). All three AI programs estimate G&P BA in the main age range with similarly high reliability; differences arise at the margins of childhood.

Question: There is a lack of comparative, independent validation of artificial intelligence-based bone age estimation in children.

Findings: Three commercially available programs estimate bone age according to Greulich and Pyle with similarly high reliability in a Central European cohort.

Clinical relevance: This comparative study can help readers choose software for bone age estimation approved for the European market, depending on the targeted age group and economic considerations.
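The agreement measures named in the abstract (mean absolute deviation, RMSE, dropout rate, and the spread between readers) can be illustrated with a minimal sketch. It assumes that the ground truth is a single reference bone age per radiograph (for example, the mean of the three human readers; the abstract does not say how the readings were combined), that a rejected radiograph appears as `None`, and that the inter-reader standard deviation is the per-image sample SD averaged over images. The function names are hypothetical, and the numbers in the usage example are made-up toy values, not study data.

```python
import math
from statistics import mean, stdev


def agreement_metrics(ground_truth, predictions):
    """Compare one program's bone-age estimates (years) with the reference.

    Assumption: predictions[i] is None when the program rejected image i
    (a dropout); dropouts are excluded from the error metrics.
    """
    if len(ground_truth) != len(predictions):
        raise ValueError("inputs must have equal length")
    pairs = [(g, p) for g, p in zip(ground_truth, predictions) if p is not None]
    if not pairs:
        raise ValueError("no accepted images to evaluate")
    errors = [p - g for g, p in pairs]
    return {
        # Mean absolute difference from the reference, in years.
        "mad": mean(abs(e) for e in errors),
        # Root mean squared error, in years.
        "rmse": math.sqrt(mean(e * e for e in errors)),
        # Fraction of images the program did not evaluate.
        "dropout_rate": 1 - len(pairs) / len(predictions),
    }


def mean_inter_reader_sd(readings_per_image):
    """Average over images of the sample SD across readers (assumed definition).

    Each inner list holds one image's bone-age readings from two or more readers.
    """
    return mean(stdev(readings) for readings in readings_per_image)


if __name__ == "__main__":
    # Toy values only; the third image is a dropout.
    print(agreement_metrics([5.0, 8.0, 12.0, 15.0], [5.5, 7.5, None, 15.0]))
    print(mean_inter_reader_sd([[5.0, 5.5, 6.0], [10.0, 10.0, 10.0]]))
```

Because dropouts are excluded before the errors are computed, a program that rejects difficult radiographs can report a lower RMSE, which is why the abstract lists dropout rates alongside each RMSE.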