Classification of glomerular pathological findings using deep learning and nephrologist–AI collective intelligence approach

Eiichiro Uchino, Kanata Suzuki, Noriaki Sato, Ryosuke Kojima, Yoshinori Tamada, Shusuke Hiragi, Hideki Yokoi, Nobuhiro Yugami, Sachiko Minamiguchi, Hironori Haga, Motoko Yanagita, Yasushi Okuno

doi:10.1016/j.ijmedinf.2020.104231


Open Access

https://doi.org/10.1016/j.ijmedinf.2020.104231


Abstract

Background: Automated classification of glomerular pathological findings is potentially beneficial in establishing an efficient and objective diagnosis in renal pathology. While previous studies have verified artificial intelligence (AI) models for the classification of global sclerosis and glomerular cell proliferation, several other glomerular pathological findings are required for diagnosis, and comprehensive models for classifying these major findings have not yet been reported. Whether cooperation between such AI models and clinicians improves diagnostic performance also remains unknown. Here, we developed AI models to classify glomerular images for the major findings required for pathological diagnosis and investigated whether those models could improve the diagnostic performance of nephrologists.

Methods: We used a dataset of 283 kidney biopsy cases comprising 15,888 glomerular images that were annotated by a total of 25 nephrologists. AI models to classify seven pathological findings (global sclerosis, segmental sclerosis, endocapillary proliferation, mesangial matrix accumulation, mesangial cell proliferation, crescent, and basement membrane structural changes) were constructed using deep learning by fine-tuning the InceptionV3 convolutional neural network. Subsequently, we compared the agreement with truth labels of the majority decision among nephrologists with and without the AI model as a voter.

Results: Our model for global sclerosis showed high performance (area under the curve: periodic acid-Schiff, 0.986; periodic acid methenamine silver, 0.983); the models for the other findings also showed performance close to that of nephrologists. When the AI model output was added to the majority decision among nephrologists, across the 14 constructed models the majority decision improved in sensitivity for 10 models (four statistically significantly) and in specificity for eight models (five significantly).

Conclusion: Our study provides a proof of concept for the comprehensive classification of multiple glomerular findings using deep learning and suggests its potential effectiveness in improving the diagnostic accuracy of clinicians.
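
The comparison described in the Methods, a majority decision among nephrologists with and without the AI model as an additional voter, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy labels, the panel size of four raters, and the rule that ties resolve to "finding absent" are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def majority_vote(votes: Sequence[int]) -> int:
    """Return 1 if strictly more than half of the binary votes are 1, else 0.

    Assumption: a tie resolves to 0 (finding absent); the source does not
    state a tie-breaking rule.
    """
    return 1 if 2 * sum(votes) > len(votes) else 0


def sensitivity_specificity(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) for binary predictions against truth labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical toy data: six glomerular images, one finding, four raters.
truth: List[int] = [1, 1, 1, 0, 0, 0]
nephrologist_votes: List[List[int]] = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
]
ai_votes: List[int] = [1, 1, 1, 0, 0, 1]  # hypothetical thresholded model outputs

# Majority decision among nephrologists alone.
pred_without_ai = [majority_vote(v) for v in nephrologist_votes]

# Majority decision with the AI model counted as one more voter.
pred_with_ai = [majority_vote(v + [a]) for v, a in zip(nephrologist_votes, ai_votes)]

sens_without, spec_without = sensitivity_specificity(pred_without_ai, truth)
sens_with, spec_with = sensitivity_specificity(pred_with_ai, truth)

print(f"without AI: sensitivity={sens_without:.3f}, specificity={spec_without:.3f}")
print(f"with AI:    sensitivity={sens_with:.3f}, specificity={spec_with:.3f}")
```

In this toy example the AI vote breaks a tie on the first image, which raises sensitivity while leaving specificity unchanged. Whether such gains are statistically significant would require a paired test on real data, which this sketch does not attempt.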

Full Text