189. Spinal stenosis grading in magnetic resonance imaging using deep convolutional neural networks

Suk-Joong Lee, Hyun-Joo Lee

doi:10.1016/j.spinee.2020.05.600

Abstract

BACKGROUND CONTEXT Grading systems for spinal stenosis rely on the subjective opinions of observers. Reading also usually takes considerable time because the stacked images at each disc level must be read one by one. Machine-learning-based methods have been proposed to address this time-consuming task. To validate the usefulness of deep-learning-based methods for clinical use, we hypothesized that the grade labels predicted by such methods would be comparable to those assigned by medical experts.

PURPOSE This study aims to verify the feasibility of a computer-assisted spinal stenosis grading system. It compares the diagnostic agreement between two experts with the agreement between each expert and trained convolutional neural network (CNN) classifiers.

STUDY DESIGN/SETTING Retrospective MRI grading with comparison between experts and deep CNNs.

PATIENT SAMPLE A total of 542 patients with low back pain.

OUTCOME MEASURES Not applicable.

METHODS For 542 L4-5 axial MR images, two experts independently localized the center of the spinal canal and graded the stenosis status. Two CNN classifiers were each trained with the grading labels of one expert and validated using 10-fold cross-validation. Each classifier consisted of two parts. A CNN detection model localized patches near the canal, and a CNN classification model predicted the stenosis status within the localized patches. Faster R-CNN was used as the detection model and a VGG network as the classification model. Grading agreement was compared between the two experts, and between each expert and the predictions of the CNN models.

RESULTS Grading agreement between the two experts was 77.5% in accuracy and 75% in F1 score. Agreement between the first expert and the model trained with that expert's labels was 83% and 75.4%, respectively. Agreement between the second expert and the model trained with that expert's labels was 77.9% and 74.9%. The differences between the two experts were significant, whereas the differences between each expert and the corresponding trained model were not.

CONCLUSIONS These results suggest that automatic diagnosis using deep learning may be feasible for spinal stenosis grading.

FDA DEVICE/DRUG STATUS This abstract does not discuss or include any applicable devices or drugs.
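The 10-fold cross-validation described in METHODS can be sketched as follows. This is a minimal, unstratified split under stated assumptions. The helper name `k_fold_indices` and the fixed shuffle seed are hypothetical. The abstract does not say whether folds were stratified by grade or how they were drawn. Each image lands in exactly one validation fold. In the study, the per-fold models (a Faster R-CNN detector plus a VGG classifier) would be trained on the remaining nine folds. That training step is not shown here.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k (train, validation) pairs.

    Hypothetical helper: plain (unstratified) K-fold. Each index appears in
    exactly one validation fold, and fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # one reproducible shuffle (assumed seed)

    # The first (n_samples % k) folds receive one extra index each.
    base, extra = divmod(n_samples, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    # Fold i is the validation set; all other folds form the training set.
    splits = []
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, val))
    return splits


splits = k_fold_indices(542, k=10)
print([len(val) for _, val in splits])  # [55, 55, 54, 54, 54, 54, 54, 54, 54, 54]
```

The study has 542 images, and 542 = 10 × 54 + 2. Two validation folds therefore hold 55 images and the other eight hold 54. Each training set contains the remaining 487 or 488 images.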

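The accuracy and F1 agreement figures in RESULTS can be computed from two label sequences along the following lines. The two sequences are two experts' grades, or an expert's grades and a model's predictions. This is a sketch, not the authors' code. The abstract does not state the grading scale or how F1 was averaged over grades. The macro (unweighted per-grade) average and the three-grade toy labels below are therefore assumptions.

```python
from typing import Hashable, Sequence


def _check(a: Sequence[Hashable], b: Sequence[Hashable]) -> None:
    # Both raters must have graded the same, non-empty set of images.
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")


def accuracy(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of images on which the two raters assigned the same grade."""
    _check(a, b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def macro_f1(reference: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Unweighted mean of per-grade F1 scores (assumed averaging scheme)."""
    _check(reference, predicted)
    pairs = list(zip(reference, predicted))
    grades = set(reference) | set(predicted)
    scores = []
    for g in grades:
        tp = sum(r == g and p == g for r, p in pairs)
        fp = sum(r != g and p == g for r, p in pairs)
        fn = sum(r == g and p != g for r, p in pairs)
        # g occurs in at least one sequence, so the denominator is never zero.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Hypothetical three-grade labels for six images (not study data).
expert_1 = [0, 1, 2, 1, 0, 2]
expert_2 = [0, 1, 1, 1, 0, 2]
print(f"accuracy = {accuracy(expert_1, expert_2):.3f}")  # accuracy = 0.833
print(f"macro F1 = {macro_f1(expert_1, expert_2):.3f}")  # macro F1 = 0.822
```

The per-grade F1 is 2TP / (2TP + FP + FN). Exchanging the two label sets only swaps FP and FN, so this macro F1 is symmetric. It does not matter which expert is treated as the reference.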