Qualitative versus quantitative lumbar spinal stenosis grading by machine learning supported texture analysis—Experience from the LSOS study cohort

Florian A Huber, Shanon Stutz, Ilaria Vittoria De Martini, Manoj Mannil, Anton S Becker, Sebastian Winklhofer, Jakob M Burgstaller, Roman Guggenberger

doi:10.1016/j.ejrad.2019.02.023

Abstract

Purpose: To investigate and compare the reproducibility and accuracy of qualitative ratings and quantitative texture analysis (TA) in the detection and grading of lumbar spinal stenosis (LSS) on magnetic resonance (MR) scans of the lumbar spine.

Materials and methods: From the register of a nationwide, multicenter, multidisciplinary lumbar stenosis outcome study (LSOS), 82 patients were included who underwent MR scans of the lumbar spine for a clinical indication of spinal claudication and had a single-level central or lateral severe LSS. In total, 343 transaxial T2-weighted images of the lumbar spine were included, covering one to five levels (L1 to S1) per patient. One expert radiologist, serving as the reference standard, rated LSS grade on a standard four-point scale (normal to severe) and on the eight-point Schizas grading scale. DICOM data were then rescaled to a defined pixel size. Two independent readers performed qualitative ratings analogous to those of the expert reader, in addition to TA of the spinal canal, by manually placing two regions of interest (ROI) per image that reflect the qualitative scales: (1) the dural sac only and (2) the inner contour of the spinal canal, including epidural fat and both lateral recesses. Interreader agreement was assessed with Cohen’s kappa (κ) for qualitative parameters and with the intraclass correlation coefficient (ICC) for quantitative parameters. TA features were reduced by retaining only those with ICC > 0.75. The remaining features were analyzed with machine learning algorithms (Weka 3 tool) for correlation with LSS grades using 10-fold cross-validation.

Results: Qualitative ratings showed only moderate reproducibility for both LSS classification systems but high correlation with the cross-sectional area (CSA) cut-off of < 130 mm² for severe spinal stenosis. In quantitative TA of both ROIs, machine learning analysis with a decision tree classifier achieved higher performance for LSS grading than qualitative assessments using the reference CSA cut-off.

Conclusion: Qualitative LSS grading shows moderate reproducibility, independent of the classification system. TA with machine learning offers highly reproducible quantitative parameters that increase accuracy for severe LSS detection, with minor impact of grading score and CSA border definition.
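
To illustrate the reproducibility statistics named in the methods, the following is a minimal sketch and not the study's implementation. The abstract does not state whether κ was weighted or which ICC form was used, so the sketch assumes unweighted Cohen's kappa and a two-way random-effects, absolute-agreement, single-measurement ICC, often written ICC(2,1). The function names and the example feature names are hypothetical, and the Weka classification step is not reproduced.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters grading the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must grade the same non-empty item set")
    n = len(ratings_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


def icc_2_1(scores):
    """ICC(2,1) for a table with one row per image and one column per reader.

    Assumption: two-way random effects, absolute agreement, single rater.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        return 0.0  # no variance at all; treat as not reproducible
    return (ms_rows - ms_error) / denominator


def select_reproducible(features, threshold=0.75):
    """Keep the texture features whose interreader ICC exceeds the threshold."""
    return [name for name, scores in features.items()
            if icc_2_1(scores) > threshold]


if __name__ == "__main__":
    # Toy four-point grades from two readers (illustrative values only).
    reader_1 = [1, 2, 3, 4, 1, 2, 3, 4]
    reader_2 = [1, 2, 3, 4, 1, 2, 4, 3]
    print("kappa:", round(cohens_kappa(reader_1, reader_2), 3))

    # Toy feature values: one row per image, one column per reader.
    toy_features = {
        "stable_feature": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
        "noisy_feature": [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0]],
    }
    print("kept:", select_reproducible(toy_features))
```

In this sketch, only features that pass the ICC filter would be handed on to the classifier, which mirrors the order of steps described in the methods.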

Full Text