Purpose: Knee osteoarthritis (OA) is most commonly classified on radiographs using the 0-4 Kellgren-Lawrence (KL) grading system, where 0 is normal, 1 shows doubtful signs of OA with possible abnormality, 2 demonstrates definite osteophytes (mild OA), 3 shows definite joint space narrowing (moderate OA), and 4 shows severe joint space narrowing with subchondral sclerosis and bony deformity (severe OA). KL grading is widely used for the clinical assessment and diagnosis of OA, usually on a high volume of radiographs, making its automation highly relevant. We propose a fully automated algorithm for the detection of OA using KL grades with a state-of-the-art neural network.

Methods: 4,490 bilateral PA fixed-flexion knee radiographs were collected from the Osteoarthritis Initiative dataset (age = 61.2 ± 9.2 years, BMI = 32.8 ± 15.9 kg/m², 42/58% male/female split). The left and right knee joints were separated using image thresholding to identify the edges of the bone. The knee joint on each side was localized using weighted template matching, passing a cropped knee-joint radiograph over each image and selecting the region with the highest correlation to the template. This resulted in a total of 8,980 unique left and right knee radiographs, which were randomly divided into training, validation, and testing data with a 60/10/30% split. Because of the large imbalance among KL score classes and the relatively small dataset, random rotation and translation augmentations were applied to the training dataset to increase the sample size. The KL scores of the radiographs were learned using the state-of-the-art DenseNet architecture, which utilizes dense connections to reduce the total number of learnable feature maps. KL scores of 0 and 1 were combined into one class since, clinically, neither is considered definite OA; this also helped accelerate the training of the neural network. The DenseNet's grading decisions were examined using a sensitivity-analysis heatmap created by calculating the gradient of the predicted output with respect to the input.

Results: In a manual quality check, 99% of the automatically localized knee-joint images contained the correct region. For the classification of no OA, mild, moderate, and severe OA, testing accuracies of 70.36%, 66.18%, 73.51%, and 77.42% were achieved, respectively. The count confusion matrix can be viewed in Figure 1. A handful of the cases misclassified by the model were reviewed by a clinical radiologist to better understand whether, and why, the model was incorrect. For the majority of these cases the radiologist agreed that there were features supporting the grading made by the algorithm (examples can be viewed in Figures 2A and 2B). In one reviewed case, the provided radiologist grading was incorrect and the artificial intelligence grading was correct (Figure 2C). A heatmap of the case examined in Figure 2A was created to confirm the intercondylar notch feature identified by the clinical radiologist as having potentially led to the model's misclassification (Figure 3). The bright partial-derivative signals around the intercondylar notch confirmed that this feature played a role in the neural network's decision to classify this case as mild OA.
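For illustration only, the sketch below shows how such a sensitivity (saliency) heatmap can be computed as the gradient of the predicted class score with respect to the input pixels. It is a minimal Python/PyTorch example assuming a generic DenseNet-121 with four output classes and a 224 × 224 three-channel input; these choices are assumptions, not the study's actual implementation.

    # Minimal sketch of a gradient-based sensitivity heatmap (hypothetical model and input).
    import torch
    import torchvision

    model = torchvision.models.densenet121(num_classes=4)  # no / mild / moderate / severe OA
    model.eval()

    # Stand-in for a preprocessed knee-joint crop; the real channel count and size may differ.
    image = torch.randn(1, 3, 224, 224, requires_grad=True)

    logits = model(image)
    predicted_class = logits.argmax(dim=1).item()

    # Gradient of the predicted-class score with respect to the input pixels:
    # large absolute values mark regions that most influenced the grading.
    logits[0, predicted_class].backward()
    heatmap = image.grad.abs().max(dim=1).values.squeeze(0)  # 224 x 224 sensitivity map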
Conclusions: In this study we provide a proof of concept that a fully automated pipeline using artificial intelligence can identify varying stages of OA. This algorithm can quickly filter radiographs to flag subjects at high risk of OA and can identify relevant features that may play a role in the development and presence of OA.
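For reference, the knee-joint localization step that enables this automation (described in the Methods) can be sketched with OpenCV's normalized cross-correlation as below; the file names, grayscale input, and matching score are illustrative assumptions rather than the exact pipeline used here.

    # Minimal sketch of knee-joint localization by template matching (illustrative only).
    import cv2

    radiograph = cv2.imread("bilateral_knee_radiograph.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
    template = cv2.imread("knee_joint_template.png", cv2.IMREAD_GRAYSCALE)          # hypothetical file

    # Slide the template over the radiograph and score each position by normalized correlation.
    scores = cv2.matchTemplate(radiograph, template, cv2.TM_CCOEFF_NORMED)
    _, max_score, _, top_left = cv2.minMaxLoc(scores)

    # Crop the highest-correlation region as the localized knee joint.
    h, w = template.shape
    knee_crop = radiograph[top_left[1]:top_left[1] + h, top_left[0]:top_left[0] + w]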