To reduce bladder cancer mortality, efforts need to focus on early detection of the disease so that therapeutic intervention can be more effective. Strong risk factors (eg, smoking status, age, occupational exposure) have been identified, and some diagnostic tools (eg, cystoscopy) have been proposed. However, to date, no fully satisfactory solution (noninvasive, inexpensive, and high-performing) has been proposed for widespread deployment. Some new models based on cytology image classification have recently been developed and show promise, but avenues remain to be explored to improve their performance.

Our team aimed to evaluate the benefit of combining a risk factor model, built by reusing massive clinical data, with a digital cytology image-based model (VisioCyt) for bladder cancer detection. The first step consisted of designing a predictive model based on clinical data (ie, risk factors identified in the literature) extracted from the clinical data warehouse of the Rennes Hospital, using machine learning algorithms (logistic regression, random forest, and support vector machine). This model provides a score corresponding to the risk of developing bladder cancer based on the patient's clinical profile. Second, we investigated 3 strategies (logistic regression, decision tree, and a custom strategy based on score interpretation) to combine this score with the score from the image-based model to produce a robust bladder cancer scoring system.

We collected 2 data sets. The first, including clinical data for 5422 patients extracted from the clinical data warehouse, was used to design the risk factor-based model. The second, composed of data for 620 patients from a clinical trial for which cytology images and clinicobiological features were collected, was used to measure the models' performances.
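As a minimal sketch of the logistic regression combination strategy, the Python below fits a two-input logistic model on each patient's risk factor score and image-based score, then measures discrimination with the ROC area under the curve (computed with the rank-based Mann-Whitney formulation). It assumes both input scores are probabilities in [0, 1]. The function names, hyperparameters, and synthetic data are hypothetical; this is a plain gradient-descent illustration, not the study's implementation, which the abstract does not describe.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    X: list of feature rows, here [risk_factor_score, image_score] (hypothetical).
    y: list of 0/1 labels (1 = bladder cancer).
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss w.r.t. the linear term
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    # Combined score: predicted probability of cancer for each row.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(scores, labels):
    """ROC AUC: probability a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(0)
    labels, X = [], []
    for _ in range(200):
        y = 1 if rng.random() < 0.3 else 0
        # Hypothetical synthetic scores: each model is only weakly informative.
        clinical = min(1.0, max(0.0, 0.4 + 0.2 * y + rng.gauss(0, 0.2)))
        image = min(1.0, max(0.0, 0.4 + 0.2 * y + rng.gauss(0, 0.2)))
        labels.append(y)
        X.append([clinical, image])
    w, b = fit_logistic(X, labels, epochs=500)
    combined = predict_proba(X, w, b)
    print("risk factor AUC:", round(auc([x[0] for x in X], labels), 3))
    print("image AUC:      ", round(auc([x[1] for x in X], labels), 3))
    print("combined AUC:   ", round(auc(combined, labels), 3))
```

The demo scores the same synthetic patients it was fit on only to stay short; the study instead reports AUCs on separate training and test sets, so a faithful replication would hold out patients before computing the combined score's AUC.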
On this second data set, the combination of both models achieved areas under the curve of 0.82 on the training set and 0.83 on the test set, supporting the combination of risk factor-based and image-based models. Across all classes, the combination yields a higher associated risk of cancer than VisioCyt alone, especially for low-grade bladder cancer. These results demonstrate the value of combining clinical and biological information, particularly for improving the detection of low-grade bladder cancer. The automatic extraction of clinical features will need improvement to make the risk factor-based model more robust. Nevertheless, the current results support the assumption that this type of approach will benefit patients.