Abstract
Background
Scoring colonoscopy videos of patients with ulcerative colitis (UC) requires a high level of expertise, yet even trained expert readers disagree. In clinical trials, this disagreement can negatively affect both subject selection and the assessment of treatment response. Consistency among central readers might be improved by algorithms that automatically process videos, identify salient features, and estimate the level of disease activity using established scoring methods.

Methods
We propose an end-to-end system that uses machine learning (ML) models to process colonoscopy videos. The system shortens the time a human expert reader needs to score a video by filtering out non-informative parts. It also supplies a second-opinion Mayo Endoscopic Score (MES) for each informative frame and for the full video.

Our dataset included videos from UC patients with a representative range of disease activity, acquired at a Ukrainian hospital (n=505) and an Israeli hospital (n=227). Video annotation was performed by 12 reviewers, trained and supervised by an expert gastroenterologist (DM) with more than 10 years of central reading experience. Of almost 3 million frames, about 34% were classified by the reviewers as non-informative (e.g., out of focus, motion blur, stool). The reviewers further annotated the remaining informative frames as follows:
- containing ulcers (6%), classified as MES=3;
- containing erosions (22%), classified as MES=2;
- showing loss of vascularity or erythema (23%), classified as MES=1;
- none of these findings (49%), classified as MES=0.

Informative frames were split into a training set (80%), a validation set (10%) and a test set (10%). The dataset included low-quality frames on which the reviewers could still estimate the MES. Several model types were trained to filter out non-informative frames and to independently score each frame and the full video for MES: image processing algorithms, convolutional neural networks (CNNs), a combined CNN and long short-term memory (CNN+LSTM) model, and classical ML classifiers.
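The labeling rule and data split described in the Methods can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' code. The finding names, the functions `frame_mes` and `split_80_10_10`, and the rule that the most severe finding sets a frame's MES are all hypothetical. The reported percentages (6%, 22%, 23%, 49%) sum to 100%, which is consistent with one label per informative frame.

```python
import random
from typing import Iterable, List, Optional, Sequence, Tuple

# Hypothetical finding names; the abstract names these findings but not
# how they were encoded by the annotation tool.
FINDING_TO_MES = {
    "ulcer": 3,
    "erosion": 2,
    "loss_of_vascularity": 1,
    "erythema": 1,
}


def frame_mes(findings: Iterable[str], informative: bool) -> Optional[int]:
    """Return a frame's MES label, or None for non-informative frames.

    Assumption: the most severe finding in the frame determines its MES,
    and an informative frame with no listed finding is MES=0.
    """
    if not informative:
        return None  # filtered out before MES scoring
    return max((FINDING_TO_MES[f] for f in findings), default=0)


def split_80_10_10(items: Sequence, seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle items and split them into 80% train, 10% validation, 10% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding surprises
    n_val = n // 10
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # remainder (about 10%) goes to test
    )


frames = [(["erythema"], True), ([], True), (["erosion", "erythema"], True), (["ulcer"], False)]
labels = [frame_mes(f, ok) for f, ok in frames]
print(labels)  # [1, 0, 2, None]
```

The abstract states that informative frames were split. It does not say whether all frames from one video were kept in the same set. Passing video IDs instead of frames to `split_80_10_10` would keep each video within a single set and avoid near-duplicate frames leaking between training and test data.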
Results
Model performance was evaluated using Cohen’s quadratic weighted kappa (QWK) with 95% confidence intervals (95% CI). Model agreement with humans is shown in the middle column of the results table. Inter-reviewer agreement, based on a subset of ‘clear frames’ from various videos of the test set, is shown in the last column.

Conclusion
We developed a full end-to-end pipeline for processing colonoscopy videos that estimated MES with agreement comparable to human-human agreement. Such a model has the potential to make human readers more efficient by highlighting frames with relevant pathology. It can also improve inter-reader agreement by providing a second-opinion MES. Future work will expand model training and testing with new data sources. It will also explore paradigms for combining the model with human readers to improve central reading in clinical trials.
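The Results report QWK with 95% CIs but do not say how the intervals were computed. The following is a minimal pure-Python sketch, assuming integer MES labels 0 to 3. The interval uses a percentile bootstrap over paired labels, which is our assumption and not necessarily the authors' method. The function names `quadratic_weighted_kappa` and `bootstrap_ci` are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], n_classes: int = 4) -> float:
    """Cohen's kappa with quadratic weights for ordinal labels 0..n_classes-1."""
    n = len(a)
    # Observed confusion matrix between two raters (e.g., model vs. reviewer).
    observed = [[0] * n_classes for _ in range(n_classes)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    # Marginal label counts for each rater.
    hist_a = [sum(observed[i]) for i in range(n_classes)]
    hist_b = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2  # quadratic disagreement weight
            num += w * observed[i][j]                # weighted observed disagreement
            den += w * hist_a[i] * hist_b[j] / n     # weighted chance disagreement
    # den == 0 only when both raters used one identical class: kappa is undefined.
    return 1.0 - num / den if den > 0 else float("nan")


def bootstrap_ci(a: Sequence[int], b: Sequence[int], n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0, n_classes: int = 4) -> Tuple[float, float]:
    """Percentile bootstrap CI for QWK, resampling label pairs with replacement."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        k = quadratic_weighted_kappa([a[i] for i in idx], [b[i] for i in idx], n_classes)
        if not math.isnan(k):  # skip degenerate resamples
            stats.append(k)
    if not stats:
        raise ValueError("all bootstrap resamples were degenerate")
    stats.sort()
    lo = stats[int(alpha / 2 * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi
```

As a cross-check, scikit-learn's `cohen_kappa_score(y1, y2, weights="quadratic", labels=[0, 1, 2, 3])` should give the same point estimate. Frames from one video are correlated, so resampling whole videos rather than individual frames would likely give wider and more honest intervals.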