A deep learning-based interactive medical image segmentation framework with sequential memory

Ivan Mikhailov,Benoit Chauveau,Nicolas Bourdel,Adrien Bartoli

doi:10.1016/j.cmpb.2024.108038

Abstract

Background and objective. Image segmentation is an essential component in medical image analysis. The case of 3D images such as MRI is particularly challenging and time consuming. Interactive or semi-automatic methods are thus highly desirable. However, existing methods do not exploit the typical sequentiality of real user interactions. This is due to the interaction memory used in these systems, which discards ordering. In contrast, we argue that the order of the user corrections should be used for training and lead to performance improvements.Methods. We contribute to solving this problem by proposing a general multi-class deep learning-based interactive framework for image segmentation, which embeds a base network in a user interaction loop with a user feedback memory. We propose to model the memory explicitly as a sequence of consecutive system states, from which the features can be learned, generally learning from the segmentation refinement process. Training is a major difficulty owing to the network's input being dependent on the previous output. We adapt the network to this loop by introducing a virtual user in the training process, modelled by dynamically simulating the iterative user feedback.Results. We evaluated our framework against existing methods on the complex task of multi-class semantic instance female pelvis MRI segmentation with 5 classes, including up to 27 tumour instances, using a segmentation dataset collected in our hospital, and on liver and pancreas CT segmentation, using public datasets. We conducted a user evaluation, involving both senior and junior medical personnel in matching and adjacent areas of expertise. We observed an annotation time reduction with 5'56” for our framework against 25' on average for classical tools. We systematically evaluated the influence of the number of clicks on the segmentation accuracy. A single interaction round our framework outperforms existing automatic systems with a comparable setup. We provide an ablation study and show that our framework outperforms existing interactive systems.Conclusions. Our framework largely outperforms existing systems in accuracy, with the largest impact on the smallest, most difficult classes, and drastically reduces the average user segmentation time with fast inference at 47.2±6.2 ms per image.

Full Text