Improving motion-mask segmentation in thoracic CT with multiplanar U-nets.

Ludmilla Penarrubia, Emmanuel Roux, Nicolas Pinon, David Sarrut, Eduardo Enrique Dávila Serrano, Maciej Orkisz, Jean‐Christophe Richard

doi:10.1002/mp.15347

Abstract

Motion-mask segmentation from thoracic computed tomography (CT) images is the process of extracting the region that encompasses the lungs and viscera, where large displacements occur during breathing. It has been shown to help image registration between different respiratory phases. This registration step is useful, for example, for radiotherapy planning or for calculating local lung ventilation. Knowing the location of the motion discontinuity, that is, the sliding motion near the pleura, allows better control of the registration and prevents unrealistic estimates. Nevertheless, existing methods for motion-mask segmentation are not robust enough to be used in clinical routine. This article shows that it is feasible to overcome this lack of robustness by using a lightweight deep-learning approach usable on a standard computer, even without data augmentation or advanced model design. A convolutional neural-network architecture with three 2D U-nets, one for each of the three main orientations (sagittal, coronal, axial), was proposed. Predictions generated by the three U-nets were combined by majority voting to provide a single 3D segmentation of the motion mask. The networks were trained on a database of non-small cell lung cancer 4D CT images of 43 patients. Training and evaluation were done with a K-fold cross-validation strategy. Evaluation was based on visual grading by two experts according to the appropriateness of the segmented motion mask for the registration task, and on a comparison with motion masks obtained by a baseline method using level sets. A second database (76 CT images of patients with early-stage COVID-19), unseen during training, was used to assess the generalizability of the trained neural network. The proposed approach outperformed the baseline method in terms of quality and robustness: the success rate increased from to without producing any failure. It also achieved a speed-up factor of 60 with GPU, or 17 with CPU.
The memory footprint was low: less than 5 GB of GPU RAM for training and less than 1 GB of GPU RAM for inference. When evaluated on a dataset with images differing in several characteristics (CT device, pathology, and field of view), the proposed method improved the success rate from to . With a 5-s processing time on a mid-range GPU and success rates around , the proposed approach seems fast and robust enough to be routinely used in clinical practice. The success rate can be further improved by incorporating more diversity in the training data via data augmentation and additional annotated images from different scanners and diseases. The code and trained model are publicly available.
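
The fusion step described above, in which the predictions of the three orientation-specific U-nets are combined by majority voting, can be illustrated with a minimal sketch. This is not the authors' released code. It assumes that the slice-wise outputs of each 2D network have already been restacked into a binary 3D volume on the same voxel grid, and the function name majority_vote is hypothetical.

```python
import numpy as np


def majority_vote(sagittal, coronal, axial):
    """Voxel-wise majority vote over three binary 3D masks.

    Assumption (not from the paper): each argument is the binary 3D volume
    obtained by restacking the 2D predictions of one orientation-specific
    U-net, and all three volumes share the same shape and voxel grid.
    """
    masks = [np.asarray(m).astype(bool) for m in (sagittal, coronal, axial)]
    if not (masks[0].shape == masks[1].shape == masks[2].shape):
        raise ValueError("all three masks must have the same shape")
    # Count, for every voxel, how many of the three networks labeled it
    # as belonging to the motion mask.
    votes = np.stack(masks).sum(axis=0)
    # A voxel is kept when at least two of the three networks agree.
    return votes >= 2
```

With three voters and binary labels, a tie cannot occur, so every voxel receives an unambiguous label. A voxel marked by only one network is discarded, which is how the vote can suppress errors specific to a single orientation.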

Full Text