PolypNextLSTM: a lightweight and fast polyp video segmentation network using ConvNext and ConvLSTM.

Commonly employed in polyp segmentation, single-image UNet architectures lack the temporal insight clinicians gain from video data when diagnosing polyps. To mirror clinical practice more faithfully, our proposed solution, PolypNextLSTM, leverages video-based deep learning, harnessing temporal information for superior segmentation performance with minimal parameter overhead, making it potentially suitable for edge devices. PolypNextLSTM employs a UNet-like structure with ConvNext-Tiny as its backbone, strategically omitting the last two layers to reduce the parameter count. Our temporal fusion module, a Convolutional Long Short Term Memory (ConvLSTM), effectively exploits temporal features. Our primary novelty lies in PolypNextLSTM, which stands out as the leanest and fastest model, surpassing the performance of five state-of-the-art image- and video-based deep learning models. The evaluation on the SUN-SEG dataset spans easy-to-detect and hard-to-detect polyp scenarios, along with videos containing challenging artefacts like fast motion and occlusion. Comparison against five image-based and five video-based models demonstrates PolypNextLSTM's superiority, achieving a Dice score of 0.7898 on the hard-to-detect polyp test set and surpassing the image-based PraNet (0.7519) and the video-based PNS+ (0.7486). Notably, our model excels in videos featuring complex artefacts such as ghosting and occlusion. PolypNextLSTM, integrating a pruned ConvNext-Tiny with ConvLSTM for temporal fusion, not only exhibits superior segmentation performance but also achieves the highest frames per second among the evaluated models. Code can be found here: https://github.com/mtec-tuhh/PolypNextLSTM .
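
As an illustration of the temporal fusion idea described above, here is a minimal, self-contained PyTorch sketch of a ConvLSTM cell applied to a stack of per-frame encoder features. The channel count, clip length, and feature resolution are illustrative assumptions, not the authors' configuration.

import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Single ConvLSTM cell: LSTM gates computed with a 2D convolution."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # One convolution produces all four gates (input, forget, cell, output).
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c

def fuse_temporal(features, cell):
    """Run a ConvLSTM over a (B, T, C, H, W) stack of frame features."""
    b, t, ch, hgt, wdt = features.shape
    h = features.new_zeros(b, cell.hid_ch, hgt, wdt)
    c = torch.zeros_like(h)
    for step in range(t):
        h, c = cell(features[:, step], (h, c))
    return h  # temporally fused representation at the last frame

if __name__ == "__main__":
    cell = ConvLSTMCell(in_ch=96, hid_ch=96)   # 96 channels: assumed stage width
    feats = torch.randn(2, 5, 96, 32, 32)      # 5-frame clip of encoder features
    print(fuse_temporal(feats, cell).shape)    # torch.Size([2, 96, 32, 32])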

A bronchoscopic navigation method based on neural radiance fields.

We introduce a novel approach for bronchoscopic navigation that leverages neural radiance fields (NeRF) to passively localize the endoscope solely from bronchoscopic images. This approach aims to overcome the limitations of current bronchoscopic navigation tools, which rely on external infrastructure or require active adjustment of the bronchoscope. We develop a two-stage pipeline: offline training using preoperative data and online passive pose estimation during surgery. To enhance performance, we employ Anderson acceleration and incorporate semantic appearance transfer to deal with the sim-to-real gap between the training and inference stages. We assessed the viability of our approach by conducting tests on virtual bronchoscopic images and a physical phantom against SLAM-based methods. The average rotation error on our virtual dataset is about 3.18° and the translation error is around 4.95 mm. On the physical phantom test, the average rotation and translation errors are approximately 5.14° and 13.12 mm. Our NeRF-based bronchoscopic navigation method eliminates reliance on external infrastructure and active adjustments, offering promising advancements in bronchoscopic navigation. Experimental validation on simulation and real-world phantom models demonstrates its efficacy in addressing challenges like low texture and challenging lighting conditions.
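
The passive pose estimation stage can be pictured as inverse rendering: optimize the camera pose so that the NeRF-rendered view matches the observed frame. Below is a hedged PyTorch sketch under that reading; render_stub is a hypothetical placeholder for the trained NeRF renderer, and the paper's Anderson acceleration and semantic appearance transfer are omitted.

import torch

def render_stub(pose6d, H=32, W=32):
    """Placeholder differentiable renderer mapping a 6-DoF pose to an image.
    A real pipeline would query the trained NeRF along camera rays."""
    grid = torch.linspace(0, 1, H * W * 3).reshape(H, W, 3)
    return torch.sigmoid(grid + pose6d.sum())

def estimate_pose(observed, init_pose, iters=100, lr=1e-2):
    """Gradient-based pose refinement against a photometric MSE loss."""
    pose = init_pose.clone().requires_grad_(True)  # (6,) axis-angle + translation
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = torch.mean((render_stub(pose) - observed) ** 2)
        loss.backward()
        opt.step()
    return pose.detach(), loss.item()

if __name__ == "__main__":
    target = render_stub(torch.tensor([0.1, 0.0, 0.0, 0.0, 0.0, 0.0]))
    pose, err = estimate_pose(target, torch.zeros(6))
    print(pose, err)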

Modality redundancy for MRI-based glioblastoma segmentation.

Automated glioblastoma segmentation from magnetic resonance imaging is generally performed on a four-modality input comprising T1, contrast-enhanced T1 (T1CE), T2 and FLAIR. We hypothesize that information redundancy is present within these image combinations, which can possibly reduce a model's performance. Moreover, for clinical applications, the risk of encountering missing data rises as the number of required input modalities increases. This study therefore explored the relevance and influence of the different modalities used for MRI-based glioblastoma segmentation. After training multiple segmentation models based on the nnU-Net and SwinUNETR architectures, differing only in the number and combination of input modalities, each model was evaluated with regard to segmentation accuracy and epistemic uncertainty. Results show that T1CE-based segmentation (for enhancing tumor and tumor core) and T1CE-FLAIR-based segmentation (for whole tumor and overall segmentation) can reach segmentation accuracies comparable to the full-input version. Notably, the highest segmentation accuracy for nnU-Net was found for a three-input configuration of T1CE-FLAIR-T1, suggesting a confounding effect of redundant input modalities. The SwinUNETR architecture appears to suffer less from this, as the three-input and full-input models yielded statistically equal results. The T1CE-FLAIR-based model can therefore be considered a minimal-input alternative to the full-input configuration. Adding modalities beyond this does not statistically improve accuracy and can even degrade it, but does lower the segmentation uncertainty.
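
The study design, training one model per modality combination, can be summarized with a short sketch. train_and_evaluate below is a hypothetical stand-in for a full nnU-Net or SwinUNETR training run; it is scaffolding only.

from itertools import combinations

MODALITIES = ("T1", "T1CE", "T2", "FLAIR")

def train_and_evaluate(subset):
    """Placeholder: a real run would train a segmentation model on the given
    input channels and return accuracy (e.g. Dice) plus uncertainty metrics."""
    return 0.0  # assumption: stand-in value only

results = {}
for r in range(1, len(MODALITIES) + 1):
    for subset in combinations(MODALITIES, r):
        results[subset] = train_and_evaluate(subset)

# e.g. compare the minimal-input candidate against the full-input baseline:
# results[("T1CE", "FLAIR")] vs. results[MODALITIES]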

Adaptive neighborhood triplet loss: enhanced segmentation of dermoscopy datasets by mining pixel information.

The integration of deep learning into image segmentation markedly improves the automation capabilities of medical diagnostic systems, reducing the dependence on the clinical expertise of medical professionals. However, segmentation accuracy is still impacted by various interference factors encountered during image acquisition. To address this challenge, this paper proposes a loss function designed to mine specific pixel information that changes dynamically during the training process. Building on the triplet concept, this dynamic change is leveraged to drive the predicted image boundaries closer to the real boundaries. Extensive experiments on the PH2 and ISIC2017 dermoscopy datasets validate that the proposed loss function overcomes the limitations of traditional triplet loss methods in image segmentation. The loss function not only raises the Jaccard indices of neural networks by 2.42 and 2.21 on PH2 and ISIC2017, respectively, but also enables networks that use it to generally surpass those that do not in segmentation performance. This work proposes a loss function that deeply mines the information of specific pixels without incurring additional training cost, significantly improving the automation of neural networks in image segmentation tasks. The loss function adapts to dermoscopic images of varying quality and demonstrates higher effectiveness and robustness than other boundary loss functions, making it suitable for segmentation tasks across various neural networks.
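
As a rough illustration of a pixel-level triplet loss of this kind (not the paper's exact adaptive neighborhood mining), the following PyTorch sketch samples anchors and positives from the ground-truth foreground and negatives from the background, then applies a standard triplet margin loss.

import torch
import torch.nn.functional as F

def pixel_triplet_loss(embed, mask, margin=1.0, n=256):
    """embed: (C, H, W) pixel embeddings; mask: (H, W) binary ground truth.
    The random sampling here is a simplification of adaptive neighborhood mining."""
    c = embed.shape[0]
    flat = embed.reshape(c, -1).t()                        # (H*W, C)
    pos_idx = torch.nonzero(mask.reshape(-1) > 0).squeeze(1)
    neg_idx = torch.nonzero(mask.reshape(-1) == 0).squeeze(1)
    k = min(n, pos_idx.numel(), neg_idx.numel())
    if k == 0:
        return flat.sum() * 0.0                            # degenerate mask
    a = flat[pos_idx[torch.randperm(pos_idx.numel())[:k]]]   # anchors (foreground)
    p = flat[pos_idx[torch.randperm(pos_idx.numel())[:k]]]   # positives (foreground)
    ng = flat[neg_idx[torch.randperm(neg_idx.numel())[:k]]]  # negatives (background)
    return F.triplet_margin_loss(a, p, ng, margin=margin)

if __name__ == "__main__":
    emb = torch.randn(16, 64, 64, requires_grad=True)
    gt = (torch.rand(64, 64) > 0.5).float()
    print(pixel_triplet_loss(emb, gt))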

Bone marrow edema detection for diagnostic support of axial spondyloarthritis using MRI.

This study proposes a process for detecting slices with bone marrow edema (BME), a typical finding of axial spondyloarthritis (axSpA), using MRI scans as input. The process does not require manual input of regions of interest and outputs both a judgment on the presence or absence of BME in each slice and the location of the edema as the rationale for that judgment. First, the signal intensity of the MRI scans of the sacroiliac joint is normalized to reduce the variation in signal values between scans. Next, slices containing synovial joints are extracted using a slice-selection network. Finally, a BME slice-detection network determines the presence or absence of BME in each slice and outputs the location of the BME. The proposed method was applied to 86 MRI scans collected from 15 hospitals in Japan. The results showed that the average absolute error of the slice-selection process was 1.49 slices for the misalignment between the upper and lower slices of the synovial joint range. The accuracy, sensitivity, and specificity of the BME slice-detection network were 0.905, 0.532, and 0.974, respectively. This paper proposes a process that detects slices with BME, together with the BME's location as the rationale for the judgment, from an MRI scan, and shows its effectiveness on 86 MRI scans. In the future, we plan to develop processes for detecting other findings, such as bone erosion, from MRI scans, followed by the development of a diagnostic support system.
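
The three-stage pipeline can be outlined as follows; both networks are hypothetical placeholders, and the z-score normalization shown is an assumption about the intensity normalization step.

import numpy as np

def normalize_intensity(volume):
    """Reduce inter-scan signal variation (assumed z-score normalization)."""
    return (volume - volume.mean()) / (volume.std() + 1e-8)

def select_joint_slices(volume):
    """Placeholder slice-selection network: returns a slice index range."""
    return 0, volume.shape[0]            # assumption: keep all slices

def detect_bme(slice_img):
    """Placeholder detection network: returns (has_bme, location_heatmap)."""
    return False, np.zeros_like(slice_img)

def run_pipeline(volume):
    vol = normalize_intensity(volume)
    lo, hi = select_joint_slices(vol)
    # Per-slice judgment plus edema location as the rationale.
    return [detect_bme(vol[i]) for i in range(lo, hi)]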

Automated segmentation and deep learning classification of ductopenic parotid salivary glands in sialo cone-beam CT images.

This study addressed the challenge of detecting and classifying the severity of ductopenia in parotid glands, a structural abnormality characterized by a reduced number of salivary ducts, previously shown to be associated with salivary gland impairment. The aim of the study was to develop an automatic algorithm designed to improve diagnostic accuracy and efficiency in analyzing ductopenic parotid glands using sialo cone-beam CT (sialo-CBCT) images. We developed an end-to-end automatic pipeline consisting of three main steps: (1) region of interest (ROI) computation, (2) parotid gland segmentation using the Frangi filter, and (3) ductopenia case classification with a residual neural network (RNN) augmented by multidirectional maximum intensity projection (MIP) images. To explore the impact of the first two steps, the RNN was trained on three datasets: (1) original MIP images, (2) MIP images with predefined ROIs, and (3) MIP images after segmentation. Evaluation was conducted on 126 parotid sialo-CBCT scans of normal, moderate, and severe ductopenic cases, yielding a high performance of 100% for the ROI computation and 89% for the gland segmentation. Improvements in accuracy and F1 score were noted from the original MIP images (accuracy: 0.73, F1 score: 0.53) to the ROI-predefined images (accuracy: 0.78, F1 score: 0.56) and the segmented images (accuracy: 0.95, F1 score: 0.90). Notably, ductopenia detection sensitivity was 0.99 on the segmented dataset, highlighting the capabilities of the algorithm in detecting ductopenic cases. Our method, which combines classical image processing and deep learning techniques, offers a promising solution for the automatic detection of parotid gland ductopenia in sialo-CBCT scans. This may be used for further research aimed at understanding the role of the presence and severity of ductopenia in salivary gland dysfunction.
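
Step (2) relies on the Frangi vesselness filter, which is available in scikit-image, and the classifier input uses maximum intensity projections. The sketch below shows both; the threshold, slice-wise application, and projection axes are illustrative assumptions.

import numpy as np
from skimage.filters import frangi

def segment_ducts(volume, threshold=0.05):
    """Enhance tubular (duct-like) structures slice-wise, then threshold.
    black_ridges=False assumes contrast-filled ducts appear bright."""
    vesselness = np.stack([frangi(sl, black_ridges=False) for sl in volume])
    return vesselness > threshold

def multidirectional_mips(volume):
    """MIPs along the three principal axes; the paper's directions may differ."""
    return [volume.max(axis=a) for a in range(3)]

if __name__ == "__main__":
    vol = np.random.rand(8, 64, 64)
    print(segment_ducts(vol).shape, [m.shape for m in multidirectional_mips(vol)])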

Augmented reality for endoscopic transsphenoidal surgery: evaluating design factors with neurosurgeons.

This study investigates the potential utility of augmented reality (AR) in the endoscopic transsphenoidal approach (TSA). While previous research has addressed technical challenges in AR for TSA, this paper explores how design factors can improve AR for neurosurgeons from a human-centred design perspective. Preliminary qualitative research involved observations of TSA procedures ( ) and semi-structured interviews with neurosurgeons ( ). These informed the design of an AR mockup, which was evaluated with neurosurgeons ( ). An interactive low-fidelity prototype, the "AR-assisted Navigation for the TransSphenoidal Approach" (ANTSA), was then developed in Unity 3D. A user study ( ) evaluated the low-fidelity prototype of ANTSA through contextual interviews, providing feedback on design factors. AR visualisations may be beneficial in streamlining the sellar phase and reducing intraoperative errors such as excessive or inadequate exposure. Key design recommendations include lean mesh rendering, an intuitive colour palette, and optional structure highlighting. This research presents user-centred design guidelines to improve sensemaking and surgical workflow in the sellar phase of TSA, with the goal of improving clinical outcomes. The specific improvements that AR could bring to the workflow are discussed, along with surgeons' reservations and the technology's possible application to training less experienced physicians.

Synchronising a stereoscopic surgical video stream using specular reflection.

A stereoscopic surgical video stream consists of left-right image pairs provided by a stereo endoscope. While the surgical display shows these image pairs synchronised, most capture cards cause de-synchronisation. This means that the paired left and right images may no longer correspond once used in downstream tasks such as stereo depth computation. The stereo synchronisation problem is to recover the corresponding left-right images. This is particularly challenging in the surgical setting, owing to moist tissues, rapid camera motion, quasi-staticity and the real-time processing requirement. Existing methods exploit image cues from the diffuse reflection component and are defeated by the above challenges. We propose to exploit the specular reflection instead. Specifically, we propose a powerful left-right comparison score (LRCS) using the specular highlights commonly occurring on moist tissues. We detect the highlights using a neural network, characterise them with invariant descriptors, match them, and use the number of matches to form the proposed LRCS. We evaluate against 147 existing LRCSs on 44 challenging robotic partial nephrectomy and robotic-assisted hepatic resection video sequences with simulated and real de-synchronisation. The proposed LRCS outperforms them all, with average and maximum offsets of 0.055 and 1 frames and 94.1±3.6% successfully synchronised frames. In contrast, the best existing LRCS achieves average and maximum offsets of 0.3 and 3 frames and 81.2±6.4% successfully synchronised frames. The use of specular reflection brings a tremendous boost to the real-time surgical stereo synchronisation problem.
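
A simplified version of the LRCS idea can be sketched with OpenCV: detect specular highlights, describe and match them, use the match count as the score, then pick the temporal offset maximizing it. The paper uses a learned highlight detector and invariant descriptors; the intensity threshold and ORB features below are stand-ins.

import cv2
import numpy as np

def highlight_mask(gray, thresh=230):
    """Crude stand-in for the learned specular-highlight detector."""
    return (gray > thresh).astype(np.uint8) * 255

def lrcs(left_gray, right_gray):
    """Match features restricted to highlight regions; score = match count."""
    orb = cv2.ORB_create()
    _, des_l = orb.detectAndCompute(left_gray, highlight_mask(left_gray))
    _, des_r = orb.detectAndCompute(right_gray, highlight_mask(right_gray))
    if des_l is None or des_r is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    return len(matcher.match(des_l, des_r))

def best_offset(left_frames, right_frames, max_offset=5):
    """Pick the temporal offset maximizing the summed LRCS."""
    scores = {}
    for d in range(-max_offset, max_offset + 1):
        pairs = [(l, right_frames[i + d]) for i, l in enumerate(left_frames)
                 if 0 <= i + d < len(right_frames)]
        scores[d] = sum(lrcs(l, r) for l, r in pairs)
    return max(scores, key=scores.get)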

Automatic robotic Doppler sonography of leg arteries.

Robot-assisted systems offer an opportunity to support the diagnostic and therapeutic treatment of vascular diseases, reducing radiation exposure and supporting the limited medical staff in vascular medicine. In the diagnosis and follow-up care of vascular pathologies, Doppler ultrasound has become the preferred diagnostic tool. This study presents a robotic system for automatic Doppler ultrasound examinations of patients' leg vessels. The system consists of a redundant 7-DoF serial manipulator to which a 3D ultrasound probe is attached. Compliant control was employed, whereby the transducer was guided along the vessel with a defined contact force. Visual servoing was used to correct the position of the probe during the scan so that the vessel is always properly visualized. To track the vessel's position, methods based on template matching and Doppler sonography were used. Our system successfully scanned the femoral artery of seven volunteers automatically over a distance of 20 cm. In particular, our approach using Doppler ultrasound data showed high robustness and an accuracy of 10.7 (±3.1) px in determining the vessel's position, outperforming our template matching approach, which achieved an accuracy of 13.9 (±6.4) px. The developed system enables automated robotic ultrasound examinations of vessels and thus represents an opportunity to reduce radiation exposure and staff workload. The integration of Doppler ultrasound improves the accuracy and robustness of vessel tracking and could thus contribute to the realization of routine robotic vascular examinations and potential endovascular interventions.
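
The visual-servoing correction can be illustrated with the template matching variant: locate the vessel in each ultrasound frame and convert the pixel offset into a lateral probe correction. The pixel-to-metre scale and control gain below are illustrative assumptions.

import cv2
import numpy as np

def track_vessel(frame, template):
    """Return the (x, y) pixel position of the best template match."""
    res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(res)
    h, w = template.shape[:2]
    return max_loc[0] + w // 2, max_loc[1] + h // 2

def lateral_correction(vessel_x, image_width, px_to_m=5e-5, gain=0.5):
    """Proportional correction to re-centre the vessel under the probe,
    in metres; the sign moves the probe toward the vessel."""
    error_px = vessel_x - image_width // 2
    return -gain * error_px * px_to_m

if __name__ == "__main__":
    frame = np.random.rand(128, 128).astype(np.float32)
    templ = frame[40:60, 40:60].copy()
    x, y = track_vessel(frame, templ)
    print(x, y, lateral_correction(x, frame.shape[1]))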

Improving lung nodule segmentation in thoracic CT scans through the ensemble of 3D U-Net models.

The current study explores the application of 3D U-Net architectures combined with Inception and ResNet modules for precise lung nodule detection through a deep learning-based segmentation technique. This investigation is motivated by the objective of developing a computer-aided diagnosis (CAD) system for effective diagnosis and prognostication of lung nodules in clinical settings. The proposed method trained four different 3D U-Net models on a retrospective dataset obtained from AIIMS Delhi. To augment the training dataset, affine transformations and intensity transforms were utilized. Preprocessing steps included CT scan voxel resampling, intensity normalization, and lung parenchyma segmentation. Model optimization utilized a hybrid loss function that combined Dice loss and focal loss. The performance of all four 3D U-Nets was evaluated patient-wise using the Dice coefficient and Jaccard coefficient, then averaged to obtain the average volumetric Dice coefficient (DSCavg) and average Jaccard coefficient (IoUavg) on a test dataset comprising 53 CT scans. Additionally, an ensemble approach (Model-V) featuring the 3D U-Net (Model-I), ResNet (Model-II), and Inception (Model-III) 3D U-Net architectures, combined with two distinct patch sizes, was investigated. The ensemble obtained the highest DSCavg of 0.84 ± 0.05 and IoUavg of 0.74 ± 0.06 on the test dataset compared with the individual models. It mitigated the false positives, overestimations, and underestimations observed in the individual U-Net models. Moreover, the ensemble reduced the average false positives per scan on the test dataset (1.57 nodules/scan) compared to the individual models (2.69-3.39 nodules/scan). The suggested ensemble approach presents a strong and effective strategy for automatically detecting and delineating lung nodules, potentially aiding CAD systems in clinical settings. This approach could assist radiologists in the laborious and meticulous task of lung nodule detection in CT scans, improving lung cancer diagnosis and treatment planning.
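
Two elements of the abstract lend themselves to a short PyTorch sketch: the hybrid Dice + Focal training loss and a probability-averaging ensemble in the spirit of Model-V. The 0.5/0.5 loss weighting and the focal gamma are assumptions, not the paper's values.

import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss for a binary segmentation map."""
    p = torch.sigmoid(logits)
    inter = (p * target).sum()
    return 1 - (2 * inter + eps) / (p.sum() + target.sum() + eps)

def focal_loss(logits, target, gamma=2.0):
    """Focal loss: down-weight easy voxels via (1 - p_t)^gamma."""
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    pt = torch.exp(-bce)                 # probability of the true class
    return ((1 - pt) ** gamma * bce).mean()

def hybrid_loss(logits, target, w=0.5):
    return w * dice_loss(logits, target) + (1 - w) * focal_loss(logits, target)

def ensemble_predict(models, volume, thresh=0.5):
    """Average sigmoid probabilities over the member models, then threshold."""
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(m(volume)) for m in models])
    return probs.mean(dim=0) > thresh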
