Deformable Medical Image Registration Based on Multi-level Transformation Progressive and Image Enhancement
Abstract Medical image registration is a critical problem in medical image analysis, enabling the spatial alignment of anatomical structures across different imaging modalities. However, existing algorithms often struggle with local registration of large deformations and exhibit limited feature extraction capabilities. To address these challenges, we propose a Multi-level Transformation Progressive Registration Algorithm (MTPR). Incorporating the concept of multi-level transformations, the model performs four progressive registration steps, predicting the deformation field from coarse to fine. Initially, the model applies an enhancement process, introducing a hybrid filtering enhancement module based on wavelet transform and improved guided filtering to enhance image edges. During the registration phase, we propose a pyramid shared-weight enhancement network (PWE-Net), which precisely extracts multimodal image features and implements a progressive deformation field prediction strategy. To increase the feature extraction capability of the model, we propose a spatial feature fusion module in the skip connections between encoder and decoder, which combines multi-scale information into a spatial feature representation with rich contextual information. Additionally, we introduce a dual-similarity metric that incorporates structural similarity, increasing the model's attention to local organs and enhancing its capability for local organ registration. Experiments conducted on publicly available datasets (OASIS, LPBA40) and clinical CT/MR data achieved an average Dice similarity coefficient (DSC) of 0.822, an average symmetric surface distance (ASSD) of 0.741 mm, and a standard deviation of the Jacobian determinant (Std. Jacobian) of 0.247 on the clinical CT/MR data. Wilcoxon signed-rank test analysis shows that the evaluation metrics of the MTPR algorithm are significantly better than those of other baseline methods (P < 0.05).
The MTPR model's multi-scale information aggregation capability effectively handles large deformations, demonstrating excellent registration accuracy and generalization performance.
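The abstract above reports significance via the Wilcoxon signed-rank test. As a minimal illustration (not the paper's code), a paired comparison of per-case Dice scores can be run with `scipy.stats.wilcoxon`; all data below are invented:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical paired per-case Dice scores: proposed method vs. one baseline.
rng = np.random.default_rng(0)
baseline_dsc = rng.uniform(0.70, 0.78, size=20)
mtpr_dsc = baseline_dsc + rng.uniform(0.02, 0.08, size=20)  # consistently higher

# One-sided test: are the proposed method's scores significantly greater?
stat, p_value = wilcoxon(mtpr_dsc, baseline_dsc, alternative="greater")
significant = p_value < 0.05  # the paper's reported threshold
```

With all 20 paired differences positive, the one-sided p-value falls far below 0.05.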
- Research Article
29
- 10.3390/electronics11233935
- Nov 28, 2022
- Electronics
In this paper, an automatic speech emotion recognition (SER) task of classifying eight different emotions was carried out using parallel networks trained on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). A combination of a CNN-based network and attention-based networks, running in parallel, was used to model both spatial features and temporal feature representations. Multiple augmentation techniques using Additive White Gaussian Noise (AWGN), SpecAugment, Room Impulse Response (RIR), and Tanh Distortion were applied to the training data to further generalize the model representation. Raw audio data were transformed into Mel-spectrograms as the model's input. Using the CNN's proven capability in image classification and spatial feature representation, the spectrograms were treated as images, with height and width represented by the spectrogram's time and frequency scales. Temporal feature representations were modeled by attention-based Transformer and BLSTM-Attention modules. The proposed architectures of parallel CNN-based networks running alongside Transformer and BLSTM-Attention modules were compared with standalone CNN architectures and attention-based networks, as well as with hybrid architectures in which CNN layers wrapped in time-distributed wrappers are stacked on attention-based networks. In these experiments, the highest accuracies of 89.33% for the Parallel CNN-Transformer network and 85.67% for the Parallel CNN-BLSTM-Attention network were achieved on a 10% hold-out test set from the dataset. These networks showed promising results based on their accuracies, while keeping significantly fewer training parameters than non-parallel hybrid models.
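Of the augmentations listed above, additive white Gaussian noise is the simplest to reproduce. A small NumPy sketch (the function name and SNR target are illustrative choices, not from the paper):

```python
import numpy as np

def add_awgn(signal, snr_db, seed=None):
    """Add white Gaussian noise scaled so the result has roughly `snr_db` dB SNR."""
    rng = np.random.default_rng(seed)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Example: a 1-second 440 Hz tone at 16 kHz, augmented at 20 dB SNR.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
clean = np.sin(2.0 * np.pi * 440.0 * t)
noisy = add_awgn(clean, snr_db=20.0, seed=0)
measured_snr = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

In practice the SNR is often drawn at random per training example so the model sees many noise levels.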
- Research Article
508
- 10.1016/j.media.2022.102615
- Nov 1, 2022
- Medical image analysis
TransMorph: Transformer for unsupervised medical image registration.
- Research Article
1
- 10.21037/qims-24-1138
- Dec 1, 2024
- Quantitative imaging in medicine and surgery
Current medical image registration methods based on Transformer still encounter challenges, including significant local intensity differences and limited computational efficiency when dealing with three-dimensional (3D) computed tomography (CT) and cone beam CT (CBCT) images. These limitations hinder the precise alignment necessary for effective diagnosis and treatment planning. Therefore, the aim of this study is to develop a novel method that overcomes these challenges by enhancing feature interaction and computational efficiency in 3D medical image registration. This paper introduces a novel method that enhances feature interaction within Transformer by computing attention within resizable spatial perpendicular window (SPW). Additionally, it introduces a self-learning mapping control (SLMC) mechanism, which uses a mini convolutional neural network (CNN) to adaptively transform feature vectors into probability vectors. This approach is integrated into the UNet framework, resulting in the SPW-TransUNet. The effectiveness of the SPW-TransUNet is demonstrated through evaluations on two critical 3D medical imaging tasks: CT-CBCT registration and inter-CT registration. We utilized a range of evaluation metrics including Dice similarity coefficient (DICE), structural similarity index measure (SSIM), target registration error (TRE), and negative Jacobian percentage. The validation process involved comparative analysis against established baseline methods using statistical tests to ensure the robustness and reliability of our results. The proposed method demonstrated outstanding performance in the registration of 124 pairs of CT-CBCT lung images from 20 patients, achieving the lowest TRE of 2.16 mm and a minimal negative Jacobian of 0.126. It also recorded the highest SSIM and Dice coefficient of 86.87% and 88.28%, respectively. For the liver CT task involving 150 patients, the method achieved peak SSIM and DICE scores of 76.92% and 85.77%, respectively. 
Furthermore, ablation studies confirmed the effectiveness of the designed structural components. The SPW-TransUNet offers significant improvements in feature interaction and computational efficiency for medical image registration, providing an effective reference solution for patient and target localization in image-guided radiation therapy.
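Several of the metrics above (DICE in particular) are simple to state precisely. A minimal NumPy version for binary masks, independent of any of the papers' code:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice similarity coefficient of two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Toy example: two partially overlapping 6x6 squares in a 10x10 grid.
a = np.zeros((10, 10), dtype=bool); a[2:8, 2:8] = True
b = np.zeros((10, 10), dtype=bool); b[4:10, 4:10] = True
dsc = dice_coefficient(a, b)  # overlap is 4x4 = 16, so DSC = 32/72 ≈ 0.444
```

The `eps` term only guards against the degenerate case of two empty masks.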
- Research Article
- 10.11834/jig.211019
- Jan 1, 2023
- Journal of Image and Graphics
Objective Three-dimensional (3D) image segmentation of human tissues, organs, and lesion regions is an important prerequisite for computer-aided diagnosis and a key technical foundation for 3D visualization of medical images. The success of deep learning methods in medical image segmentation usually depends on large amounts of annotated data, yet labeling 3D segmentation data is costly and time-consuming. Semi-supervised learning exploits the easy availability of unlabeled data, training with a small amount of labeled data and a large amount of unlabeled data, which alleviates the annotation burden and has therefore received wide attention in medical image segmentation. To make better use of unlabeled data and improve segmentation performance, we propose a new consistency regularization method for semi-supervised 3D medical image segmentation. Method The network model takes V-Net as its backbone, removing the residual structure of the encoding and decoding stages.
To extract effective features from unlabeled data, the proposed shape-aware cross-consistency regular network extends the V-Net structure into one encoder and two independent dual-task decoders, each carrying both a segmentation task and a regression task (shape-aware cross-consistency regular network based on dual tasks, SACC-Net), divided into a main decoder A and an auxiliary decoder B. The shared encoder's output is passed to the two decoders after noise perturbation, and both decoders produce predictions at each iteration. To increase the generalization and anti-noise ability of the model, the difference between the two sets of predictions is minimized during training. Because the feature distributions of the pre-processed medical image samples are relatively similar, the proportion of labeled samples in the training set can be kept extremely small. To further improve the model's ability to learn the segmentation targets, geometric prior information constraints on the target shape are introduced, and a shape-aware regression layer is added at the end of each decoder. During training, each decoder therefore outputs two predictions simultaneously, so the total output of each iteration consists of four parts: the segmentation map SA and the signed distance map output by decoder A, and the segmentation map SB and the signed distance map output by decoder B, linked through the dual-task consistency of each decoding branch. Consistency constraints and cross constraints on these outputs realize a regularization method that combines data-level and model-level perturbations with a multi-task mechanism, enhancing the model's ability to learn effective features of the segmentation targets and to make better use of unlabeled data.
Result Our algorithm is validated on the 3D left-atrium MRI dataset published for the Atrial Segmentation Challenge held at MICCAI (Medical Image Computing and Computer Assisted Intervention Society) 2018. The experiments are divided into two groups according to the proportion of labeled data. Using only 10% annotated data from the training set, the Dice coefficient, Jaccard index, HD (Hausdorff distance), and average symmetric surface distance reach 88.01%, 78.89%, 8.19, and 2.09, respectively. In the other group, using only 20% annotated data, they reach 90.14%, 82.11%, 6.57, and 1.78, respectively. Furthermore, with the shape-aware method using the level-set function for the regression task, the Dice index improves by 0.69 and 0.60 over the shape-aware semi-supervised net (SASSNet) when trained with 10% and 20% labeled data, and by 1.44% and 0.72% over dual-task consistency (DTC) trained with 10% and 20% labeled data. Conclusion The proposed semi-supervised segmentation model (SACC-Net) optimizes both region-based and boundary-based segmentation criteria by incorporating consistency at the data, model, and task levels, achieving better segmentation results and generalization performance than other semi-supervised methods.
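At its core, the cross-consistency idea above penalizes disagreement between the two decoders' predictions. A toy NumPy sketch of such a term (not the SACC-Net implementation, which additionally uses shape-aware and cross constraints):

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def consistency_loss(logits_a, logits_b):
    """Mean squared difference between two decoders' class probability maps."""
    return float(np.mean((softmax(logits_a) - softmax(logits_b)) ** 2))

# Toy 2-class logit maps from a 'main' and an 'auxiliary' decoder.
rng = np.random.default_rng(1)
logits_a = rng.normal(size=(2, 8, 8))
logits_b = logits_a + 0.1 * rng.normal(size=(2, 8, 8))  # perturbed copy
loss = consistency_loss(logits_a, logits_b)       # small but nonzero
agreement = consistency_loss(logits_a, logits_a)  # exactly zero
```

On unlabeled volumes, terms of this form supply the training signal, which is what lets the model exploit them.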
- Front Matter
3
- 10.1016/j.compmedimag.2015.07.002
- Jul 16, 2015
- Computerized Medical Imaging and Graphics
Sparsity techniques in medical imaging
- Book Chapter
71
- 10.1007/978-3-030-67194-5_1
- Jan 1, 2021
This paper presents an overview of the first HEad and neCK TumOR (HECKTOR) challenge, organized as a satellite event of the 23rd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2020. The task of the challenge is the automatic segmentation of head and neck primary Gross Tumor Volume in FDG-PET/CT images, focusing on the oropharynx region. The data were collected from five centers for a total of 254 images, split into 201 training and 53 testing cases. The interest in the task was shown by the important participation with 64 teams registered and 18 team submissions. The best method obtained a Dice Similarity Coefficient (DSC) of 0.7591, showing a large improvement over our proposed baseline method with a DSC of 0.6610 as well as inter-observer DSC agreement reported in the literature (0.69). Keywords: Automatic segmentation; Challenge; Medical imaging; Head and neck cancer; Oropharynx
- Research Article
2
- 10.3390/app14010095
- Dec 21, 2023
- Applied Sciences
Medical image registration is a fundamental and indispensable element in medical image analysis, which can establish spatial consistency among corresponding anatomical structures across various medical images. Since images with different modalities exhibit different features, it remains a challenge to find their exact correspondence. Most current methods based on image-to-image translation cannot fully leverage the available information, which affects the subsequent registration performance. To solve this problem, we develop an unsupervised multimodal image registration method named DTR-GAN. Firstly, we design a multimodal registration framework via a bidirectional translation network to transform multimodal image registration into unimodal registration, which can effectively use the complementary information of different modalities. Then, to enhance the quality of the transformed images in the translation network, we design a multiscale encoder–decoder network that effectively captures both local and global features in images. Finally, we propose a mixed similarity loss to encourage the warped image to be closer to the target image in deep features. We extensively evaluate the method on MRI-CT registration tasks of the abdominal cavity against advanced unsupervised multimodal image registration approaches. The results indicate that DTR-GAN obtains competitive performance compared with other methods in MRI-CT registration. Compared with DFR on the Learn2Reg dataset, DTR-GAN not only obtains improvements of 2.35% and 2.08% in the Dice similarity coefficient (DSC) for MRI-CT and CT-MRI registration, but also decreases the average symmetric surface distance (ASD) by 0.33 mm and 0.12 mm.
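The ASD metric quoted above can be computed from binary masks with distance transforms. A sketch using SciPy (a generic formulation, not tied to DTR-GAN's evaluation code):

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def surface(mask):
    """Boundary voxels: the mask minus its one-voxel erosion."""
    return mask & ~binary_erosion(mask)

def average_symmetric_surface_distance(a, b, spacing=1.0):
    """ASD between two binary masks, in the units of `spacing`."""
    sa, sb = surface(a), surface(b)
    # Distance from every voxel to the nearest surface voxel of the other mask.
    dist_to_sb = distance_transform_edt(~sb, sampling=spacing)
    dist_to_sa = distance_transform_edt(~sa, sampling=spacing)
    d_ab = dist_to_sb[sa]  # a's surface -> b's surface
    d_ba = dist_to_sa[sb]  # b's surface -> a's surface
    return (d_ab.sum() + d_ba.sum()) / (d_ab.size + d_ba.size)

a = np.zeros((20, 20), dtype=bool); a[5:15, 5:15] = True
b = np.zeros((20, 20), dtype=bool); b[6:16, 5:15] = True  # shifted one voxel down
asd = average_symmetric_surface_distance(a, b)  # between 0 and 1 voxel
```

Passing the physical voxel size as `spacing` yields ASD in millimetres, as the abstracts report.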
- Conference Article
2
- 10.1109/bigmm.2016.21
- Apr 1, 2016
Due to the wide variety of copy videos, existing video copy detection methods that rely on a single feature face great challenges, especially in video content matching, and have difficulty handling the various transformations found in copy videos. To overcome this problem, a video copy detection method based on sparse representation of MPEG-2 spatial and temporal features is proposed in this paper. Firstly, key frames are extracted based on a visual saliency model. Then a global feature (HSV color histograms) and a local feature (ORB features) are extracted from the key frames. Meanwhile, the key frames are represented compactly by sparse coding that exploits the ORB features, and motion vectors (MVs) extracted from the video bitstreams are used to build MV angle histograms. Finally, the spatial and temporal features are compared respectively, and the matching results are fused to generate the final copy detection judgement. Experimental results on the TRECVID 2009 dataset show that the proposed method presents better robustness and higher time efficiency.
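The MV angle histogram mentioned above is easy to sketch: quantize each motion vector's direction into a fixed number of angular bins. An illustrative NumPy version (the bin count and normalization are choices of this sketch, not stated by the paper):

```python
import numpy as np

def mv_angle_histogram(mv, bins=8):
    """Normalized histogram of motion-vector directions over [-pi, pi)."""
    dx, dy = mv[:, 0], mv[:, 1]
    angles = np.arctan2(dy, dx)
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    total = hist.sum()
    return hist / total if total else hist.astype(float)

# Toy motion vectors: mostly rightward motion plus some upward motion.
mv = np.array([[1, 0], [2, 0], [3, 1], [0, 2], [0, 3]], dtype=float)
hist = mv_angle_histogram(mv, bins=8)  # length 8, sums to 1
```

Two key frames can then be compared by, for example, the L1 distance between their histograms.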
- Research Article
82
- 10.1016/j.ijrobp.2020.10.038
- Nov 9, 2020
- International Journal of Radiation Oncology*Biology*Physics
Automatic Segmentation Using Deep Learning to Enable Online Dose Optimization During Adaptive Radiation Therapy of Cervical Cancer
- Abstract
3
- 10.1016/j.ijrobp.2018.06.160
- Oct 20, 2018
- International Journal of Radiation Oncology*Biology*Physics
Automated Contouring of Contrast and Non-Contrast CT Liver Images with Fully Convolutional Neural Networks
- Research Article
14
- 10.1016/j.zemedi.2023.05.003
- Jun 22, 2023
- Zeitschrift fuer Medizinische Physik
Deep learning-based affine medical image registration for multimodal minimal-invasive image-guided interventions – A comparative study on generalizability
- Research Article
1730
- 10.1109/tmi.2019.2897538
- Feb 4, 2019
- IEEE transactions on medical imaging
We present VoxelMorph, a fast learning-based framework for deformable, pairwise medical image registration. Traditional registration methods optimize an objective function for each pair of images, which can be time-consuming for large datasets or rich deformation models. In contrast to this approach, and building on recent learning-based methods, we formulate registration as a function that maps an input image pair to a deformation field that aligns these images. We parameterize the function via a convolutional neural network (CNN), and optimize the parameters of the neural network on a set of images. Given a new pair of scans, VoxelMorph rapidly computes a deformation field by directly evaluating the function. In this work, we explore two different training strategies. In the first (unsupervised) setting, we train the model to maximize standard image matching objective functions that are based on the image intensities. In the second setting, we leverage auxiliary segmentations available in the training data. We demonstrate that the unsupervised model's accuracy is comparable to state-of-the-art methods, while operating orders of magnitude faster. We also show that VoxelMorph trained with auxiliary data improves registration accuracy at test time, and evaluate the effect of training set size on registration. Our method promises to speed up medical image analysis and processing pipelines, while facilitating novel directions in learning-based registration and its applications. Our code is freely available at https://github.com/voxelmorph/voxelmorph.
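The core operation VoxelMorph learns, resampling a moving image through a predicted deformation field, can be sketched with SciPy's `map_coordinates` (a generic spatial-transformer step, not VoxelMorph's own code):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp(image, flow):
    """Warp a 2D `image` with a dense displacement field `flow` of shape (2, H, W)."""
    h, w = image.shape
    grid = np.mgrid[0:h, 0:w].astype(float)  # identity sampling grid
    coords = grid + flow                     # displaced sampling locations
    # Linear interpolation; out-of-bounds samples clamp to the edge.
    return map_coordinates(image, coords, order=1, mode="nearest")

# Toy example: a uniform shift of +2 pixels along each axis.
image = np.arange(36, dtype=float).reshape(6, 6)
flow = np.full((2, 6, 6), 2.0)
warped = warp(image, flow)  # warped[i, j] samples image[i + 2, j + 2]
```

In VoxelMorph this resampling layer is differentiable, so the CNN predicting `flow` can be trained end-to-end against an intensity-matching loss.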
- Research Article
1
- 10.1002/cpe.8265
- Oct 17, 2024
- Concurrency and Computation: Practice and Experience
Summary Camellia oleifera typically thrives in unstructured environments, making the identification of its trunks crucial for advancing agricultural robots towards modernization and sustainability. Traditional target detection algorithms, however, fall short in accurately identifying Camellia oleifera trunks, especially in scenarios characterized by small targets and poor lighting. This article introduces an enhanced trunk detection algorithm for Camellia oleifera based on an improved YOLOv7 model. This model incorporates dynamic snake convolution instead of standard convolutions to bolster its feature extraction capabilities. It integrates more contextual information, thus enhancing the model's generalization ability across various scenes. Additionally, coordinate attention is introduced to refine the model's spatial feature representation, amplifying the network's focus on essential target region features, which in turn boosts detection accuracy and robustness. This feature selectively strengthens response levels across different channels, prioritizing key attributes for classification and localization. Moreover, the original coordinate loss function of YOLOv7 is replaced with EIoU loss, further enhancing the model's robustness and convergence speed. Experimental results demonstrate a recall rate of 96%, a mean average precision (mAP) of 87.9%, an F1 score of 0.87, and a detection speed of 18 milliseconds per frame. When compared with other models like Faster‐RCNN, YOLOv3, ScaledYOLOv4, YOLOv5, and the original YOLOv7, our improved model shows mAP increases of 8.1%, 7.0%, 7.5%, and 6.6% respectively. Occupying only 70.8 MB, our model requires 9.8 MB less memory than the original YOLOv7. This model not only achieves high accuracy and detection efficiency but is also easily deployable on mobile devices, providing a robust foundation for future intelligent harvesting technologies.
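Since the abstract reports mAP and an EIoU loss, the underlying box overlap measure is plain IoU; EIoU extends it with center-distance and width/height penalty terms. A minimal IoU function (generic, not the paper's code):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7 -> 1/7
```

Loss variants such as EIoU are typically written as 1 - IoU plus the extra penalty terms.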
- Conference Article
1
- 10.1109/icacce46606.2019.9079996
- Apr 1, 2019
Modern medicine has become reliant on medical imaging. The application of computers has proved to be an emerging technique in medical imaging and medical image analysis. Each level of analysis requires effective algorithms and methods in order to generate accurate and reliable results. Different modalities, such as X-ray, magnetic resonance imaging (MRI), ultrasound, and computed tomography (CT), are used for both diagnostic and therapeutic purposes, providing as much information about the patient as possible. Medical image processing includes image fusion, matching, and warping, which are tasks of image registration. Medical image analysis includes image enhancement, segmentation, quantification, and registration, which are the predominant ways to analyze an image. There are various difficulties in medical image processing and its subsequent stages: image enhancement and restoration; segmentation of features; registration and fusion of multimodality images; classification of medical images; measurement and analysis of image features; and the development of integrated medical imaging systems for the medical field. In this paper, the techniques used in medical image analysis are reviewed and discussed extensively. The goal of this review is to identify the current state of the art of medical image analysis methods as a reference paradigm in order to accelerate the improvement of existing methods.
- Research Article
- 10.54294/ufyfho
- Jan 10, 2006
- The Insight Journal
This course introduces attendees to select open-source efforts in the field of medical image analysis. Opportunities for users and developers are presented. The course particularly focuses on the open-source Insight Toolkit (ITK) for medical image segmentation and registration. The course describes the procedure for downloading and installing the toolkit and covers the use of its data representation and filtering classes. Attendees are shown how ITK can be used in their research, rapid prototyping, and application development. LEARNING OUTCOMES After completing this course, attendees will be able to:
- contribute to and benefit from open-source software for medical image analysis
- download and install the ITK toolkit
- start their own software project based on ITK
- design and construct an image processing pipeline
- combine ITK filters for medical image segmentation
- combine ITK components for medical image registration
INTENDED AUDIENCE This course is intended for anyone involved in medical image analysis. In particular it targets graduate students, researchers and professionals in the areas of computer science and medicine. Attendees should have an intermediate level of object-oriented programming with C++ and must be familiar with the basics of medical image processing and analysis.