Articles published on Image translation
1351 Search results
- New
- Research Article
- 10.1186/s13195-025-01921-5
- Jan 9, 2026
- Alzheimer's research & therapy
- Julia R Bacci + 14 more
Clinical translation of fluid, imaging, and digital biomarkers for Alzheimer's disease.
- New
- Research Article
- 10.1016/j.compmedimag.2026.102705
- Jan 8, 2026
- Computerized medical imaging and graphics : the official journal of the Computerized Medical Imaging Society
- Guido Manni + 3 more
SPARSE data, rich results: Few-shot semi-supervised learning via class-conditioned image translation.
- New
- Research Article
- 10.1117/1.jmi.13.1.014002
- Jan 6, 2026
- Journal of medical imaging (Bellingham, Wash.)
- Savannah P Hays + 9 more
Visualization of subcortical gray matter is essential in neuroscience and clinical practice, particularly for disease understanding and surgical planning. Although multi-inversion time (multi-TI) T1-weighted (T1-w) magnetic resonance (MR) imaging improves visualization, it is only acquired in specific clinical settings and not available in common public MR datasets. We present SyMTIC (synthetic multi-TI contrasts), a deep learning method that generates synthetic multi-TI images using routinely acquired T1-w, T2-weighted (T2-w), and fluid-attenuated inversion recovery (FLAIR) images. Our approach combines image translation via deep neural networks with imaging physics to estimate longitudinal relaxation time (T1) and proton density (PD) maps. These maps are then used to compute multi-TI images with arbitrary inversion times. SyMTIC was trained using paired magnetization prepared rapid acquisition with gradient echo (MPRAGE) and fast gray matter acquisition T1 inversion recovery (FGATIR) images along with T2-w and FLAIR images. It accurately synthesized multi-TI images from standard clinical inputs, achieving image quality comparable to that from explicitly acquired multi-TI data. The synthetic images, especially for TI values between 400 and 800 ms, enhanced visualization of subcortical structures and improved segmentation of thalamic nuclei. SyMTIC enables robust generation of high-quality multi-TI images from routine MR contrasts. When paired with the HACA3 algorithm, it generalizes well to varied clinical datasets, including those without FLAIR or T2-w images and with unknown acquisition parameters, offering a practical solution for improving brain MR image visualization and analysis.
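The physics step described above (computing arbitrary-TI images from estimated T1 and PD maps) can be sketched with a simplified inversion-recovery signal equation; the idealized magnitude-IR model below ignores repetition-time and readout effects and is an illustrative assumption, not SyMTIC's exact forward model.

```python
import numpy as np

def synth_ti_image(t1_map, pd_map, ti):
    """Idealized magnitude inversion-recovery signal:
    S(TI) = PD * |1 - 2*exp(-TI / T1)|  (simplified; TR assumed long)."""
    return pd_map * np.abs(1.0 - 2.0 * np.exp(-ti / t1_map))

# Hypothetical voxelwise maps: two tissues with different T1 (in ms).
t1 = np.array([[800.0, 1200.0]])
pd = np.ones_like(t1)

# Sweep the inversion times highlighted in the abstract (400-800 ms);
# contrast between the two tissues changes with TI.
for ti in (400.0, 600.0, 800.0):
    print(ti, synth_ti_image(t1, pd, ti))
```

Note that the signal of a tissue nulls at TI = T1·ln 2, which is what makes the 400-800 ms range useful for picking out subcortical structures.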
- New
- Research Article
- 10.1016/j.infrared.2025.106266
- Jan 1, 2026
- Infrared Physics & Technology
- Xiaoshen Yang + 6 more
AGMD-GAN: Attention-based generator with multi-scaled feature extraction discriminator for unpaired visible to infrared image translation
- New
- Research Article
- 10.1109/tmi.2025.3650412
- Jan 1, 2026
- IEEE transactions on medical imaging
- Sebastian Rassmann + 3 more
While Generative Adversarial Nets (GANs) and Diffusion Models (DMs) have achieved impressive results in natural image synthesis, their core strengths - creativity and realism - can be detrimental in medical applications, where accuracy and fidelity are paramount. These models instead risk introducing hallucinations and replication of unwanted acquisition noise. Here, we propose YODA (You Only Denoise once - or Average), a 2.5D diffusion-based framework for medical image translation (MIT). Consistent with DM theory, we find that conventional diffusion sampling stochastically replicates noise. To mitigate this, we draw and average multiple samples, akin to physical signal averaging. As this effectively approximates the DM's expected value, we term this Expectation-Approximation (ExpA) sampling. We additionally propose regression sampling YODA, which retains the initial DM prediction and omits iterative refinement to produce noise-free images in a single step. Across five diverse multi-modal datasets - including multi-contrast brain MRI and pelvic MRI-CT - we demonstrate that regression sampling is not only substantially more efficient but also matches or exceeds image quality of full diffusion sampling even with ExpA. Our results reveal that iterative refinement solely enhances perceptual realism without benefiting information translation, which we confirm in relevant downstream tasks. YODA outperforms eight state-of-the-art DMs and GANs and challenges the presumed superiority of DMs and GANs over computationally cheap regression models for high-quality MIT. Furthermore, we show that YODA-translated images are interchangeable with, or even superior to, physical acquisitions for several medical applications.
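The Expectation-Approximation (ExpA) idea above, averaging several stochastic draws the way physical signal averaging does, can be sketched in a few lines; `sample_translation` is a hypothetical stand-in for one stochastic diffusion sample, and the additive-noise model is an illustrative assumption, not YODA's actual network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_translation(clean, noise_std=0.1):
    """Hypothetical stand-in for one stochastic diffusion sample:
    the clean translation plus replicated acquisition-style noise."""
    return clean + rng.normal(0.0, noise_std, size=clean.shape)

def expa_sample(clean, n_samples=16):
    """ExpA sampling: average n independent samples to approximate
    the model's expected value; noise variance shrinks roughly as 1/n."""
    samples = [sample_translation(clean) for _ in range(n_samples)]
    return np.mean(samples, axis=0)

clean = np.zeros((64, 64))
single = sample_translation(clean)
averaged = expa_sample(clean, n_samples=16)

# Residual noise is visibly lower after averaging.
print("single std:", single.std(), "averaged std:", averaged.std())
```

Regression sampling, by contrast, keeps the first (deterministic) prediction and skips refinement entirely, which is why it is both noise-free and cheap.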
- New
- Research Article
- 10.1016/j.radonc.2025.111321
- Jan 1, 2026
- Radiotherapy and oncology : journal of the European Society for Therapeutic Radiology and Oncology
- Yunxiang Li + 5 more
A universal medical imaging modality translation model in brain and head-and-neck radiotherapy.
- New
- Research Article
- 10.1016/j.artmed.2025.103335
- Dec 30, 2025
- Artificial intelligence in medicine
- Lulin Shi + 7 more
UniStain: A unified and organ-aware virtual H&E staining framework for label-free autofluorescence images.
- Research Article
- 10.3390/rs18010055
- Dec 24, 2025
- Remote Sensing
- Cheng Xu + 1 more
Synthetic aperture radar (SAR), with its all-weather and all-day observation capabilities, plays a significant role in the field of remote sensing. However, due to the unique imaging mechanism of SAR, its interpretation is challenging. Translating SAR images into optical remote sensing images has become a research hotspot in recent years to enhance the interpretability of SAR images. This paper proposes a deep learning-based method for SAR-to-optical remote sensing image translation. The network comprises three parts: a global representor, a generator with cascaded multi-head attention, and a multi-scale discriminator. The global representor, built upon InternImage with deformable convolution v3 (DCNv3) as its core operator, leverages its global receptive field and adaptive spatial aggregation capabilities to extract global semantic features from SAR images. The generator follows the classic “encoder-bottleneck-decoder” structure, where the encoder focuses on extracting local detail features from SAR images. The cascaded multi-head attention module within the bottleneck layer optimizes local detail features and facilitates feature interaction between global semantics and local details. The discriminator adopts a multi-scale structure based on the local receptive field PatchGAN, enabling joint global and local discrimination. Furthermore, for the first time in SAR image translation tasks, structural similarity index measure (SSIM) loss is combined with adversarial loss, perceptual loss, and feature matching loss as the loss function. A series of experiments demonstrate the effectiveness and reliability of the proposed method. Compared to mainstream image translation methods, our method ultimately generates higher-quality optical remote sensing images that are semantically consistent, texturally authentic, richly detailed, and visually plausible.
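The combined objective described above can be sketched as a weighted sum in which SSIM enters as 1 − SSIM; the single-window SSIM and the loss weights below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def global_ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified SSIM computed over the whole image as a single window
    (the standard formulation uses local sliding windows)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))

def total_loss(adv, perc, fm, fake, real, w=(1.0, 10.0, 10.0, 5.0)):
    """Weighted sum of adversarial, perceptual, feature-matching and
    SSIM losses; weights w are hypothetical."""
    ssim_loss = 1.0 - global_ssim(fake, real)
    return w[0] * adv + w[1] * perc + w[2] * fm + w[3] * ssim_loss

x = np.random.default_rng(1).random((32, 32))
print(global_ssim(x, x))  # identical images give SSIM of 1
```

SSIM rewards structural agreement rather than per-pixel intensity match, which is why pairing it with adversarial and perceptual terms can help keep translated textures consistent with the optical target.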
- Research Article
- 10.3390/electronics15010052
- Dec 23, 2025
- Electronics
- Jhilik Bhattacharya + 3 more
The transformer architecture and its attention-based modules have become quite popular recently and are used for solving most computer vision tasks. However, there have been attempts to explore whether other modules can perform equally well at lower computational cost. In this paper, we introduce a nonlinear convolution structure composed of learnable polynomial and Fourier features, which allows better spectral representation with fewer parameters. The solution we propose is in principle feasible for many CNN application fields, and we present its theoretical motivation. Next, to demonstrate the performance of our architecture, we exploit it for a paradigmatic task: image translation in driving-related scenarios such as deraining, dehazing, dark-to-bright, and night-to-day transformations. We use specific benchmark datasets for each task and standard quality metrics. The results show that our network provides comparable or better performance than transformer-based architectures, with a major reduction in network size due to the use of such a nonlinear convolution block.
- Research Article
- 10.58806/ijirme.2025.v4i12n13
- Dec 23, 2025
- INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN MULTIDISCIPLINARY EDUCATION
- Dhakaa Mohsin Kareem
Image-to-image translation poses a significant challenge, garnering substantial research attention in recent years. This study aims to devise a versatile image-to-image translation method capable of producing superior images with minimal user intervention. The proposed solution, the Pix2Pix GAN, is a Generative Adversarial Network (GAN) designed to translate images seamlessly between different domains. The Pix2Pix GAN comprises two integral components: a generator and a discriminator. The generator's role is to produce images originating from the source domain, while the discriminator is tasked with distinguishing between real and generated images. Both networks undergo adversarial training, engaging in a competitive dynamic. The generator strives to create images that are indistinguishable from real ones, while the discriminator endeavors to identify disparities between real and generated images. The effectiveness of the Pix2Pix GAN was assessed across various image-to-image translation scenarios, such as transitioning from day to night, converting sketches to photos, and transforming paintings into photographs. The Pix2Pix GAN consistently demonstrated the capability to generate high-quality images across these tasks, requiring minimal user input. In conclusion, the Pix2Pix GAN presents a promising paradigm for image-to-image translation. Its capacity to produce top-tier images with minimal user intervention establishes it as a robust solution applicable to a diverse array of image translation tasks.
- Research Article
- 10.1002/mrm.70225
- Dec 19, 2025
- Magnetic resonance in medicine
- Julia E Markus + 15 more
Our goal was to understand the barriers and challenges to clinical translation of quantitative MR (qMR) as perceived by stakeholders in the UK. We conducted an electronic survey on seven key areas related to clinical translation of qMR, developed at the BIC-ISMRM workshop "Steps on the path to clinical translation": (i) clinical workflow, (ii) changes in clinical practice, (iii) improving validation, (iv) standardization of data acquisition and analysis, (v) sharing of data and code, (vi) improving quality management, and (vii) end-user engagement. Based on these seven areas, a 40-question survey was developed. Members of BIC-ISMRM, MR-PHYSICS, BSNR and institutional mailing lists were invited to respond to the online survey over a 5-week period between September and October 2022. The responses were analysed via descriptive statistics of multiple-choice questions, Likert scores and a thematic analysis of free-text questions. A total of 69 responses were received, predominantly from research imaging scientists (69%) across numerous UK centres. Three main themes were identified: (1) Consensus, the need to develop shared terminology, decision making and validation; (2) Context Dependency, an appreciation of the uniqueness of each clinical situation; and (3) Product Profile, a clear description of the imaging biomarker and its intended use. Effective translation of qMR imaging and spectroscopic biomarkers to achieve their full clinical potential must address the differing needs and expectations of a wide range of stakeholders.
- Research Article
- 10.55041/ijsrem55292
- Dec 17, 2025
- International Journal of Scientific Research in Engineering and Management
- Sinchana N + 4 more
This project presents an integrated AI-powered multilingual translation and navigation system designed to extract text from images, translate multilingual content, convert speech into text, and provide real-time route and distance information using Google Maps. The system is implemented using the Flask web framework and leverages Google Gemini Vision for high-accuracy image-to-text extraction. Deep_translator is used for fast and efficient text translation across multiple Indian languages, while the SpeechRecognition module enables precise speech-to-text conversion for improved accessibility. Additionally, the Google Maps API is incorporated to offer accurate distance calculation and route tracking between user-specified locations. The developed application provides a seamless and user-friendly interface that unifies image processing, translation, and navigation features into a single platform. Performance evaluation demonstrates high accuracy, low response time, and stable functioning across all modules. The project further lays a strong foundation for future enhancements such as offline translation, mobile deployment, and advanced AI-driven interaction.
Keywords: Gemini Vision, Image-to-Text, Multilingual Translation, Speech-to-Text, Google Maps API, Flask Web Application, Artificial Intelligence, Natural Language Processing, OCR, Route Tracking, Deep Translator, Indian Languages
- Research Article
- 10.1186/s12880-025-02111-3
- Dec 9, 2025
- BMC Medical Imaging
- Rui Qu + 16 more
Background: This work aims to develop and validate a novel CycleGAN-based methodology to transfer the kV planning CT (pCT) to reference MV portal images, potentially applicable to in vivo treatment dose monitoring.
Methods: The kV projections of the pCT were prepared for the various gantry angles of the MV projections using treatment beams on a Varian Halcyon system. A CycleGAN-based network incorporating an attention module (ECA-CycleGAN) was trained to learn the relationship between kV and MV images, and its performance was compared quantitatively with the conventional Pix2pix and CycleGAN methods. The beam angles and multi-leaf collimator parameters retrieved from the clinical plans were used to segment the treatment apertures on the model-generated reference MV images, within which the sensitivity to artificial errors was tested using gamma analysis. Cross-institutional validations were performed on multiple machines and scanning protocols.
Results: Comparing the 2574 model-generated MV images with the measured ground truth of 13 validation cases, the mean ± standard deviation of the structural similarity index measure (SSIM), peak signal-to-noise ratio (PSNR) and root mean square error (RMSE) were 0.969 ± 0.007, 41.3 ± 3.2 and 1.6·10⁻² ± 3.8·10⁻⁴ for ECA-CycleGAN, consistently better than CycleGAN (0.940 ± 0.004, 35.7 ± 3.3 and 2.6·10⁻² ± 4.5·10⁻⁴) and Pix2pix (0.931 ± 0.005, 32.4 ± 4.0 and 4.2·10⁻² ± 4.8·10⁻⁴), respectively. After introducing artificial translational or rotational errors, the gamma passing rates decreased and the gamma indices increased significantly (all P < 0.05). The ECA-CycleGAN model displayed good generalizability across various pCT scanners, imaging protocols and Halcyon accelerators from two institutions.
Conclusion: Without complex and time-consuming Monte Carlo simulations, the proposed ECA-CycleGAN network facilitates the efficient establishment of reference MV portal images applicable to in vivo transmitted dosimetry. It may improve the accuracy of dose delivery, especially for advanced treatment techniques when pretreatment measurement verification or inter-fractional dose remediation is impossible.
- Research Article
- 10.1073/pnas.2517785122
- Dec 3, 2025
- Proceedings of the National Academy of Sciences
- Feifei Wang + 11 more
Preclinical shortwave infrared/near-infrared II (SWIR/NIR-II, 1,000 to 3,000 nm) fluorescence imaging has shown superior contrast, resolution, and penetration depth compared to traditional near-infrared I (NIR-I, 700 to 900 nm) imaging, owing to reduced light scattering and tissue autofluorescence. Here, we carried out clinical translation of NIR-II fluorescence imaging to guide esophagectomy through intraoperative video imaging and rapid analysis of blood perfusion in the gastric conduits (GC) of esophageal cancer patients, following intravenous administration of indocyanine green (ICG). Within <1 min, NIR-II video imaging clearly visualized the spatial and temporal blood flow features, and importantly, intraoperative principal component analysis (PCA) of the video revealed distinct perfusion patterns in GC. This led to rapid, objective decision-making for targeted resection of poorly perfused tissue and informed reconstruction of the GC to reduce the risk of life-threatening anastomotic leakage. This approach enhances surgical precision and improves outcomes by providing operator-independent intraoperative guidance.
- Research Article
- 10.1109/tpami.2025.3598147
- Dec 1, 2025
- IEEE transactions on pattern analysis and machine intelligence
- Yinqi Li + 3 more
Visual recognition models pretrained on clean images usually do not perform well in the presence of image corruptions, such as blurring or noise, which limits their applicability in real-world scenarios. To solve this problem, existing approaches usually design complex data augmentations to train a robust model from scratch or adapt a pretrained model to corrupted scenarios. These approaches ignore the existence of the large number of deployed models in our community, causing extensive computation and storage costs for making deployed models adapted. Based on this consideration, this paper focuses on solving a practical problem of making many clean-image-pretrained models adapt to unlabeled corrupted images through one training procedure. To this end, we aim to learn a Plug-and-play Image Translator (PIT) that can be directly combined with recognition models after training. Existing approaches, such as vanilla image translation and restoration, are not proper for solving this problem, as they are mostly based on supervised training and are not recognition-oriented. To address this issue, we propose a recognition-oriented unsupervised image translation framework to make PIT produce images with indistinguishable recognition predictions from the clean ones. We verify the effectiveness of PIT on several recognition tasks and show that PIT boosts the performance of clean-image-pretrained models significantly in the presence of image corruptions.
- Research Article
- 10.1063/5.0298999
- Dec 1, 2025
- Physics of Fluids
- Daniel Stoecklein + 4 more
Predicting the deformed shape of nested co-axial streams after extrusion through a shaped nozzle is a bottleneck for microparticle and microfiber fabrication as well as other extrusion processes, due to the need for fully resolved three-dimensional (3D) computational fluid dynamics simulations requiring significant expertise, computational resources, and time. We recast this problem as an image translation task, mapping a two-dimensional (2D) nozzle shape to a 2D output cross section of a shaped flow. By doing so, we bypass meshing and 3D solvers by using a deep learning (DL) U-Net model to predict fully miscible, low Reynolds number (Re=1) extrusion flow shapes, achieving 450× speedup with second-scale prediction times. The DL model approximately conserves mass and achieves a mean intersection over union of 0.9757 on a held-out test set of nozzle shapes. Saliency-based interpretability analysis of the DL model shows that such extrusion flows are robust to variations in nozzle wall thickness and, for low to moderate inner flow rates, to the shape of the outer flow wall. This robustness, along with the DL model's speed and accuracy, enables efficient and practical data-driven design of co-axial extrusion devices and highlights the broader potential of interpretable surrogates in microfluidic flow engineering.
- Research Article
- 10.1016/j.media.2025.103747
- Dec 1, 2025
- Medical image analysis
- Fuat Arslan + 4 more
Self-consistent recursive diffusion bridge for medical image translation.
- Research Article
- 10.1016/j.engappai.2025.112048
- Dec 1, 2025
- Engineering Applications of Artificial Intelligence
- Heng Zhang + 2 more
Zero-shot image translation via query compensation and style enhancement
- Research Article
- 10.1007/s00464-025-12203-4
- Dec 1, 2025
- Surgical endoscopy
- Tatsushi Tokuyasu + 5 more
Accurate intraoperative identification of scar tissue is essential for preventing bile duct injury during laparoscopic cholecystectomy (LC), especially under visually impaired conditions caused by bleeding. This study aimed to develop an artificial intelligence (AI)-based framework to enhance scar region prediction in such challenging surgical environments. A hybrid approach was proposed, combining Cycle-Consistent Generative Adversarial Network-based image translation with uncertainty-aware fusion. Bleeding-contaminated laparoscopic images were translated into pseudo non-bleeding representations using unpaired domain adaptation. Segmentation results obtained from the original and translated images were then fused based on pixel-wise entropy to improve robustness. The system was evaluated using 99 representative images from 20 surgical patients. Compared with conventional segmentation methods, the proposed framework significantly improved Dice coefficients across all three board-certified endoscopic surgeons who served as expert annotators, with all improvements demonstrating significance (P < 0.001). Subjective evaluations by the same surgeons confirmed high clinical utility, particularly in scar visibility and boundary delineation. The framework achieved near real-time inference speed (0.06 s per frame on an RTX A5000 GPU). This AI-assisted framework improved the accuracy and robustness of scar tissue detection during LC, even in bleeding-compromised fields. Its real-time capability and strong clinical validation indicate substantial potential for intraoperative application and enhancement of surgical safety.
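The pixel-wise entropy fusion described above can be sketched as keeping, at each pixel, the more confident (lower-entropy) of the two segmentations' foreground probabilities; the function names and the hard selection rule are illustrative assumptions rather than the paper's exact scheme.

```python
import numpy as np

def binary_entropy(p, eps=1e-8):
    """Entropy of a per-pixel foreground probability; near 0 when the
    prediction is confident, maximal at p = 0.5."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def entropy_fuse(p_orig, p_trans):
    """Per pixel, keep the lower-entropy (more confident) probability
    from the original vs. translated-image segmentation."""
    use_orig = binary_entropy(p_orig) <= binary_entropy(p_trans)
    return np.where(use_orig, p_orig, p_trans)

# Hypothetical 2x2 probability maps from the two branches.
p_orig = np.array([[0.9, 0.5], [0.2, 0.45]])
p_trans = np.array([[0.6, 0.95], [0.55, 0.1]])
fused = entropy_fuse(p_orig, p_trans)
print(fused)  # each pixel takes the more confident branch
```

A soft variant would weight the two probabilities by inverse entropy instead of hard selection; either way, the fusion only trusts the translated image where the original prediction is uncertain.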
- Research Article
- 10.1145/3763285
- Dec 1, 2025
- ACM Transactions on Graphics
- Jiahao Ge + 4 more
This paper presents LEGO®-Maker, a new learning-based generative model that can effectively consider over 100 unique brick types and rapidly generate hundreds of bricks to create LEGO® models conditioned on images. This work has three major technical contributions that enable capabilities surpassing existing generative approaches. First, we design a compact LEGO® tokenization scheme to serialize LEGO® models and bricks into tokens for autoregressive learning. Second, we build LEGO®-Maker, an autoregressive image-conditioned architecture, with a multi-token prediction strategy to encourage pre-considering multiple brick attributes and a rollback mechanism for collision-free generation. Third, we propose an effective data preparation pipeline with a procedural generator to synthesize LEGO® models and a LEGO®-to-real image translator distilled from a large vision language model to translate LEGO® renderings into associated photorealistic images, leveraging rich priors to address the scarcity of image-to-LEGO® data. Extensive evaluations and comparisons are conducted on two object categories, facade and portrait, over metrics in four aspects: geometry, color, semantics, and structural integrity, together with a user study. Experimental results demonstrate the versatility and compelling strengths of LEGO®-Maker in producing structures and details given by the reference image. Also, the evaluation scores show that our method clearly surpasses the baselines, consistently across all evaluation metrics.