- Research Article
- 10.3390/jimaging12010029
- Jan 6, 2026
- Journal of Imaging
- Jincheng Li + 4 more
Depression is a prevalent mental disorder that imposes a significant public health burden worldwide. Although multimodal detection methods have shown potential, existing techniques still face two critical bottlenecks: (i) insufficient integration of global patterns and local fluctuations in long-sequence modeling and (ii) static fusion strategies that fail to dynamically adapt to the complementarity and redundancy among modalities. To address these challenges, this paper proposes a dynamic multimodal depression detection framework, DynMultiDep, which combines multi-scale temporal modeling with an adaptive fusion mechanism. The core innovations of DynMultiDep lie in its Multi-scale Temporal Experts Module (MTEM) and Dynamic Multimodal Fusion module (DynMM). On one hand, MTEM employs Mamba experts to extract long-term trend features and utilizes local-window Transformers to capture short-term dynamic fluctuations, achieving adaptive fusion through a long-short routing mechanism. On the other hand, DynMM introduces modality-level and fusion-level dynamic decision-making, selecting critical modality paths and optimizing cross-modal interaction strategies based on input characteristics. The experimental results demonstrate that DynMultiDep outperforms existing state-of-the-art methods in detection performance on two widely used large-scale depression datasets.
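The long-short routing idea described above can be illustrated with a minimal sketch: a learned gate scores the outputs of a long-term expert and a short-term expert per sample and mixes them with softmax weights. This is not the authors' DynMultiDep code; the gate vector `w_gate` and the two pre-computed expert feature arrays are hypothetical stand-ins for the Mamba and local-window Transformer branches.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route_long_short(long_feat, short_feat, w_gate):
    """Toy long-short routing: score each expert's output per sample
    with a gate vector, then mix the experts with softmax weights."""
    stacked = np.stack([long_feat, short_feat], axis=1)   # (B, 2, D)
    scores = stacked @ w_gate                             # (B, 2)
    weights = softmax(scores, axis=1)                     # (B, 2), rows sum to 1
    fused = (weights[:, :, None] * stacked).sum(axis=1)   # (B, D)
    return fused, weights
```

The fused feature is a per-sample convex combination of the two expert outputs, so the router can lean on long-term trends for some inputs and short-term fluctuations for others.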
- Research Article
- 10.3390/jimaging12010028
- Jan 6, 2026
- Journal of Imaging
- Sam Sedaghat + 8 more
This study aims to demonstrate the feasibility of ultrashort echo time (UTE)-based susceptibility source separation for musculoskeletal (MSK) imaging, enabling discrimination between diamagnetic and paramagnetic tissue components, with a particular focus on hemophilic arthropathy (HA). Three key techniques were integrated to achieve UTE-based susceptibility source separation: Iterative decomposition of water and fat with echo asymmetry and least-squares estimation for B0 field estimation, projection onto dipole fields for local field mapping, and χ-separation for quantitative susceptibility mapping (QSM) with source decomposition. A phantom containing varying concentrations of diamagnetic (CaCO3) and paramagnetic (Fe3O4) materials was used to validate the method. In addition, in vivo UTE-QSM scans of the knees and ankles were performed on five HA patients using a 3T clinical MRI scanner. In the phantom, conventional QSM underestimated susceptibility values due to cancellation between the mixed susceptibility sources. In contrast, source-separated maps provided distinct diamagnetic and paramagnetic susceptibility values that correlated strongly with CaCO3 and Fe3O4 concentrations (r = −0.99 and 0.95, p < 0.05). In vivo, paramagnetic maps enabled improved visualization of hemosiderin deposits in joints of HA patients, which were poorly visualized or obscured in conventional QSM due to susceptibility cancellation by surrounding diamagnetic tissues such as bone. This study demonstrates, for the first time, the feasibility of UTE-based quantitative susceptibility source separation for MSK applications. The approach enhances the detection of paramagnetic substances like hemosiderin in HA and offers potential for improved assessment of bone and joint tissue composition.
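The cancellation effect that motivates source separation can be shown with a toy per-voxel model: conventional QSM observes only the net susceptibility, so opposite-signed paramagnetic and diamagnetic contributions partially or fully cancel. The susceptibility values below are hypothetical, chosen only to illustrate the arithmetic; they are not from the study.

```python
import numpy as np

# Hypothetical per-voxel susceptibility contributions (ppm):
# positive = paramagnetic (e.g., hemosiderin/Fe3O4),
# negative = diamagnetic (e.g., bone/CaCO3).
chi_para = np.array([0.30, 0.10, 0.25])
chi_dia = np.array([-0.20, -0.05, -0.25])

# Conventional QSM sees only the net susceptibility per voxel,
# so mixed sources cancel and the paramagnetic load is underestimated.
chi_net = chi_para + chi_dia

# A source-separated reconstruction would instead report chi_para and
# chi_dia as two distinct maps, recovering the full magnitudes.
```

In the third voxel the two contributions cancel exactly, so a net-susceptibility map would show nothing despite substantial iron content, mirroring the obscured hemosiderin deposits described above.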
- Research Article
- 10.3390/jimaging12010019
- Dec 31, 2025
- Journal of Imaging
- Zhongmin Jiang + 3 more
Existing methods for reconstructing hyperspectral images from single RGB images are hampered by the difficulty of obtaining large numbers of labeled RGB-HSI image pairs, and they face issues such as detail loss, insufficient robustness, low reconstruction accuracy, and the difficulty of balancing the spatial–spectral trade-off. To address these challenges, a Double-Gated Mamba Multi-Scale Adaptive Feature (DMMAF) learning network model is proposed. DMMAF designs a reflection dot-product adaptive dual-noise-aware feature extraction method, which is used to supplement edge detail information in spectral images and improve robustness. DMMAF also constructs a deformable attention-based global feature extraction method and a double-gated Mamba local feature extraction approach, enhancing the interaction between local and global information during the reconstruction process, thereby improving image accuracy. Meanwhile, DMMAF introduces a structure-aware smooth loss function, which, by combining smoothing, curvature, and attention supervision losses, effectively resolves the spatial–spectral resolution balance problem. Experiments on three datasets—NTIRE 2020, Harvard, and CAVE—demonstrate that this model achieves state-of-the-art unsupervised reconstruction performance compared to existing advanced algorithms. On the NTIRE 2020 dataset, our method attains MRAE, RMSE, and PSNR values of 0.133, 0.040, and 31.314, respectively. On the Harvard dataset, it achieves RMSE and PSNR values of 0.025 and 34.955, respectively, while on the CAVE dataset, it achieves RMSE and PSNR values of 0.041 and 30.983, respectively.
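A structure-aware loss of the kind described, combining a data term with smoothing, curvature, and attention-supervision terms, can be sketched as follows. This is an illustrative composition under assumed definitions, not the paper's exact loss; the weights `lam` and the saliency map `attn` are hypothetical.

```python
import numpy as np

def smooth_loss(img):
    """First-order (smoothness) term: mean absolute spatial gradient."""
    dy = np.abs(np.diff(img, axis=0)).mean()
    dx = np.abs(np.diff(img, axis=1)).mean()
    return dx + dy

def curvature_loss(img):
    """Second-order (curvature) term via a discrete periodic Laplacian."""
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
    return np.abs(lap).mean()

def structure_aware_loss(pred, target, attn, lam=(1.0, 0.1, 0.1)):
    """Weighted sum of a data term, smoothness, curvature, and an
    attention-supervision term (attn is a hypothetical saliency map
    that up-weights errors in structurally important regions)."""
    data = np.abs(pred - target).mean()
    attn_term = (attn * np.abs(pred - target)).mean()
    return (data + lam[0] * smooth_loss(pred)
            + lam[1] * curvature_loss(pred) + lam[2] * attn_term)
```

Balancing the first-order and second-order penalties is one standard way to trade spatial sharpness against spectral smoothness, which is the tension the abstract refers to.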
- Research Article
- 10.3390/jimaging12010017
- Dec 30, 2025
- Journal of Imaging
- Ichrak Khoulqi + 1 more
In this paper, we present a literature review of two deep learning architectures, Convolutional Neural Networks (CNNs) and Capsule Networks (CapsNets), applied to medical images for medical decision support. CNNs have demonstrated strong capability in the medical diagnostic field; however, their reliability decreases under slight spatial variability, which can affect diagnosis, especially since the anatomical structure of the human body can differ from one patient to another. In contrast, CapsNets encode not only feature activations but also spatial relationships, hence improving the reliability and stability of model generalization. This paper proposes a structured comparison by reviewing studies published from 2018 to 2025 across major databases, including IEEE Xplore, ScienceDirect, SpringerLink, and MDPI. The applications in the reviewed papers are based on the benchmark datasets BraTS, INbreast, ISIC, and COVIDx. This review compares the core architectural principles, performance, and interpretability of both architectures. To conclude, we underline the complementary roles of these two architectures in medical decision-making and propose future directions toward hybrid, explainable, and computationally efficient deep learning systems for real clinical environments, thereby helping detect diseases at an early stage and increase survival rates.
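The capsule mechanism contrasted with CNNs above rests on representing entities as vectors whose length encodes existence probability and whose orientation encodes pose. The standard CapsNet "squash" nonlinearity (Sabour et al., 2017) makes this concrete; a minimal sketch:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """CapsNet squashing nonlinearity: shrinks short vectors toward
    zero and maps long vectors to just below unit length, while
    preserving direction (the encoded pose/spatial relationship)."""
    sq_norm = (s ** 2).sum(axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * s
```

Because direction survives the nonlinearity, downstream routing can compare part poses with whole poses, which is the spatial-relationship modeling that plain CNN pooling discards.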
- Research Article
- 10.3390/jimaging12010018
- Dec 30, 2025
- Journal of Imaging
- Ali Awad + 5 more
Underwater images often suffer from severe color distortion, low contrast, and reduced visibility, motivating the widespread use of image enhancement as a preprocessing step for downstream computer vision tasks. However, recent studies have questioned whether enhancement actually improves object detection performance. In this work, we conduct a comprehensive and rigorous evaluation of nine state-of-the-art enhancement methods and their interactions with modern object detectors. We propose a unified evaluation framework that integrates (1) a distribution-level quality assessment using a composite quality index (Q-index), (2) a fine-grained per-image detection protocol based on COCO-style mAP, and (3) a mixed-set upper-bound analysis that quantifies the theoretical performance achievable through ideal selective enhancement. Our findings reveal that traditional image quality metrics do not reliably predict detection performance, and that dataset-level conclusions often overlook substantial image-level variability. Through per-image evaluation, we identify numerous cases in which enhancement significantly improves detection accuracy—primarily for low-quality inputs—while also demonstrating conditions under which enhancement degrades performance. The mixed-set analysis shows that selective enhancement can yield substantial gains over both original and fully enhanced datasets, establishing a new direction for designing enhancement models optimized for downstream vision tasks. This study provides the most comprehensive evidence to date that underwater image enhancement can be beneficial for object detection when evaluated at the appropriate granularity and guided by informed selection strategies. The data generated and code developed are publicly available.
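The mixed-set upper-bound analysis described above amounts to an oracle that, for each image, keeps whichever version (original or enhanced) yields the better detection score. A minimal sketch, assuming per-image AP lists are already computed (the values below are illustrative, not the paper's data):

```python
def mixed_set_upper_bound(ap_original, ap_enhanced):
    """Per-image oracle selection: for each image take the better of
    the original and enhanced detection scores, then average.
    This bounds what an ideal selective-enhancement policy achieves."""
    assert len(ap_original) == len(ap_enhanced)
    best = [max(o, e) for o, e in zip(ap_original, ap_enhanced)]
    return sum(best) / len(best)
```

By construction the oracle mean is at least as large as both the all-original and all-enhanced means, which is why selective enhancement can beat either uniform policy when image-level variability is large.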
- Research Article
- 10.3390/jimaging12010015
- Dec 29, 2025
- Journal of Imaging
- Zhigao Zeng + 4 more
Medical image segmentation presents substantial challenges arising from the diverse scales and morphological complexities of target anatomical structures. Although existing Transformer-based models excel at capturing global dependencies, they encounter critical bottlenecks in multi-scale feature representation, spatial relationship modeling, and cross-layer feature fusion. To address these limitations, we propose the M3-TransUNet architecture, which incorporates three key innovations: (1) MSGA (Multi-Scale Gate Attention) and MSSA (Multi-Scale Selective Attention) modules to enhance multi-scale feature representation; (2) ME-MSA (Manhattan Enhanced Multi-Head Self-Attention) to integrate spatial priors into self-attention computations, thereby overcoming spatial modeling deficiencies; and (3) MKGAG (Multi-kernel Gated Attention Gate) to optimize skip connections by precisely filtering noise and preserving boundary details. Extensive experiments on public datasets—including Synapse, CVC-ClinicDB, and ISIC—demonstrate that M3-TransUNet achieves state-of-the-art performance. Specifically, on the Synapse dataset, our model outperforms recent TransUNet variants such as J-CAPA, improving the average DSC to 82.79% (compared to 82.29%) and significantly reducing the average HD95 from 19.74 mm to 10.21 mm.
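One common way to inject a spatial prior of the kind ME-MSA describes is an additive attention bias that penalizes token pairs by their Manhattan (L1) distance on the feature grid. The sketch below is a generic illustration of that idea, not the paper's ME-MSA implementation; the `slope` parameter is a hypothetical decay rate.

```python
import numpy as np

def manhattan_bias(h, w, slope=0.1):
    """Additive attention bias penalizing token pairs by their
    Manhattan (L1) distance on an h x w feature grid."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)        # (h*w, 2)
    dist = np.abs(coords[:, None, :] - coords[None, :, :]).sum(-1)
    return -slope * dist                                       # (h*w, h*w)

def attention_with_bias(q, k, v, bias):
    """Scaled dot-product attention with an additive spatial bias."""
    logits = q @ k.T / np.sqrt(q.shape[-1]) + bias
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Nearby tokens receive less penalty than distant ones, so the attention map is steered toward local anatomy without removing the ability to attend globally.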
- Research Article
- 10.3390/jimaging12010016
- Dec 29, 2025
- Journal of Imaging
- Lin Shi + 4 more
Synthesizing photo-realistic images of a scene from arbitrary viewpoints and under arbitrary lighting environments is one of the important research topics in computer vision and graphics. In this paper, we propose a method for synthesizing photo-realistic images of a scene with fluorescent objects from novel viewpoints and under novel lighting colors and spectra. In general, fluorescent materials absorb light with certain wavelengths and then emit light with longer wavelengths than the absorbed ones, in contrast to reflective materials, which preserve wavelengths of light. Therefore, we cannot reproduce the colors of fluorescent objects under arbitrary lighting colors by combining conventional view synthesis techniques with the white balance adjustment of the RGB channels. Accordingly, we extend the novel-view synthesis based on the neural radiance fields by incorporating the superposition principle of light; our proposed method captures a sparse set of images of a scene from varying viewpoints and under varying lighting colors or spectra with active lighting systems such as a color display or a multi-spectral light stage and then synthesizes photo-realistic images of the scene without explicitly modeling its geometric and photometric models. We conducted a number of experiments using real images captured with an LCD and confirmed that our method works better than the existing methods. Moreover, we showed that the extension of our method using more than three primary colors with a light stage enables us to reproduce the colors of fluorescent objects under common light sources.
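The superposition principle the method relies on is that light transport, including fluorescent re-emission, is linear in the illumination: the image under any mixture of the basis lights is the same mixture of the basis images. A minimal sketch of that relighting step, with hypothetical basis images standing in for captures under individual display primaries:

```python
import numpy as np

def relight(basis_images, weights):
    """Superposition of light: the image under an illuminant expressed
    as weights over K basis lights equals the weighted sum of the K
    images captured under each basis light alone."""
    basis = np.stack(basis_images, axis=0)      # (K, H, W, 3)
    w = np.asarray(weights, dtype=float)        # (K,)
    return np.tensordot(w, basis, axes=1)       # (H, W, 3)
```

Note this linear combination is exactly what per-channel white balance cannot express for fluorescence, since a fluorescent surface excited by one primary emits at wavelengths belonging to another channel.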
- Research Article
- 10.3390/jimaging12010010
- Dec 25, 2025
- Journal of Imaging
- Haya Monawwar + 1 more
Accurate six-degree-of-freedom (6-DoF) camera pose estimation is essential for augmented reality, robotics navigation, and indoor mapping. Existing pipelines often depend on detailed floorplans, strict Manhattan-world priors, and dense structural annotations, which lead to failures in ambiguous room layouts where multiple rooms appear in a query image and their boundaries may overlap or be partially occluded. We present Render-Rank-Refine, a two-stage framework operating on coarse semantic meshes without requiring textured models or per-scene fine-tuning. First, panoramas rendered from the mesh enable global retrieval of coarse pose hypotheses. Then, perspective views from the top-k candidates are compared to the query via rotation-invariant circular descriptors, which re-ranks the matches before final translation and rotation refinement. Our method increases camera localization accuracy compared to the state-of-the-art SPVLoc baseline by reducing the translation error by 40.4% and the rotation error by 29.7% in ambiguous layouts, as evaluated on the Zillow Indoor Dataset. In terms of inference throughput, our method achieves 25.8–26.4 queries per second (QPS), which is significantly faster than other recent comparable methods, while maintaining accuracy comparable to or better than the SPVLoc baseline. These results demonstrate robust, near-real-time indoor localization that overcomes structural ambiguities and heavy geometric assumptions.
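A standard way to build a rotation-invariant circular descriptor, plausibly related to the one used for re-ranking above, is to take the magnitude of the discrete Fourier transform of a feature profile sampled around the full 360° of a panorama: a yaw rotation becomes a circular shift, which only changes the DFT's phase. This sketch is a generic illustration, not the paper's descriptor.

```python
import numpy as np

def circular_descriptor(profile):
    """Magnitude of the 1D DFT of a circular feature profile.
    Circular shifts (i.e., yaw rotations of the panorama) change only
    the DFT phase, so the magnitude spectrum is rotation-invariant."""
    return np.abs(np.fft.rfft(np.asarray(profile, dtype=float)))
```

Two views of the same place that differ only by camera yaw therefore produce (near-)identical descriptors, letting retrieval rank candidates without first solving for rotation.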
- Research Article
- 10.3390/jimaging12010005
- Dec 24, 2025
- Journal of Imaging
- Fumi Mizuhashi + 8 more
We aimed to investigate the types of bone changes in temporomandibular disorder patients with disc displacement. The subjects were 117 temporomandibular joints that were diagnosed with anterior disc displacement using magnetic resonance imaging (MRI). Temporomandibular joint (TMJ) pain and opening dysfunction were examined. Disc displacement with and without reduction, joint effusion, and bone changes in the mandibular condyle were assessed on MRI. The types of bone changes were classified into erosion, flattening, osteophyte, and atrophy on the MR images. Fisher's exact test and the χ2 test were used for the analyses. Bone changes were found in 30.8% of the joints, comprising the erosion, flattening, osteophyte, and atrophy types (p < 0.001). The occurrence of joint effusion (p < 0.001), TMJ pain (p = 0.027), and opening dysfunction (p = 0.002) differed among the types of bone changes. Gender differences were also found among the types of bone changes (p < 0.001). The rate of disc displacement with reduction was significantly smaller than that of disc displacement without reduction for the flattening and osteophyte types (p < 0.001). The results indicate that symptoms, gender, and the presence or absence of disc reduction differ among the types of bone changes.
- Research Article
- 10.3390/jimaging12010004
- Dec 22, 2025
- Journal of Imaging
- Mou Deb + 18 more
This study proposes a novel two-dimensional Empirical Mode Decomposition (2D EMD)-based deep learning framework to enhance model performance in multi-class image classification tasks and potential early detection of diseases in healthcare using medical imaging. To validate this approach, we apply it to gastrointestinal (GI) endoscopic image classification using the publicly available Kvasir dataset, which contains eight GI image classes with 1000 images each. The proposed 2D EMD-based design procedure decomposes images into a full set of intrinsic mode functions (IMFs) to enhance image features beneficial for AI model development. Integrating 2D EMD into a deep learning pipeline, we evaluate its impact on four popular models (ResNet152, VGG19bn, MobileNetV3L, and SwinTransformerV2S). The results demonstrate that subtracting IMFs from the original image consistently improves accuracy, F1-score, and AUC for all models. The study reveals a notable enhancement in model performance, with an approximately 9% increase in accuracy compared to counterparts without EMD integration for ResNet152. Similarly, there is an increase of around 18% for VGG19bn, 3% for MobileNetV3L, and 8% for SwinTransformerV2S. Additionally, explainable AI (XAI) techniques, such as Grad-CAM, illustrate that the model focuses on GI regions for predictions. This study highlights the efficacy of 2D EMD in enhancing deep learning model performance for GI image classification, with potential applications in other domains.
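The IMF-subtraction step rests on the EMD identity that an image equals the sum of its IMFs plus a residue, so subtracting selected IMFs removes those oscillatory components (e.g., removing the first, highest-frequency IMF suppresses fine-scale texture/noise). A minimal sketch of the subtraction step, assuming a list of 2D IMFs has already been computed by some EMD routine (the decomposition itself is not shown here):

```python
import numpy as np

def subtract_imfs(image, imfs):
    """Given (hypothetical, precomputed) 2D IMFs from an EMD of
    `image`, subtract the selected IMFs to remove those oscillatory
    components and keep the coarser structure."""
    out = image.astype(float).copy()
    for imf in imfs:
        out = out - imf
    return out
```

If the full IMF set plus residue is subtracted, only the residue (the slowly varying trend image) remains, which is the decomposition property the enhancement exploits.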