A real-time image super-resolution model based on U-shaped deep feature extraction module
- Book Chapter
4
- 10.1007/978-3-031-18910-4_16
- Jan 1, 2022
Compressed sensing magnetic resonance imaging (CS-MRI) is an important and effective tool for fast MR imaging, which enables superior performance in restoring the anatomy of patients from undersampled k-space data. Deep learning methods have been successful in solving this inverse problem and can generate high-quality reconstructions. However, state-of-the-art deep learning methods for MRI reconstruction are based on convolutional neural networks (CNNs), which consider only local features independently and lack the long-range dependencies present in images. In this work, inspired by the impressive performance of Transformers on high-level vision tasks, we propose a Cascade multiscale Swin-Conv (CMSC) network, a novel Swin Transformer based method for fast MRI reconstruction. The whole network consists of shallow feature extraction, deep feature extraction, and image reconstruction modules. Specifically, the shallow feature extraction module is a Multiscale Cascade Convolution Block (MCCB), and the deep feature extraction module is a U-shaped network composed of several Cascade Multiscale Swin-Conv Blocks (CMSCB), each of which has several Cascade Swin Transformer Layers (STL) and an MCCB to model both the local and long-range information with multiscale features. Our framework provides three appealing benefits: (i) A new U-shaped deep feature extraction module combining Swin Transformers and CNNs is introduced to hierarchically capture the local and global information. (ii) A novel CMSCB is designed for developing multiscale features by computing self-attention in a window of increasing size. (iii) Our MCCB is the first attempt at fusing multiscale information and deeply extracting local features with multiple residual convolutional layers. 
Experimental results demonstrate that our model achieves superior performance compared with other state-of-the-art deep learning-based reconstruction methods.
Keywords: MRI reconstruction, Swin transformer, Multiscale cascade convolution
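The idea of computing self-attention inside windows of increasing size can be illustrated with a small sketch. The abstract does not give the window sizes or the feature-map resolution, so the 8 x 8 grid and the sizes 2, 4, and 8 below are purely hypothetical, and the function names are illustrative rather than taken from the CMSC code. The sketch only partitions a grid into non-overlapping windows and counts how many query-key pairs windowed attention would score; it does not implement attention itself.

```python
def window_partition(height, width, window):
    """Split an H x W grid of pixel coordinates into non-overlapping windows.

    Returns a list of windows; each window is a list of (row, col) tuples.
    Assumes height and width are divisible by the window size.
    """
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")
    windows = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            windows.append([
                (r, c)
                for r in range(top, top + window)
                for c in range(left, left + window)
            ])
    return windows


def attention_pairs(height, width, window):
    """Count query-key pairs scored when attention is restricted to windows."""
    tokens_per_window = window * window
    n_windows = (height // window) * (width // window)
    return n_windows * tokens_per_window ** 2


if __name__ == "__main__":
    # Hypothetical schedule: the window grows from one layer to the next.
    for w in (2, 4, 8):
        print(w, len(window_partition(8, 8, w)), attention_pairs(8, 8, w))
```

Larger windows let each token attend to more context, at the cost of more query-key pairs per layer, which is one way to read the multiscale motivation in the abstract.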
- Research Article
60
- 10.1016/j.compag.2022.106736
- Mar 1, 2022
- Computers and Electronics in Agriculture
Classification of crop pests based on multi-scale feature fusion
- Research Article
2
- 10.3390/s23239575
- Dec 2, 2023
- Sensors
Traditional Convolutional Neural Network (ConvNet, CNN)-based image super-resolution (SR) methods have lower computation costs, making them better suited to real-world scenarios. However, they suffer from lower performance. In contrast, Vision Transformer (ViT)-based SR methods have achieved impressive performance recently, but these methods often suffer from high computation costs and model storage overhead, making it hard for them to meet the requirements of practical application scenarios. In practical scenarios, an SR model should reconstruct an image with high quality and fast inference. To handle this issue, we propose a novel CNN-based Efficient Residual ConvNet enhanced with structural Re-parameterization (RepECN) for a better trade-off between performance and efficiency. A stage-to-block hierarchical architecture design paradigm inspired by ViT is utilized to keep the state-of-the-art performance, while efficiency is ensured by abandoning the time-consuming Multi-Head Self-Attention (MHSA) and by re-designing the block-level modules based on CNN. Specifically, RepECN consists of three structural modules: a shallow feature extraction module, a deep feature extraction module, and an image reconstruction module. The deep feature extraction module comprises multiple ConvNet Stages (CNS), each containing 6 Re-Parameterization ConvNet Blocks (RepCNB), a head layer, and a residual connection. The RepCNB utilizes larger kernel convolutions rather than MHSA to enhance the capability of learning long-range dependencies. In the image reconstruction module, an upsampling module consisting of nearest-neighbor interpolation and pixel attention is deployed to reduce parameters and maintain reconstruction performance, while bicubic interpolation on another branch allows the backbone network to focus on learning high-frequency information. 
The extensive experimental results on multiple public benchmarks show that our RepECN can achieve 2.5∼5× faster inference than the state-of-the-art ViT-based SR model with better or competitive super-resolving performance, indicating that our RepECN can reconstruct high-quality images with fast inference.
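Structural re-parameterization, in its generic form, trains with several parallel branches and then folds them into a single convolution for inference. The abstract does not describe the branch layout inside a RepCNB, so the following is only a minimal single-channel sketch of the general idea, assuming a 3x3 branch, a 1x1 branch, and an identity branch (the layout popularized by RepVGG). All names are illustrative, and BatchNorm folding and multi-channel kernels are deliberately left out.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < h and 0 <= c < w:
                        acc += kernel[di][dj] * image[r][c]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch (scalar k1) + identity."""
    conv = conv2d_same(image, k3)
    return [
        [conv[i][j] + k1 * image[i][j] + image[i][j] for j in range(len(image[0]))]
        for i in range(len(image))
    ]


def merge_branches(k3, k1):
    """Inference-time form: fold all three branches into one 3x3 kernel."""
    merged = [row[:] for row in k3]
    # The 1x1 weight and the identity both act on the centre tap only.
    merged[1][1] += k1 + 1.0
    return merged


if __name__ == "__main__":
    img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    k3 = [[0.5, 0.0, -0.5], [1.0, 0.25, -1.0], [0.5, 0.0, -0.5]]
    print(multi_branch(img, k3, 0.75) == conv2d_same(img, merge_branches(k3, 0.75)))
```

Because convolution is linear, the merged kernel reproduces the multi-branch output while needing only one convolution per block at inference time, which is the source of the efficiency gain such designs aim for.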
- Research Article
11
- 10.1016/j.sigpro.2024.109542
- May 17, 2024
- Signal Processing
Non-local sparse attention based swin transformer V2 for image super-resolution
- Research Article
12
- 10.1038/s41598-022-09685-w
- Apr 6, 2022
- Scientific Reports
The abrupt and continuous nature of scale variation in a crowded scene makes it challenging to improve crowd counting accuracy in an image. Existing crowd counting techniques generally use multi-column or single-column dilated convolution to tackle scale variation due to perspective distortion. However, because of their multi-column nature, they obtain identical features, whereas the standard dilated convolution (SDC) with an expanded receptive field has a sparse pixel sampling rate. Because of the sparse nature of SDC, it is highly challenging to obtain relevant contextual information. Further, features at multiple scales are not extracted when an inception-based model is not used (which is cost effective). To mitigate these drawbacks of SDC, we therefore propose a hierarchical dense dilated deep pyramid feature extraction through a convolutional neural network (CNN) for single image crowd counting (HDPF). It comprises three modules: a general feature extraction module (GFEM), a deep pyramid feature extraction module (PFEM), and a fusion module (FM). The GFEM is responsible for obtaining task-independent general features, whereas the PFEM plays a vital role in obtaining the relevant contextual information, owing to the dense pixel sampling rate produced by densely connected dense stacked dilated convolutional modules (DSDCs). Further, due to the dense connections among DSDCs, the final feature map acquires multi-scale information with an expanded receptive field as compared to SDC. Owing to its dense pyramid nature, it effectively propagates the extracted features from lower dilated convolutional layers (DCLs) to middle and higher DCLs, which results in better estimation accuracy. The FM is used to fuse the incoming features extracted by the other modules. The proposed technique is tested through simulations on three well-known datasets: ShanghaiTech (Part-A), ShanghaiTech (Part-B), and Venice. Results justify its relative effectiveness in terms of the selected performance measures.
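The contrast between the sparse sampling of a single dilated convolution and the denser sampling of stacked dilated layers can be checked with a one-dimensional sketch. The dilation rates below (a single rate of 6 versus a stack of 1, 2, 3) are hypothetical and are not taken from the paper, and the sketch models plain stacking only, not the dense connections among DSDCs that the abstract describes.

```python
def sampled_offsets(dilations, kernel_size=3):
    """Input offsets (1D) that can influence one output of stacked dilated convs."""
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        taps = [t * d for t in range(-half, half + 1)]
        # Each new layer reaches every earlier offset shifted by each of its taps.
        offsets = {o + t for o in offsets for t in taps}
    return sorted(offsets)


def coverage(dilations, kernel_size=3):
    """Return (receptive field width, number of distinct positions sampled)."""
    offs = sampled_offsets(dilations, kernel_size)
    return offs[-1] - offs[0] + 1, len(offs)


if __name__ == "__main__":
    print(coverage([6]))        # one layer with dilation 6
    print(coverage([1, 2, 3]))  # three stacked layers with growing dilation
```

Both configurations span 13 input positions, but the single dilated layer reads only 3 of them while the stack reads all 13, which is the sampling-density gap the abstract attributes to SDC.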
- Research Article
1
- 10.37871/jbres1572
- Oct 1, 2022
- Journal of Biomedical Research & Environmental Sciences
Early detection of breast cancer and effective identification of its correct stage remain major challenges for healthcare professionals. Testing the tumour for Oestrogen Receptor and Progesterone Receptor is a standard part of the initial evaluation of breast cancer diagnosis and treatment planning. Several expression profiling studies have illustrated that the expression of these hormone receptors is linked with diverse genetic variations, which means that several mutated genes can affect the development and progression of breast cancer and contribute to its heterogeneity. Unfortunately, due to the high dimensionality and low sample size nature of microarray data, traditional statistical feature selection techniques fail to identify genes that could act as risk factors for breast cancer. Inspired by this, we developed a deep learning-based feature extraction module with a weight interpretation method to select a subset of robust biomarkers across three different mRNA expression data sets from The Cancer Genome Atlas program (TCGA). For a discovered feature (a gene) to be accepted for further investigation, it must have been independently selected by the weight interpretation method from each of the deep feature extraction modules (each having been trained on a different data set). The small panel of discovered biomarkers was then subsequently evaluated using a range of classifiers to ascertain their predictive ability with respect to the above hormone receptor status. We observed strong evidence that the upregulation in the expression levels of highly positively weighted genes within the deep feature selection modules and the downregulation in the expression levels of the highly negatively weighted genes both indicated the strong likelihood of a patient experiencing ER+/PR+ invasive breast cancer. In addition, we discovered a number of potentially novel biomarkers worthy of further consideration.
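The acceptance rule described above, where a gene is kept only if every module selected it independently, amounts to a set intersection. The sketch below assumes each module's selection is already available as a list of gene identifiers; the identifiers are placeholders, not genes reported in the study, and the function name is illustrative.

```python
def consensus_features(selections):
    """Keep only the features selected by every module (one list per data set)."""
    sets = [set(s) for s in selections]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


if __name__ == "__main__":
    # Placeholder gene identifiers; one list per hypothetical trained module.
    module_a = ["GENE_A", "GENE_B", "GENE_C"]
    module_b = ["GENE_B", "GENE_C", "GENE_D"]
    module_c = ["GENE_C", "GENE_B"]
    print(consensus_features([module_a, module_b, module_c]))
```

Requiring agreement across independently trained modules trades recall for robustness: a gene picked up by only one data set is discarded even if its weight there is large.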
- Research Article
5
- 10.1088/1361-6560/ad22a1
- Feb 12, 2024
- Physics in Medicine & Biology
Objective. The existing diagnostic paradigm for diabetic retinopathy (DR) greatly relies on subjective assessments by medical practitioners utilizing optical imaging, introducing susceptibility to individual interpretation. This work presents a novel system for the early detection and grading of DR, providing an automated alternative to the manual examination. Approach. First, we use advanced image preprocessing techniques, specifically contrast-limited adaptive histogram equalization and Gaussian filtering, with the goal of enhancing image quality and module learning capabilities. Second, a deep learning-based automatic detection system is developed. The system consists of a feature segmentation module, a deep learning feature extraction module, and an ensemble classification module. The feature segmentation module accomplishes vascular segmentation, the deep learning feature extraction module realizes the global feature and local feature extraction of retinopathy images, and the ensemble module performs the diagnosis and classification of DR for the extracted features. Lastly, nine performance evaluation metrics are applied to assess the quality of the model’s performance. Main results. Extensive experiments are conducted on four retinal image databases (APTOS 2019, Messidor, DDR, and EyePACS). The proposed method demonstrates promising performance in the binary and multi-classification tasks for DR, evaluated through nine indicators, including AUC and quadratic weighted Kappa score. The system shows the best performance in the comparison of three segmentation methods, two convolutional neural network architecture models, four Swin Transformer structures, and the latest literature methods. Significance. In contrast to existing methods, our system demonstrates superior performance across multiple indicators, enabling accurate screening of DR and providing valuable support to clinicians in the diagnostic process. 
Our automated approach minimizes the reliance on subjective assessments, contributing to more consistent and reliable DR evaluations.
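Of the nine indicators, the quadratic weighted Kappa score is the one most specific to ordinal grading, because it penalizes a prediction more the further it lies from the true grade. The sketch below follows the standard definition of the metric; it is not the authors' evaluation code, and the five-grade example labels are invented for illustration.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic disagreement weights for ordinal labels."""
    n = len(y_true)
    if n == 0 or n != len(y_pred) or n_classes < 2:
        raise ValueError("need equal-length, non-empty labels and >= 2 classes")
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = 0.0
    den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / n  # chance agreement
            num += weight * observed[i][j]
            den += weight * expected
    if den == 0.0:
        # Both raters used a single identical class; no disagreement is possible.
        return 1.0
    return 1.0 - num / den


if __name__ == "__main__":
    # Invented labels on a five-grade scale (0 to 4); one prediction is off by one.
    print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 3], 5))
```

A score of 1 means perfect agreement and 0 means agreement no better than chance, so an off-by-one grade costs far less than confusing the lowest grade with the highest.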
- Research Article
1
- 10.1088/2631-8695/ae2ce7
- Dec 29, 2025
- Engineering Research Express
To address the issues of difficult sparsity evaluation in traditional compressive sensing and strong randomness in signal compressive sampling, an adaptive deep compressive feature extraction method based on dual-branch wavelet convolution joint sparse sensing is proposed and successfully applied to bearing vibration signals. Firstly, a feature extraction module with a dual-branch heterogeneous wavelet convolution architecture is designed to fully leverage the advantages of different-sized convolution kernels in feature extraction, capturing bearing vibration data features at multiple scales. Secondly, a deep compressive feature reduction module is developed based on the energy-preserving property of compressive sensing. An energy-and-information-entropy co-driven compression strategy is formulated to deeply optimize the reduction process, effectively improving the quality of compressive features. Then, a novel loss function with joint spatial-spectral optimization is constructed to enhance the model's ability to learn sparse features, and an Adaptive Dynamic Weight Nonlinear Transition (ADWNT) mechanism is proposed to adaptively adjust the loss function weights. Finally, the proposed method is validated using data from a thrust bearing limit state performance test bench and a rotating machinery fault simulation test bench. Experimental results show that the method can overcome the limitations of traditional compressive sensing, achieve feature extraction under strong background noise, and maintain a good balance between model complexity and performance.
- Research Article
70
- 10.1016/j.cmpb.2019.105236
- Nov 20, 2019
- Computer Methods and Programs in Biomedicine
Celiac disease diagnosis from videocapsule endoscopy images with residual learning and deep feature extraction
- Research Article
2
- 10.1016/j.jfca.2025.108506
- Dec 1, 2025
- Journal of Food Composition and Analysis
Traditional methods for assessing pork freshness, such as TVB-N and TVC measurement, are time-consuming and destructive. This study introduces a dual-branch hyperspectral feature extraction network (HybridFeatureExtractor). The designed feature extraction module consists of a spectral branch employing the SE attention mechanism, a spatial branch incorporating ASPP, and a gated fusion mechanism, effectively capturing spectral and spatial information across multiple scales. Together with machine learning regressors, the framework achieved excellent predictive performance, with R² of 0.9786 (RMSE=2.4685) for TVB-N and R² of 0.9597 (RMSE=0.3066) for TVC. Compared with chemometric approaches (e.g., SG+SPA, SNV+CARS), the proposed method shows superior accuracy and robustness. This hyperspectral modeling strategy may provide a robust and highly practical technical pathway for non-destructive assessment of pork freshness.
• A deep dual-branch hyperspectral feature extraction module was proposed for pork freshness detection.
• The SE attention mechanism and the ASPP module were introduced to enhance the feature capture ability.
• Modeling with PLSR and SVR improved the prediction accuracy and generalization ability.
• The RPD values for TVB-N and TVC reached as high as 7.1204 and 5.1831, respectively.
• Ablation studies and attention analysis confirmed the model's robustness and interpretability.
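The three figures of merit quoted above (R², RMSE, and RPD) can be computed from reference and predicted values as sketched below. The abstract does not state which RPD formula was used; this sketch assumes the common chemometric definition, the sample standard deviation of the reference values divided by the RMSE. The data in the example are made up and unrelated to the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and RPD (assumed here: sample SD of y_true / RMSE)."""
    n = len(y_true)
    if n < 2 or n != len(y_pred):
        raise ValueError("need at least two paired values")
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("reference values must not all be identical")
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    sd = math.sqrt(ss_tot / (n - 1))
    rpd = sd / rmse if rmse > 0 else math.inf
    return {"r2": r2, "rmse": rmse, "rpd": rpd}


if __name__ == "__main__":
    # Made-up reference values and predictions with a constant offset of 0.5.
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]))
```

RPD expresses the prediction error relative to the natural spread of the reference values, which makes it comparable across quantities measured in different units, such as TVB-N and TVC.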
- Research Article
3
- 10.32604/cmc.2023.047057
- Jan 1, 2024
- Computers, Materials & Continua
We propose a novel image segmentation algorithm to tackle the challenge of limited recognition and segmentation performance in identifying welding seam images during robotic intelligent operations. Initially, to enhance the capability of deep neural networks in extracting geometric attributes from depth images, we developed a novel deep geometric convolution operator (DGConv). DGConv is utilized to construct a deep local geometric feature extraction module, facilitating a more comprehensive exploration of the intrinsic geometric information within depth images. Secondly, we integrate the newly proposed deep geometric feature module with the Fully Convolutional Network (FCN8) to establish a high-performance deep neural network algorithm tailored for depth image segmentation. Concurrently, we enhance the FCN8 detection head by separating the segmentation and classification processes. This enhancement significantly boosts the network’s overall detection capability. Thirdly, for a comprehensive assessment of our proposed algorithm and its applicability in real-world industrial settings, we curated a line-scan image dataset featuring weld seams. This dataset, named the Standardized Linear Depth Profile (SLDP) dataset, was collected from actual industrial sites where autonomous robots are in operation. Ultimately, we conducted experiments utilizing the SLDP dataset, achieving an average accuracy of 92.7%. Our proposed approach exhibited a remarkable performance improvement over the prior method on the identical dataset. Moreover, we have successfully deployed the proposed algorithm in genuine industrial environments, fulfilling the prerequisites of unmanned robot operations.
- Research Article
- 10.1016/j.image.2024.117148
- May 18, 2024
- Signal Processing: Image Communication
HorSR: High-order spatial interactions and residual global filter for efficient image super-resolution
- Research Article
50
- 10.3390/jcm9124013
- Dec 11, 2020
- Journal of Clinical Medicine
The differentiation of autoimmune pancreatitis (AIP) and pancreatic ductal adenocarcinoma (PDAC) poses a relevant diagnostic challenge and can lead to misdiagnosis and consequently poor patient outcome. Recent studies have shown that radiomics-based models can achieve high sensitivity and specificity in predicting both entities. However, radiomic features can only capture low level representations of the input image. In contrast, convolutional neural networks (CNNs) can learn and extract more complex representations which have been used for image classification to great success. In our retrospective observational study, we performed a deep learning-based feature extraction using CT-scans of both entities and compared the predictive value against traditional radiomic features. In total, 86 patients, 44 with AIP and 42 with PDACs, were analyzed. Whole pancreas segmentation was automatically performed on CT-scans during the portal venous phase. The segmentation masks were manually checked and corrected if necessary. In total, 1411 radiomic features were extracted using PyRadiomics and 256 features (deep features) were extracted using an intermediate layer of a convolutional neural network (CNN). After feature selection and normalization, an extremely randomized trees algorithm was trained and tested using a two-fold shuffle-split cross-validation with a test sample of 20% (n = 18) to discriminate between AIP or PDAC. Feature maps were plotted and visual difference was noted. The machine learning (ML) model achieved a sensitivity, specificity, and ROC-AUC of 0.89 ± 0.11, 0.83 ± 0.06, and 0.90 ± 0.02 for the deep features and 0.72 ± 0.11, 0.78 ± 0.06, and 0.80 ± 0.01 for the radiomic features. Visualization of feature maps indicated different activation patterns for AIP and PDAC. We successfully trained a machine learning model using deep feature extraction from CT-images to differentiate between AIP and PDAC. 
In comparison to traditional radiomic features, deep features achieved a higher sensitivity, specificity, and ROC-AUC. Visualization of deep features could further improve the diagnostic accuracy of non-invasive differentiation of AIP and PDAC.
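The evaluation protocol (a two-fold shuffle-split with a 20% test sample, n = 18 out of 86 patients) can be sketched with the standard library alone. Two assumptions are made here that the abstract does not confirm: "two-fold" is read as two independent random splits, and the test size is rounded up, which is what yields 18 from 0.2 × 86 = 17.2. The function name and seed are illustrative, and no classifier is trained.

```python
import math
import random


def shuffle_split(n_samples, test_fraction=0.2, n_splits=2, seed=0):
    """Return n_splits independent (train_indices, test_indices) random splits."""
    n_test = math.ceil(test_fraction * n_samples)  # assumed rounding: up
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        splits.append((indices[n_test:], indices[:n_test]))
    return splits


if __name__ == "__main__":
    for train, test in shuffle_split(86):
        print(len(train), len(test))
```

Unlike k-fold cross-validation, the test sets of different shuffle-split repetitions may overlap, so the reported standard deviations describe variation across random splits rather than across disjoint folds.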
- Research Article
8
- 10.1049/ipr2.12467
- Mar 13, 2022
- IET Image Processing
To address the problems of incomplete dehazing of a single image and the unnatural appearance of the restored image, a multi‐scale single‐image defogging network that fuses local features with global features is proposed; the network is trained in a direct end‐to‐end manner on pairs of foggy and fog‐free images. The network is divided into a global feature extraction module, a multi‐scale feature extraction module, and a deep fusion module. The global feature extraction module extracts global features that characterize the contour; the multi‐scale feature extraction module extracts features at different scales to improve learning accuracy; in the deep fusion module, the convolutional layer extracts the local features that describe the image content, and then the local features and the global features are merged through skip connections. Comparative experiments were carried out on artificially synthesized fog images and real fog images. The experimental results show that the proposed algorithm achieves the desired dehazing effect and is superior to the other algorithms compared, in both subjective and objective terms.
- Research Article
3
- 10.3390/ijgi12070258
- Jun 27, 2023
- ISPRS International Journal of Geo-Information
The purpose of multisource map super-resolution is to reconstruct high-resolution maps based on low-resolution maps, which is valuable for content-based map tasks such as map recognition and classification. However, there is no specific super-resolution method for maps, and the existing image super-resolution methods often suffer from missing details when reconstructing maps. We propose a map super-resolution (mapSR) model that fuses local and global features for super-resolution reconstruction of low-resolution maps. Specifically, the proposed model consists of three main modules: a shallow feature extraction module, a deep feature fusion module, and a map reconstruction module. First, the shallow feature extraction module initially extracts the image features and embeds the images with appropriate dimensions. The deep feature fusion module uses Transformer and Convolutional Neural Network (CNN) to focus on extracting global and local features, respectively, and fuses them by weighted summation. Finally, the map reconstruction module uses upsampling methods to reconstruct the map features into the high-resolution map. We constructed a high-resolution map dataset for training and validating the map super-resolution model. Compared with other models, the proposed method achieved the best results in map super-resolution.
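Fusing global and local features by weighted summation can be written in a few lines. The abstract does not say how the weights are obtained or whether they are constrained, so the sketch below assumes a single scalar weight `alpha` for the global (Transformer) branch and `1 - alpha` for the local (CNN) branch; in a trained model such a weight would more plausibly be learned. The feature values are invented.

```python
def fuse_weighted(global_feat, local_feat, alpha=0.5):
    """Element-wise weighted sum: alpha * global + (1 - alpha) * local."""
    if len(global_feat) != len(local_feat):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * g + (1.0 - alpha) * l for g, l in zip(global_feat, local_feat)]


if __name__ == "__main__":
    # Invented feature values; alpha = 0.25 favours the local branch.
    print(fuse_weighted([2.0, 4.0], [0.0, 8.0], alpha=0.25))  # [0.5, 7.0]
```

Because the fusion is a plain sum, both branches must produce features of the same shape, which is why models of this kind typically embed the image to a common dimension first.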