MBLKNet: a large kernel convolution-driven network with multi-task self-supervised learning for SAR maritime target classification

Abstract

Synthetic aperture radar (SAR) maritime target classification serves as a critical component in modern maritime surveillance. While deep learning networks, particularly convolutional neural networks (CNNs), have driven substantial progress in this domain, three key challenges constrain their performance and practical deployment: 1) complex inshore backgrounds and speckle noise are prevalent in SAR maritime images, and targets such as ships span a wide range of scales due to differing imaging resolutions and intrinsic size variability, exacerbating inter-class similarity and intra-class variability; 2) labeled data for SAR maritime target classification are scarce, and sensor imaging modes differ markedly across platforms; and 3) existing CNNs that fuse traditional hand-crafted features often treat hand-crafted feature extraction as a necessary network component and focus primarily on classification performance, overlooking the need to efficiently leverage their feature extraction capabilities in downstream tasks. To overcome these challenges, this article proposes MBLKNet, a novel SAR maritime target classification network based on large kernel convolution and multi-task self-supervised learning. In MBLKNet, four structural improvements are proposed to enhance classification accuracy: 1) a macro design, 2) a multi-branch large kernel convolution module (MBLKCM), 3) a lightweight channel-interactive multi-layer perceptron (LCIMLP), and 4) a micro design. In addition, a multi-resolution unlabeled SAR maritime target dataset (SL-SARShip) and a masked image modeling framework, HOGSparK, are proposed to enable pre-training of MBLKNet under the joint supervision of pixel and HOG features.
Comparison results on OpenSARShip 2.0 and FUSAR-Ship with state-of-the-art networks, as well as experiments on SSDD for SAR downstream target detection and instance segmentation, demonstrate that the proposed MBLKNet achieves superior performance and strong feature extraction ability.
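
The abstract specifies HOGSparK only as masked image modeling under joint pixel and HOG supervision. A minimal sketch of such a joint masked-reconstruction objective follows; the function names, the 9-bin unsigned-orientation cell histogram, the absence of block normalization, and the loss weighting are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def hog_cell_features(img, cell=8, bins=9):
    """Per-cell histograms of gradient orientations (simplified HOG,
    no block normalization) -- an illustrative stand-in for the HOG
    target that supervises masked patches."""
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation
    h, w = img.shape
    ch, cw = h // cell, w // cell
    feats = np.zeros((ch, cw, bins))
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    for i in range(ch):
        for j in range(cw):
            sl = (slice(i * cell, (i + 1) * cell),
                  slice(j * cell, (j + 1) * cell))
            feats[i, j] = np.bincount(bin_idx[sl].ravel(),
                                      weights=mag[sl].ravel(),
                                      minlength=bins)
    return feats

def joint_masked_loss(pred_pix, pred_hog, target_img, cell_mask, w_hog=1.0):
    """L2 reconstruction loss combining a pixel target with a HOG
    target over masked cells; the equal weighting is a hypothetical
    choice."""
    tgt_hog = hog_cell_features(target_img)
    pix_err = ((pred_pix - target_img) ** 2).mean()
    hog_err = ((pred_hog - tgt_hog) ** 2)[cell_mask].mean()
    return pix_err + w_hog * hog_err
```

A perfect reconstruction of both targets drives the loss to zero, while errors in either branch contribute additively.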

Similar Papers
  • Research Article
  • Cited by 13
  • 10.1080/01431161.2020.1766149
A semi-greedy neural network CAE-HL-CNN for SAR target recognition with limited training data
  • Aug 15, 2020
  • International Journal of Remote Sensing
  • Rui Qin + 3 more

Synthetic aperture radar (SAR) automatic target recognition (ATR) based on convolutional neural networks (CNNs) has been a research hotspot in recent years. However, CNNs are data-driven, and severe overfitting occurs when training data are scarce. To solve this problem, we first introduce a non-greedy CNN. But when a CNN structure with a non-greedy classifier is used for SAR ATR with scarce training data, the feature extraction capability of the network degrades. To balance the feature extraction and anti-overfitting capabilities of the network, a semi-greedy network combining transfer learning with convolutional auto-encoders (CAE) and a hinge loss CNN (HL-CNN), namely CAE-HL-CNN, is proposed in this paper. First, CAE-HL-CNN introduces a non-greedy network that uses a hinge loss classifier in the CNN structure to enhance generalization performance: it retains the hierarchical feature extraction structure of a CNN while matching the anti-overfitting capability of a support vector machine. Then, by combining the CAE with the HL-CNN through transfer learning, CAE-HL-CNN extracts a complete feature representation to compensate, in a greedy way, for the degradation in feature extraction capability. Experiments on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset show that with scarce training data the proposed network improves the recognition performance of a CNN, achieving higher classification accuracy, performing more evenly across categories, and extracting sparser feature maps than the compared methods.
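
The hinge-loss classifier that HL-CNN swaps in for softmax can be sketched as a Crammer-Singer multi-class margin loss; the margin of 1 and the batch averaging are conventional choices, not details from the paper.

```python
import numpy as np

def multiclass_hinge_loss(scores, labels, margin=1.0):
    """Crammer-Singer multi-class hinge loss over a batch.
    scores: (N, C) raw class scores; labels: (N,) integer classes.
    Penalizes any wrong class whose score comes within `margin` of
    the true class score -- the SVM-like objective that gives HL-CNN
    its anti-overfitting behaviour."""
    n = scores.shape[0]
    correct = scores[np.arange(n), labels][:, None]       # (N, 1)
    margins = np.maximum(0.0, scores - correct + margin)  # (N, C)
    margins[np.arange(n), labels] = 0.0                   # ignore true class
    return margins.sum(axis=1).mean()
```

Unlike cross-entropy, this loss is exactly zero once every wrong class is beaten by the full margin, so well-classified samples stop contributing gradients.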

  • Research Article
  • Cited by 8
  • 10.3390/rs14235986
Ship Classification in SAR Imagery by Shallow CNN Pre-Trained on Task-Specific Dataset with Feature Refinement
  • Nov 25, 2022
  • Remote Sensing
  • Haitao Lang + 4 more

Ship classification based on high-resolution synthetic aperture radar (SAR) imagery plays an increasingly important role in various maritime affairs, such as marine transportation management, maritime emergency rescue, marine pollution prevention and control, marine security situational awareness, and so on. Deep learning, especially convolutional neural networks (CNNs), has shown excellent performance on ship classification in SAR images. Nevertheless, it still has some limitations in real-world applications that need to be taken seriously by researchers. One is the insufficient number of SAR ship training samples, which limits the learning of a satisfactory CNN; the other is the limited information that SAR images can provide (compared with natural images), which limits the extraction of discriminative features. To alleviate the limitation caused by insufficient training data, a widely adopted strategy is to pre-train CNNs on a generic dataset with massive labeled samples (such as ImageNet) and fine-tune the pre-trained network on the target dataset (i.e., a SAR dataset) with a small number of training samples. However, recent studies have shown that due to the different imaging mechanisms of SAR and natural images, it is hard to guarantee that pre-trained CNNs (even those that perform extremely well on ImageNet) can be finely tuned by a SAR dataset. On the other hand, to extract the most discriminative ship representation features from SAR images, existing methods have carried out fruitful research on network architecture design, attention mechanism embedding, feature fusion, etc. Although these efforts improve SAR ship classification performance to some extent, they usually rely on more complex network architectures and higher-dimensional features, accompanied by greater time and storage costs.
Through analysis of SAR image characteristics and the CNN feature extraction mechanism, this study puts forward three hypotheses: (1) pre-training a CNN on a task-specific dataset may be more effective than on a generic dataset; (2) a shallow CNN may be more suitable for SAR image feature extraction than a deep one; and (3) the deep features extracted by CNNs can be further refined to improve their discrimination ability. To validate these hypotheses, we propose learning a shallow CNN pre-trained on a task-specific dataset, i.e., an optical remote sensing ship dataset (ORS), instead of the widely adopted ImageNet dataset. For comparison purposes, we designed 28 CNN architectures by changing the arrangement of CNN components, the size of convolutional filters, and pooling formulations based on VGGNet models. To further reduce redundancy and improve the discrimination ability of the deep features, we propose refining them by active convolutional filter selection based on a coefficient of variation (COV) sorting criterion. Extensive experiments not only prove that the above hypotheses are valid but also prove that the shallow network learned by the proposed pre-training strategy and feature refining method can achieve ship classification performance in SAR images comparable to state-of-the-art (SOTA) methods.
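
The COV-based filter refinement described above can be sketched as follows: score each filter by the coefficient of variation of its activations over a sample set and keep the top-scoring ones. How the per-filter statistic is aggregated (here, the mean activation per sample) is an assumption; the paper is only summarized as sorting by COV.

```python
import numpy as np

def select_filters_by_cov(feature_maps, keep):
    """feature_maps: (N, C, H, W) activations of one conv layer over
    N samples. Score each of the C filters by the coefficient of
    variation (std / mean) of its per-sample mean activation and
    keep the `keep` highest-scoring (most discriminative) filters."""
    per_sample = feature_maps.mean(axis=(2, 3))   # (N, C)
    mean = per_sample.mean(axis=0)
    std = per_sample.std(axis=0)
    cov = std / (mean + 1e-12)                    # guard divide-by-zero
    order = np.argsort(cov)[::-1]                 # descending COV
    return np.sort(order[:keep])                  # indices of kept filters
```

Filters whose response barely varies across samples (COV near zero) carry little discriminative information and are the first to be pruned.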

  • Research Article
  • Cited by 593
  • 10.1109/access.2020.3005861
HRSID: A High-Resolution SAR Images Dataset for Ship Detection and Instance Segmentation
  • Jan 1, 2020
  • IEEE Access
  • Shunjun Wei + 5 more

With the development of satellite technology, up-to-date imaging modes of synthetic aperture radar (SAR) satellites can provide higher-resolution SAR imagery, which benefits ship detection and instance segmentation. Meanwhile, object detectors based on convolutional neural networks (CNNs) show high performance on SAR ship detection even without land-ocean segmentation; however, existing SAR ship datasets have shortcomings, such as the relatively small size of SAR images for ship detection, limited SAR training samples, and inappropriate annotations, which hamper related research. To promote the development of CNN-based ship detection and instance segmentation, we have constructed a High-Resolution SAR Images Dataset (HRSID). In addition to object detection, instance segmentation can also be implemented on HRSID. For dataset construction, 136 panoramic SAR images with range resolutions from 1 m to 5 m are cropped into 800 × 800 pixel SAR images with an overlap ratio of 25%. To reduce wrong and missing annotations, optical remote sensing images are used to reduce interference from harbor constructions. HRSID contains 5604 cropped SAR images and 16951 ships, and we have divided it into a training set (65% of SAR images) and a test set (35%) in the Microsoft Common Objects in Context (MS COCO) format. Eight state-of-the-art detectors are evaluated on HRSID to build the baseline, with MS COCO evaluation metrics applied for comprehensive evaluation. Experimental results reveal that ship detection and instance segmentation can be implemented well on HRSID.
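
The 25%-overlap cropping used to build HRSID can be sketched as a sliding-window tiler. Clamping the last window to the image border, rather than padding, is an assumption about how the dataset authors handled edges.

```python
import numpy as np

def tile_image(img, tile=800, overlap=0.25):
    """Crop a panoramic image into tile x tile patches whose adjacent
    windows overlap by `overlap` (25% in HRSID). Returns each patch
    with its (row, col) top-left offset so detections can later be
    mapped back to panorama coordinates. Assumes img is at least
    tile x tile pixels."""
    stride = int(tile * (1 - overlap))
    h, w = img.shape[:2]
    # Window starts at a fixed stride, with the last start clamped to
    # the border so the image is fully covered without padding.
    rows = sorted({min(r, h - tile) for r in range(0, h - tile + stride, stride)})
    cols = sorted({min(c, w - tile) for c in range(0, w - tile + stride, stride)})
    return [(r, c, img[r:r + tile, c:c + tile]) for r in rows for c in cols]
```

For a 1000 × 1400 panorama this yields four 800 × 800 patches (row starts 0 and 200, column starts 0 and 600), with adjacent windows sharing the intended 200-pixel overlap.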

  • Conference Article
  • Cited by 4
  • 10.1109/ictc49638.2020.9123301
Analysis of Detection Preference to CNN Based SAR Ship Detectors
  • May 1, 2020
  • Long Han + 3 more

In recent years, convolutional neural networks (CNNs) have been studied extensively for synthetic aperture radar (SAR) ship detection for their powerful feature extraction and classification capability, and have significantly improved accuracy and robustness under large scenes, multiple resolutions, and complex backgrounds. At present, much related research focuses on designing more efficient network structures; there have been no studies on how detection performance varies for images with different complexity, backgrounds, surroundings, and quality. Taking two open-access SAR datasets, SSDD and UCAS_SARShip, as source datasets, this paper first divides all the images into four classes according to their surroundings and backgrounds, then divides each class into training, validation, and test subsets based on their features (image quality, noise level, complexity, etc.). Second, a large number of SAR ship detection experiments are conducted with five representative CNN-based detectors on the four datasets we made. The experimental results show a pronounced detection preference for each of the five detectors when trained on one of the four datasets. This research helps peers understand and analyze detection preference, and provides a valuable reference for the collection and division of SAR ship detection datasets in subsequent research.

  • Research Article
  • Cited by 11
  • 10.3390/rs11080906
D-ATR for SAR Images Based on Deep Neural Networks
  • Apr 13, 2019
  • Remote Sensing
  • Zongyong Cui + 3 more

Automatic target recognition (ATR) can obtain important information for target surveillance from Synthetic Aperture Radar (SAR) images. Thus, a direct automatic target recognition (D-ATR) method, based on a deep neural network (DNN), is proposed in this paper. To recognize targets in large-scene SAR images, traditional SAR ATR methods comprise four major steps: detection, discrimination, feature extraction, and classification. However, the recognition performance is sensitive to each step, as the processing result of each step affects the following one. Meanwhile, these processes are independent, which leaves room for processing speed improvement. The proposed D-ATR method can integrate these steps as a whole system and directly recognize targets in large-scene SAR images, by encapsulating all of the computation in a single deep convolutional neural network (DCNN). Before the DCNN, a fast sliding method is proposed to partition the large image into sub-images, to avoid information loss when resizing the input images and to avoid a target being divided into several parts. After the DCNN, non-maximum suppression between sub-images (NMSS) is performed on the sub-image results to obtain an accurate result for the large-scene SAR image. Experiments on the MSTAR dataset and large-scene SAR images (of size 1478 × 1784 pixels) show that the proposed method can obtain high accuracy and fast processing speed, and outperforms other methods, such as CFAR+SVM, Region-based CNN, and YOLOv2.
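
The non-maximum suppression across sub-images (NMSS) step can be sketched as: shift each sub-image's detections by that sub-image's offset into panorama coordinates, then run standard greedy IoU-based NMS over the merged list. The 0.5 IoU threshold and the detection record layout are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms_across_subimages(dets, thresh=0.5):
    """dets: list of (score, (ox, oy), box-in-subimage-coords).
    Shift boxes into panorama coordinates, then greedily keep the
    highest-scoring box and drop overlapping duplicates that arise
    where adjacent sub-images overlap."""
    shifted = sorted(
        ((s, (b[0] + ox, b[1] + oy, b[2] + ox, b[3] + oy))
         for s, (ox, oy), b in dets),
        reverse=True)
    kept = []
    for s, b in shifted:
        if all(iou(b, k) < thresh for _, k in kept):
            kept.append((s, b))
    return kept
```

A target straddling two overlapping sub-images produces two nearly identical boxes after shifting; NMSS keeps only the higher-scoring one.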

  • Research Article
  • Cited by 3
  • 10.1002/mp.17546
Cross-shaped windows transformer with self-supervised pretraining for clinically significant prostate cancer detection in bi-parametric MRI.
  • Nov 26, 2024
  • Medical physics
  • Yuheng Li + 10 more

Bi-parametric magnetic resonance imaging (bpMRI) has demonstrated promising results in prostate cancer (PCa) detection. Vision transformers have achieved performance competitive with convolutional neural networks (CNNs) in deep learning, but they need abundant annotated data for training. Self-supervised learning can effectively leverage unlabeled data to extract useful semantic representations without annotation and its associated costs. This study proposes a novel self-supervised learning framework and a transformer model to enhance PCa detection using prostate bpMRI. We introduce a novel end-to-end Cross-Shaped windows (CSwin) transformer UNet model, CSwin UNet, to detect clinically significant prostate cancer (csPCa) in prostate bpMRI. We also propose a multitask self-supervised learning framework to leverage unlabeled data and improve network generalizability. Using a large prostate bpMRI dataset (PI-CAI) with 1476 patients, we first pretrain the CSwin transformer using multitask self-supervised learning to improve data efficiency and network generalizability. We then finetune using lesion annotations to perform csPCa detection. We also test network generalization using a separate bpMRI dataset with 158 patients (Prostate158). Five-fold cross validation shows that self-supervised CSwin UNet achieves 0.888±0.010 area under the receiver operating characteristic curve (AUC) and 0.545±0.060 Average Precision (AP) on the PI-CAI dataset, significantly outperforming five comparable models (nnFormer, Swin UNETR, DynUNet, Attention UNet, UNet). On model generalizability, self-supervised CSwin UNet achieves 0.79 AUC and 0.45 AP, still outperforming all other comparable methods and demonstrating good generalization to external data. This study proposes CSwin UNet, a new transformer-based model for end-to-end detection of csPCa, with self-supervised pretraining to enhance network generalizability. We employ an automatic weighted loss (AWL) to unify pretext tasks, improving representation learning. Evaluated on two multi-institutional public datasets, our method surpasses existing methods in detection metrics and demonstrates good generalization to external data.
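
An automatic weighted loss for unifying pretext tasks is commonly realized as the homoscedastic-uncertainty weighting of Kendall et al.; whether this paper uses exactly that formulation is an assumption, so the sketch below is one plausible realization rather than the paper's AWL.

```python
import math

def automatic_weighted_loss(task_losses, log_vars):
    """Combine per-task losses L_i with learnable log-variances s_i:
    total = sum_i( exp(-s_i) * L_i + s_i ).
    Tasks the optimizer finds noisy acquire a larger s_i and hence a
    smaller weight, balancing the pretext tasks without hand-tuned
    coefficients; the + s_i term stops every s_i from growing
    without bound."""
    return sum(math.exp(-s) * l + s for l, s in zip(task_losses, log_vars))
```

In training, the `log_vars` would be trainable parameters updated by the same optimizer as the network weights.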

  • Research Article
  • Cited by 144
  • 10.1109/joe.2017.2767106
Ship Classification in TerraSAR-X Images With Convolutional Neural Networks
  • Jan 1, 2018
  • IEEE Journal of Oceanic Engineering
  • Carlos Bentes + 2 more

Synthetic aperture radar (SAR) is an important instrument for oceanographic observations, providing detailed information on the ocean surface and artificial floating structures. Due to advances in SAR technology and the deployment of new SAR satellites, an increasing amount of data is available, and the development of efficient classification systems based on deep learning is possible. Deep neural networks have improved the state of the art in classification tasks on optical images, but their use in SAR classification problems has been less exploited. In this paper, a full workflow for SAR maritime target detection and classification on TerraSAR-X high-resolution images is presented, and convolutional neural networks (CNNs) recently proposed in the literature are cross-evaluated on a common dataset composed of five maritime classes, namely cargo, tanker, windmill, platform, and harbor structure. Based on experiments and tests, a multiple-input-resolution CNN model is proposed and its performance is evaluated. Our results indicate that CNNs are efficient models for maritime target classification in SAR images, and the combination of different input resolutions in the CNN model improves its ability to derive features, increasing the overall classification score.

  • Conference Article
  • Cited by 1
  • 10.1109/radar53847.2021.10028013
Precise Instance Segmentation Network for High-Resolution SAR Images
  • Dec 15, 2021
  • Xiangfeng Zeng + 4 more

With the development of synthetic aperture radar (SAR) system technology and the wide application of deep learning, ship detection in SAR images has developed rapidly. Benefiting from strong generalization ability and end-to-end training, convolutional neural network (CNN) based ship detection methods have a proprietary advantage in high-performance SAR ship detection. However, relevant SAR ship detection methods adopt rectangular bounding boxes to locate ships and are unable to extract the contour features of the ships. To solve this problem, we propose a precise instance segmentation network for high-resolution SAR images. The method combines a bottom-up path augmentation module, a global context module, and soft non-maximum suppression to improve Mask R-CNN for pixel-wise segmentation of ships in high-resolution SAR images. The network is trained and tested on the high-resolution SAR images dataset (HRSID), and ablation experiments are conducted with Microsoft Common Objects in Context (MS COCO) evaluation metrics to verify the effect of each module. Quantitatively, the experimental results show that the method exceeds vanilla Mask R-CNN by 2% AP in instance segmentation of high-resolution SAR images. Meanwhile, the visualized instance segmentation results indicate that our method fits practical applications and can extract ship contours, which is more conducive to the instance segmentation of SAR images.

  • Research Article
  • Cited by 40
  • 10.1109/tgrs.2021.3066432
MAP-Net: SAR and Optical Image Matching via Image-Based Convolutional Network With Attention Mechanism and Spatial Pyramid Aggregated Pooling
  • Jan 1, 2022
  • IEEE Transactions on Geoscience and Remote Sensing
  • Song Cui + 4 more

The complementarity of synthetic aperture radar (SAR) and optical images allows remote sensing observations to “see” unprecedented discoveries. Image matching plays a fundamental role in the fusion and application of SAR and optical images. However, both the geometric imaging pattern and the physical radiation mechanism of these two sensors are significantly different, so that the images show complex geometric distortion and nonlinear radiation differences. This phenomenon brings great challenges to image matching, which neither the handcrafted descriptors nor the deep learning-based methods have adequately addressed. In this article, a novel image-based matching method for SAR to optical images via an image-based convolutional network with spatial pyramid aggregated pooling (SPAP) and an attention mechanism is proposed, namely MAP-Net. The original image is embedded through the convolutional neural network to generate the feature map. Through the information extraction and abstraction of the original imagery, the embedded features containing the high-level semantic information are more robust to the geometric distortion and radiation variation among the different modal images, which is beneficial to the matching of cross-modal images. The adoption of the SPAP module makes the network more capable of integrating global and local contextual information. The attention block weights the dense features generated from the network to extract the key features that are invariant, distinguishable, repeatable, and suitable for the image matching task. In the experiments, five sets of multisource and multiresolution SAR and optical images with wide and varied ground coverage were used to evaluate the accuracy of MAP-Net, compared to both handcrafted and deep learning-based methods. The experimental results show that the MAP-Net method is superior to the current state-of-the-art image matching methods for SAR to optical images.

  • Conference Article
  • Cited by 6
  • 10.1109/apsar46974.2019.9048264
A Maritime Target Detector Based on CNN and Embedded Device for GF-3 Images
  • Nov 1, 2019
  • Chen Zhao + 3 more

Recently, with the development of deep learning and the springing up of synthetic aperture radar (SAR) images, SAR maritime target detection based on convolutional neural network (CNN) has become a hot issue. However, most related work is realized on general purpose hardware like CPU or GPU, which is energy consuming, non-real-time and unable to be deployed on embedded devices. Aiming at this problem, this paper proposes a method to deploy a model of SAR maritime target detection network on an embedded device which employs custom artificial intelligence streaming architecture (CAISA). Moreover, the model is trained and tested on the Gaofen-3 (GF-3) spaceborne SAR images, which include six different kinds of maritime targets. Experiments based on the GF-3 dataset show the method is practicable and extensible.

  • Research Article
  • Cited by 4
  • 10.3390/rs11030282
Statistics Learning Network Based on the Quadratic Form for SAR Image Classification
  • Feb 1, 2019
  • Remote Sensing
  • Chu He + 4 more

The convolutional neural network (CNN) has shown great potential in many fields; however, transferring this potential to synthetic aperture radar (SAR) image interpretation is still a challenging task. The coherent imaging mechanism causes the SAR signal to present strong fluctuations, and this randomness property calls for many degrees of freedom (DoFs) for the SAR image description. In this paper, a statistics learning network (SLN) based on the quadratic form is presented. The statistical features are expected to be fitted in the SLN for SAR image representation. (i) Relying on the quadratic form in linear algebra theory, a quadratic primitive is developed to comprehensively learn the elementary statistical features. This primitive is an extension to the convolutional primitive that involves both nonlinear and linear transformations and provides more flexibility in feature extraction. (ii) With the aid of this quadratic primitive, the SLN is proposed for the classification task. In the SLN, different types of statistics of SAR images are automatically extracted for representation. Experimental results on three datasets show that the SLN outperforms a standard CNN and traditional texture-based methods and has potential for SAR image classification.

  • Research Article
  • Cited by 4
  • 10.3390/rs16060940
SAR-CDSS: A Semi-Supervised Cross-Domain Object Detection from Optical to SAR Domain
  • Mar 7, 2024
  • Remote Sensing
  • Cheng Luo + 6 more

The unique imaging modality of synthetic aperture radar (SAR) has posed significant challenges for object detection, making SAR images more complex to acquire and interpret than optical images. Recently, numerous studies have proposed cross-domain adaptive methods based on convolutional neural networks (CNNs) to promote SAR object detection using optical data. However, existing cross-domain methods focus on image features, neglect improvements to the input data, and ignore the valuable supervision provided by the few labeled SAR images. Therefore, we propose a semi-supervised cross-domain object detection framework that uses optical data and a small amount of SAR data to achieve knowledge transfer for SAR object detection. Our method focuses on the data processing aspects to gradually reduce the domain shift at the image, instance, and feature levels. First, we propose a data augmentation method of image mixing and instance swapping to generate a mixed domain that is more similar to the SAR domain. This method fully utilizes the limited SAR annotation information to reduce domain shift at the image and instance levels. Second, at the feature level, we propose an adaptive optimization strategy to filter out mixed-domain samples that significantly deviate from the SAR feature distribution when training the feature extractor. In addition, we employ a Vision Transformer (ViT) as the feature extractor to handle global feature extraction from mixed images. We propose a detection head based on normalized Wasserstein distance (NWD) to enhance objects with smaller effective regions in SAR images. The effectiveness of our proposed method is evaluated on public SAR ship and oil tank datasets.
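
The normalized Wasserstein distance used in the detection head models each axis-aligned box as a 2-D Gaussian, which keeps the similarity measure informative for tiny targets that barely overlap. A minimal sketch follows; the constant C is dataset-dependent and its value here is an assumption.

```python
import math

def nwd(box_a, box_b, C=12.8):
    """Normalized Wasserstein distance between boxes (cx, cy, w, h),
    each modeled as a 2-D Gaussian N(center, diag(w^2/4, h^2/4)).
    The squared 2-Wasserstein distance between such Gaussians
    reduces to a plain Euclidean distance on (cx, cy, w/2, h/2);
    exponentiating maps it to a (0, 1] similarity. C is a
    dataset-dependent normalizing constant (assumed value)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    w2_sq = ((ax - bx) ** 2 + (ay - by) ** 2
             + ((aw - bw) / 2) ** 2 + ((ah - bh) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / C)
```

Identical boxes give exactly 1.0, and unlike IoU the score decays smoothly with center offset even after two small boxes stop overlapping.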

  • Conference Article
  • Cited by 1
  • 10.23919/elmar.2017.8124483
Deep convolutional neural networks for SAR patch categorization
  • Sep 1, 2017
  • Dusan Gleich + 3 more

The categorization of Synthetic Aperture Radar (SAR) patches consists of feature extraction and classification. Recently, very good results were obtained using convolutional neural networks for the categorization of image patches. This paper presents deep convolutional networks for SAR patch categorization. Several structures of deep convolutional networks are introduced; we tested networks with 10 and 20 layers and analyzed the recognition rate as a function of SAR patch size. We designed a custom database of SAR patches cut from several spotlight TerraSAR-X products, consisting of six categories with approximately 1000 samples per category. Experimental results showed that the convolutional neural networks can achieve 84% accuracy using patches of 200 × 200 pixels, performing slightly better than a categorization algorithm based on the dual-tree oriented wavelet transform, spectral features, and a support vector machine, which achieved 80% accuracy on the same training and testing sets.

  • Research Article
  • Cited by 1
  • 10.1109/jstars.2022.3218360
A Trimodel SAR Semisupervised Recognition Method Based on Attention-Augmented Convolutional Networks
  • Jan 1, 2022
  • IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
  • Sifan Yan + 5 more

Semi-supervised learning (SSL) for synthetic aperture radar (SAR) is one of the research hotspots in radar-image automatic target recognition (ATR). It can efficiently handle challenging settings where the SAR dataset has insufficient labeled samples and abundant unlabeled samples. In recent years, consistency regularization methods in semi-supervised learning have shown considerable improvement in recognition accuracy and efficiency. Current consistency regularization approaches suffer from two main shortcomings: first, extracting all of the relevant information in the image target is difficult owing to the inability of conventional convolutional neural networks (CNNs) to capture global relational information; second, the standard teacher-student regularization methodology causes confirmation bias due to the high coupling between teacher and student models. This work adopts an innovative tri-model semi-supervised method based on attention-augmented convolutional networks to address these obstacles. Specifically, we develop an attention mechanism incorporating a novel positional embedding method based on recurrent neural networks (RNNs), and integrate it with a standard convolutional network as a feature extractor, to improve the network's ability to extract global feature information from images. Further, we address the confirmation bias problem by introducing a classmate model into the standard teacher-student structure and using it to impose a weak consistency constraint (WCC) on the student, weakening the strong coupling between teacher and student. Comparative experiments on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset show that our method outperforms state-of-the-art semi-supervised methods in recognition accuracy, demonstrating its potential as a new benchmark for the deep learning and SAR research community.

  • Research Article
  • Cited by 36
  • 10.1016/j.epsr.2022.108003
A Novel Capsule Convolutional Neural Network with Attention Mechanism for High-Voltage Circuit Breaker Fault Diagnosis
  • Apr 21, 2022
  • Electric Power Systems Research
  • Xinyu Ye + 4 more

