MSAFNet: Multi-Modal Marine Aquaculture Segmentation via Spatial–Frequency Adaptive Fusion
Accurate mapping of marine aquaculture areas is critical for environmental management, marine ecosystem protection, and sustainable resource utilization. However, remote sensing imagery from a single sensor modality has inherent limitations for extracting aquaculture zones in complex marine environments. To address this challenge, we constructed a multi-modal dataset covering five Chinese coastal regions using cloud detection methods and developed the Multi-modal Spatial–Frequency Adaptive Fusion Network (MSAFNet) for optical–radar data fusion. MSAFNet employs a dual-path architecture built on a Multi-scale Dual-path Feature Module (MDFM), which combines CNN and Transformer capabilities to extract multi-scale features. In addition, a Dynamic Frequency Domain Adaptive Fusion Module (DFAFM) achieves deep integration of multi-modal features in both the spatial and frequency domains, effectively leveraging the complementary advantages of the two sensor types. Results demonstrate that MSAFNet achieves 76.93% mean intersection over union (mIoU), 86.96% mean F1 score (mF1), and 93.26% mean Kappa coefficient (mKappa) when extracting floating raft aquaculture (FRA) and cage aquaculture (CA), significantly outperforming existing methods. Applied to China's coastal waters, the model generated nearshore aquaculture distribution maps for 2020, demonstrating its generalization capability and practical value in complex marine environments. This approach provides reliable technical support for marine resource management and ecological monitoring.
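To make the fusion idea concrete, below is a minimal PyTorch sketch of a spatial–frequency adaptive fusion step: a learned gate blends optical and SAR features in the spatial domain, while a learnable filter reweights their joint spectrum in the frequency domain. Module and parameter names are illustrative assumptions, not the published DFAFM implementation.

```python
# Minimal sketch of a spatial-frequency adaptive fusion step
# (illustrative only; names are assumptions, not MSAFNet's code).
import torch
import torch.nn as nn
import torch.fft

class SpatialFrequencyFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Spatial gate: weighs optical vs. SAR features per pixel.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Learnable per-channel filter on the Fourier spectrum, letting
        # the model emphasize informative frequency bands.
        self.freq_weight = nn.Parameter(torch.ones(channels, 1, 1))

    def forward(self, opt_feat: torch.Tensor, sar_feat: torch.Tensor) -> torch.Tensor:
        # Spatial-domain fusion: convex blend driven by the gate.
        gate = self.spatial_gate(torch.cat([opt_feat, sar_feat], dim=1))
        spatial = gate * opt_feat + (1.0 - gate) * sar_feat
        # Frequency-domain fusion: filter the summed spectrum.
        spec = torch.fft.rfft2(opt_feat + sar_feat, norm="ortho")
        spec = spec * self.freq_weight  # broadcasts over H x (W//2+1)
        freq = torch.fft.irfft2(spec, s=opt_feat.shape[-2:], norm="ortho")
        return spatial + freq

# Usage: fuse two single-scale feature maps of matching shape.
fuse = SpatialFrequencyFusion(channels=64)
optical, sar = torch.randn(2, 64, 128, 128), torch.randn(2, 64, 128, 128)
print(fuse(optical, sar).shape)  # torch.Size([2, 64, 128, 128])
```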
- Research Article
- Cited by 1
- 10.3390/rs16152825
- Aug 1, 2024
- Remote Sensing
The accurate extraction and monitoring of offshore aquaculture areas are crucial for the marine economy, environmental management, and sustainable development. Existing methods relying on unimodal remote sensing images are limited by natural conditions and sensor characteristics. To address this issue, we integrated multispectral imaging (MSI) and synthetic aperture radar (SAR) imaging to overcome the limitations of single-modal images. We propose a cross-modal multidimensional frequency perception network (CMFPNet) to enhance classification and extraction accuracy. CMFPNet includes a local–global perception block (LGPB) for combining local and global semantic information and a multidimensional adaptive frequency filtering attention block (MAFFAB) that dynamically retains frequency-domain information beneficial for aquaculture area recognition. We constructed six typical offshore aquaculture datasets and compared CMFPNet with other models. The quantitative results showed that CMFPNet outperformed existing methods in classifying and extracting floating raft aquaculture (FRA) and cage aquaculture (CA), achieving mean intersection over union (mIoU), mean F1 score (mF1), and mean Kappa coefficient (mKappa) values of 87.66%, 93.41%, and 92.59%, respectively. Moreover, CMFPNet has low model complexity and strikes a good balance between performance and the number of required parameters. Qualitative results indicate significant reductions in missed detections, false detections, and adhesion phenomena. Overall, CMFPNet demonstrates great potential for accurately extracting large-scale offshore aquaculture areas, providing effective data support for marine planning and environmental protection. Our code is available via the Data Availability Statement section.
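The frequency filtering idea behind a block like MAFFAB can be illustrated with a small sketch: a soft, learnable mask over the Fourier spectrum of a feature map decides which frequency bins pass through. This is a generic sketch under that assumption, not CMFPNet's actual block.

```python
# Illustrative adaptive frequency-domain filtering of feature maps
# (names and structure are assumptions, not the published CMFPNet code).
import torch
import torch.nn as nn
import torch.fft

class FrequencyFilterAttention(nn.Module):
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # One learnable logit per channel and frequency bin; sigmoid
        # turns it into a soft pass/stop mask over the spectrum.
        self.mask_logits = nn.Parameter(torch.zeros(channels, height, width // 2 + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft2(x, norm="ortho")           # (B, C, H, W//2+1)
        mask = torch.sigmoid(self.mask_logits)            # soft frequency mask
        filtered = torch.fft.irfft2(spec * mask, s=x.shape[-2:], norm="ortho")
        # Residual connection keeps the original signal path intact.
        return x + filtered

x = torch.randn(2, 32, 64, 64)
print(FrequencyFilterAttention(32, 64, 64)(x).shape)  # torch.Size([2, 32, 64, 64])
```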
- Research Article
- Cited by 2
- 10.3390/rs16203823
- Oct 14, 2024
- Remote Sensing
Marine mammal acoustic signal recognition is a key technology for species conservation and ecological environment monitoring. Because traditional recognition methods based on a single feature input adapt poorly to the complex and changing marine environment and achieve low recognition accuracy, this paper proposes a dual-feature fusion learning method. First, dual-domain feature extraction is performed on marine mammal acoustic signals, overcoming the limitations of single-feature inputs by exchanging feature information between the time-frequency domain and the Delay-Doppler domain. Second, this paper constructs a dual-feature fusion learning target recognition model, which improves the generalization ability and robustness of mammal acoustic signal recognition in complex marine environments. Finally, the feasibility and effectiveness of the model are verified using acoustic datasets of three marine mammals: the Fraser's Dolphin, the Spinner Dolphin, and the Long-Finned Pilot Whale. Compared to models that used the time-frequency domain features or the Delay-Doppler domain features alone, the dual-feature fusion model improved training-set accuracy by 3% to 6% and 20% to 23%, respectively, and test-set accuracy by 1% to 3% and 25% to 38%, respectively.
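A dual-feature fusion recognizer of this kind can be sketched as two small CNN branches, one per feature domain, whose embeddings are concatenated before classification. Branch sizes and the three-class head are illustrative assumptions, not the paper's architecture.

```python
# Sketch: two-branch fusion of time-frequency and Delay-Doppler inputs.
import torch
import torch.nn as nn

def branch(in_ch: int) -> nn.Sequential:
    # Small CNN encoder, identical in structure for both feature domains.
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class DualFeatureFusionNet(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.tf_branch = branch(1)   # time-frequency spectrogram
        self.dd_branch = branch(1)   # Delay-Doppler representation
        self.head = nn.Linear(32 + 32, num_classes)

    def forward(self, tf_img, dd_img):
        fused = torch.cat([self.tf_branch(tf_img), self.dd_branch(dd_img)], dim=1)
        return self.head(fused)

logits = DualFeatureFusionNet()(torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64))
print(logits.shape)  # torch.Size([4, 3])
```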
- Research Article
- Cited by 13
- 10.1360/03yd9007
- Jan 1, 2003
- Science in China Series D
The Chinese coastal regions are high-risk areas for natural disasters because of their low-lying land and weak, sensitive eco-environment. Relative sea-level rise (RSLR), resulting from the combination of global sea-level rise and regional land subsidence, is expected to accelerate in the 21st century. RSLR will exacerbate land submergence, storm-tide and flood-waterlogging disasters, and water shortages, and will in turn affect urban resilience, construction safety, and ecological resources. According to sustainable development theory, the sustainable utilization of resources and the environment and the sustainable development of the economy and society in China's coastal regions in the 21st century can be achieved only by controlling greenhouse gas emissions, optimally exploiting and artificially recharging groundwater, systematically controlling land subsidence, raising design standards for tide and flood control engineering, improving urban disaster resistance, studying strategies and policies for RSLR, and establishing forecasting and early-warning institutions.
- Research Article
- 10.1142/s0129156425406369
- Jun 16, 2025
- International Journal of High Speed Electronics and Systems
The integration of artificial intelligence (AI) is revolutionizing intelligent ocean exploration, offering transformative solutions for next-generation autonomous underwater vehicles (AUVs). This paper systematically reviews current research on AUVs, identifying three critical challenges: limited underwater endurance due to energy constraints (limited battery capacity and inefficient recharging methods); insufficient adaptability of autonomous control systems in complex marine environments (dynamic obstacle avoidance, unpredictable current disturbances, and poor algorithm generalization); and degraded visual perception caused by underwater conditions (turbidity, light reflection, variable illumination, mirrored targets, suspended particles, etc.). We analyze these technical bottlenecks through an AI-driven lens and propose targeted improvement strategies. Furthermore, emerging development trends are projected, emphasizing the synergistic advancement of four key areas: wireless power transfer for AUVs; AI-enabled autonomous underwater vehicle–manipulator systems; advanced sensing, navigation, and autonomy for AUVs; and collaborative operations (underwater robot swarms). The analysis highlights how these innovations will enable adaptive, energy-efficient, and environmentally resilient AUV systems, ultimately accelerating sustainable ocean exploration and resource utilization.
- Research Article
- 10.1016/j.jclinepi.2025.111944
- Aug 25, 2025
- Journal of clinical epidemiology
Use of artificial intelligence to support the assessment of the methodological quality of systematic reviews.
- Research Article
- Cited by 1
- 10.1080/01431161.2024.2406035
- Oct 3, 2024
- International Journal of Remote Sensing
In optical remote sensing images, clouds exhibit irregular scales and boundaries that vary with elevation across diverse geographical locations. To accurately capture the diverse visual patterns of clouds, we propose a cloud image segmentation approach named GS-CDNet (Geographic Spatial Data-Cloud Detection Network), which integrates geospatial data with multifaceted self-attention feature extraction, multi-scale feature aggregation, and boundary clarification techniques. First, we use the geographical coordinates of optical remote sensing images to extract a raster DEM (Digital Elevation Model) from SRTM3, creating a dataset of elevation images and longitude and latitude maps as geospatial data, which enhances the model's spatial positioning capability for cloud detection. Second, the cloud detection network consists of three interconnected modules: the Interleaved Self-Attention Module (ISAM) applies a variety of self-attention mechanisms in an interleaved manner to extract multi-scale feature information; the Bidirectional Multi-Scale Feature Fusion Module (BIMFM) integrates features for a more comprehensive contextual understanding; and the Boundary Extraction Module (BEM) uses a residual structure to generate a boundary cloud mask, effectively addressing the common issue of boundary blurring in multi-scale cloud masks. Finally, we compared GS-CDNet with other cloud detection methods and conducted an ablation study on its key components. The generalization experiments demonstrate the exceptional performance of the proposed model in cloud mask generation, with both the geospatial data and the individual modules contributing significantly to it.
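The geospatial input described above amounts to stacking DEM, longitude, and latitude rasters as extra channels alongside the optical bands; a small sketch of that preprocessing step follows (array names and min-max normalization are illustrative assumptions).

```python
# Sketch: assemble optical bands plus DEM/lon/lat as extra input channels.
import numpy as np

def build_input_stack(optical: np.ndarray, dem: np.ndarray,
                      lon: np.ndarray, lat: np.ndarray) -> np.ndarray:
    """optical: (H, W, B); dem/lon/lat: (H, W). Returns (H, W, B+3)."""
    def norm(a):
        # Min-max normalization so all channels share a comparable range.
        return (a - a.min()) / (a.max() - a.min() + 1e-8)
    extras = np.stack([norm(dem), norm(lon), norm(lat)], axis=-1)
    return np.concatenate([norm(optical), extras], axis=-1).astype(np.float32)

h, w = 256, 256
stack = build_input_stack(np.random.rand(h, w, 4), np.random.rand(h, w),
                          np.random.rand(h, w), np.random.rand(h, w))
print(stack.shape)  # (256, 256, 7)
```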
- Research Article
- 10.3389/fmars.2025.1522160
- Mar 11, 2025
- Frontiers in Marine Science
Underwater image segmentation is essential for tasks such as underwater exploration, marine environmental monitoring, and resource development. Nevertheless, given the complexity and variability of the underwater environment, improving model accuracy remains a key challenge in underwater image segmentation tasks. To address these issues, this study presents a high-performance semantic segmentation approach for underwater images based on the standard SegFormer model. First, the Mix Transformer backbone in SegFormer is replaced with a Swin Transformer to enhance feature extraction and facilitate efficient acquisition of global context information. Next, the Efficient Multi-scale Attention (EMA) mechanism is introduced in the backbone's downsampling stages and the decoder to better capture multi-scale features, further improving segmentation accuracy. Furthermore, a Feature Pyramid Network (FPN) structure is incorporated into the decoder to combine feature maps at multiple resolutions, allowing the model to integrate contextual information effectively and enhancing robustness in complex underwater environments. Testing on the SUIM underwater image dataset shows that the proposed model achieves high performance across multiple metrics: mean Intersection over Union (mIoU) of 77.00%, mean Recall (mRecall) of 85.04%, mean Precision (mPrecision) of 89.03%, and mean F1 score (mF1) of 86.63%. Compared to the standard SegFormer, it improves mIoU by 3.73%, mRecall by 1.98%, mPrecision by 3.38%, and mF1 by 2.44%, at the cost of 9.89M additional parameters. The results demonstrate that the proposed method achieves superior segmentation accuracy with minimal additional computation, showcasing high performance in underwater image segmentation.
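The FPN-style decoder fusion mentioned above can be sketched generically: project each encoder stage to a common width with 1x1 convolutions, then fuse top-down by upsample-and-add. This is a textbook FPN sketch, not the authors' exact decoder.

```python
# Generic FPN-style fusion of multi-resolution encoder features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNFuse(nn.Module):
    def __init__(self, in_channels, out_ch: int = 128):
        super().__init__()
        # 1x1 convs project every stage to a common channel width.
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_ch, 1) for c in in_channels])

    def forward(self, feats):
        # feats ordered fine -> coarse; fuse top-down by upsample-and-add.
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        out = laterals[-1]
        for lat in reversed(laterals[:-1]):
            out = lat + F.interpolate(out, size=lat.shape[-2:],
                                      mode="bilinear", align_corners=False)
        return out  # highest-resolution fused map

feats = [torch.randn(1, c, s, s) for c, s in [(96, 64), (192, 32), (384, 16)]]
print(FPNFuse([96, 192, 384])(feats).shape)  # torch.Size([1, 128, 64, 64])
```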
- Research Article
- Cited by 8
- 10.1007/s10489-020-02114-3
- Jan 12, 2021
- Applied Intelligence
Multi-scale features are often used to improve interactive image segmentation, but they influence the segmentation result to different degrees: pixel-level features can produce finer boundaries but are sensitive to image noise, while superpixel-level features provide semantic perception of the object but easily lead to over-segmentation. We therefore propose an interactive image segmentation algorithm based on adaptive fusion of multi-scale features (AFMSF), which combines multi-scale information adaptively by learning the influence coefficient of each scale. First, multi-scale superpixel layers are generated by controlling the superpixel size. Based on the features of this multi-scale information, similarity matrices and label priors at the pixel and superpixel levels are obtained. A fusion-with-diffusion strategy is then designed to build an energy function combining these cues. Finally, the influence coefficient of each scale and the labeling are updated alternately until convergence. The proposed algorithm is robust to diverse object appearances, and experimental results on the public interactive image segmentation datasets Graz, LHI, and MSRC validate its superior performance.
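As a toy illustration of the adaptive idea, the sketch below fuses per-scale score maps with influence coefficients that are renormalized toward scales agreeing with the current fused estimate. This is a deliberate simplification for intuition, not the paper's energy formulation.

```python
# Toy adaptive fusion of per-scale foreground score maps (simplified).
import numpy as np

def adaptive_fuse(score_maps, iters: int = 10) -> np.ndarray:
    """score_maps: list of per-scale foreground probabilities, each (H, W)."""
    k = len(score_maps)
    coef = np.full(k, 1.0 / k)
    fused = np.mean(score_maps, axis=0)
    for _ in range(iters):
        # Agreement of each scale with the fused map drives its weight.
        agree = np.array([1.0 - np.abs(s - fused).mean() for s in score_maps])
        coef = agree / agree.sum()
        fused = np.tensordot(coef, np.stack(score_maps), axes=1)
    return fused

maps = [np.random.rand(32, 32) for _ in range(3)]
print(adaptive_fuse(maps).shape)  # (32, 32)
```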
- Research Article
- Cited by 203
- 10.1109/tgrs.2018.2889677
- Jun 1, 2019
- IEEE Transactions on Geoscience and Remote Sensing
Cloud detection in remote sensing images is a challenging but significant task. Due to the variety and complexity of underlying surfaces, most current cloud detection methods have difficulty detecting thin cloud regions. In fact, it is quite meaningful to distinguish thin clouds from thick clouds, especially in cloud removal and target detection tasks. Therefore, we propose a multiscale-features convolutional neural network (MF-CNN) to detect thin cloud, thick cloud, and non-cloud pixels of remote sensing images simultaneously. Landsat 8 satellite imagery with various levels of cloud coverage is used to demonstrate the effectiveness of the proposed MF-CNN model. We first stack the visible, near-infrared, short-wave, cirrus, and thermal infrared bands of Landsat 8 imagery to obtain combined spectral information. The MF-CNN model is then used to learn multiscale global features of the input images, and the high-level semantic information obtained during feature learning is integrated with low-level spatial information to classify the imagery into thick-cloud, thin-cloud, and non-cloud regions. Compared qualitatively and quantitatively with various commonly used cloud detection methods, the proposed method performs better not only on thick and thin clouds but also on entire cloud regions.
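The two steps described, band stacking followed by multiscale feature learning, can be sketched as parallel dilated convolutions over a stacked band tensor with a per-pixel three-class head. Layer sizes here are illustrative assumptions, not the published MF-CNN configuration.

```python
# Sketch: multi-scale features from stacked bands, three-class output.
import torch
import torch.nn as nn

class MultiScaleCloudNet(nn.Module):
    def __init__(self, in_bands: int = 7, num_classes: int = 3):
        super().__init__()
        # Parallel branches with growing dilation see growing context.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_bands, 16, 3, padding=d, dilation=d) for d in (1, 2, 4)]
        )
        self.classify = nn.Conv2d(3 * 16, num_classes, 1)

    def forward(self, x):
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        return self.classify(feats)  # per-pixel class logits

# Stack visible/NIR/SWIR/cirrus/thermal bands along the channel axis.
bands = torch.randn(1, 7, 128, 128)
print(MultiScaleCloudNet()(bands).shape)  # torch.Size([1, 3, 128, 128])
```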
- Research Article
- Cited by 60
- 10.1016/j.hal.2021.102057
- Jun 16, 2021
- Harmful Algae
Research on the biology and ecology of the harmful algal bloom species Phaeocystis globosa in China: Progresses in the last 20 years
- Research Article
- Cited by 23
- 10.1016/j.marpolbul.2020.111253
- May 17, 2020
- Marine Pollution Bulletin
Spatial distribution patterns of planktonic ciliate communities in the East China Sea: Potential indicators of water masses
- Research Article
- Cited by 5
- 10.1109/tgrs.2023.3266273
- Jan 1, 2023
- IEEE Transactions on Geoscience and Remote Sensing
Textured 3D mesh is one of the final user products in photogrammetry and remote sensing. However, research on the semantic segmentation of complex urban scenes represented by textured 3D meshes is in its infancy. We present a mesh-based dynamic graph CNN (DGCNN) for the semantic segmentation of textured 3D meshes. To represent each mesh facet, composite input feature vectors are constructed by concatenating the face-inherent features, i.e., XYZ coordinates of the center of gravity (CoG), texture values, and normal vectors. A texture fusion module is embedded into the proposed mesh-based DGCNN to generate high-level semantic features of the high-resolution texture information, which is useful for semantic segmentation. We achieve competitive accuracies when the proposed method is applied to the SUM mesh datasets. The overall accuracy (OA), Kappa coefficient (Kap), mean precision (mP), mean recall (mR), mean F1 score (mF1), and mean intersection over union (mIoU) are 93.3%, 88.7%, 79.6%, 83.0%, 80.7%, and 69.6%, respectively. In particular, the OA, mean class accuracy (mAcc), mIoU, and mF1 increase by 0.3%, 12.4%, 3.4%, and 6.9%, respectively, compared to the state-of-the-art method.
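The composite per-facet input described above, CoG coordinates plus texture values plus normal vectors, is straightforward to assemble; a small sketch follows, with the input layout as an illustrative assumption.

```python
# Sketch: per-face composite feature vectors (CoG + texture + normal).
import numpy as np

def face_features(verts: np.ndarray, faces: np.ndarray,
                  tex: np.ndarray) -> np.ndarray:
    """verts: (V, 3); faces: (F, 3) vertex indices; tex: (F, 3) RGB.
    Returns (F, 9): CoG (3) + texture (3) + unit normal (3)."""
    tri = verts[faces]                      # (F, 3, 3) triangle corners
    cog = tri.mean(axis=1)                  # center of gravity per face
    n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    n /= (np.linalg.norm(n, axis=1, keepdims=True) + 1e-12)
    return np.concatenate([cog, tex, n], axis=1).astype(np.float32)

v = np.random.rand(10, 3)
f = np.random.randint(0, 10, size=(20, 3))
print(face_features(v, f, np.random.rand(20, 3)).shape)  # (20, 9)
```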
- Research Article
- Cited by 10
- 10.3390/rs15133417
- Jul 6, 2023
- Remote Sensing
Accurate assessment of the extent of crop distribution and mapping of different crop types are essential for monitoring and managing modern agriculture. Medium- and high-spatial-resolution remote sensing (RS) for Earth observation, combined with deep learning (DL), is among the most effective tools for crop mapping. In this study, we used high-resolution Sentinel-2 imagery from Google Earth Engine (GEE) to map paddy rice and winter wheat in Bengbu city, Anhui Province, China. We compared popular DL backbone networks, including HRNet, MobileNet, Xception, and Swin Transformer within the improved DeepLabv3+ architecture, as well as Segformer, against the traditional machine learning (ML) method random forest (RF). Segformer, which combines a Transformer-architecture encoder with a lightweight multilayer perceptron (MLP) decoder, achieved an overall accuracy (OA) of 91.06%, a mean F1 score (mF1) of 89.26%, and a mean Intersection over Union (mIoU) of 80.70%, outperforming the other DL methods when all evaluation metrics are considered together. Except for Swin Transformer, which was slightly lower than RF in OA, all DL methods significantly outperformed RF in accuracy for the main mapping objects, with mIoU improving by about 13.5-26%. The paddy rice and winter wheat maps predicted by Segformer were characterized by high mapping accuracy, clear field edges, distinct detail features, and a low false classification rate. Consequently, DL is an efficient option for fast and accurate mapping of paddy rice and winter wheat from RS imagery.
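The metrics quoted across these abstracts (OA, mIoU, mF1) all derive from the confusion matrix; a compact reference implementation is sketched below.

```python
# Standard segmentation metrics from a confusion matrix.
import numpy as np

def seg_metrics(cm: np.ndarray):
    """cm[i, j] = pixels of true class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    oa = tp.sum() / cm.sum()                 # overall accuracy
    iou = tp / (tp + fp + fn + 1e-12)        # per-class IoU
    f1 = 2 * tp / (2 * tp + fp + fn + 1e-12) # per-class F1
    return oa, iou.mean(), f1.mean()         # OA, mIoU, mF1

cm = np.array([[50, 3], [4, 43]])
print(["%.4f" % m for m in seg_metrics(cm)])
```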
- Research Article
- Cited by 22
- 10.1080/01431161.2020.1871100
- Jan 28, 2021
- International Journal of Remote Sensing
Convolutional Neural Networks (CNNs) are widely used for semantic segmentation and land-use and land-cover (LULC) mapping of very high-resolution (VHR) remote sensing images. The convolution operation is powerful for VHR classification, but the high-frequency detail information lost during its operation decreases classification accuracy, particularly near boundaries; it is therefore necessary to supply additional boundary information to the CNN. In classification (and LULC mapping), providing more effective information generally yields better results, yet current methods treat all image boundaries as a single category and process them uniformly, losing useful information because boundaries differ from the rest of the image in properties such as ambiguity and transition. This paper therefore proposes a semantic segmentation method with category boundaries for LULC mapping. First, a multi-task CNN called the category boundary detection network (CBDN) is designed to extract the boundary information of different category objects. Second, this category boundary and the VHR images are used for initial semantic segmentation. Finally, the category boundary and the initial semantic segmentation result (ISSR) are fused into the final LULC map by a two-step strategy comprising explicit fusion and a boundary attention loss function. To verify whether category boundaries improve classification accuracy, comparative experiments were conducted on the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam datasets against a semantic segmentation method with no boundary information and one with global boundaries. Based on the eroded labels, the proposed method performed well on both Vaihingen (overall accuracy (OA) = 0.924, Kappa coefficient (K) = 0.898, mean F1 score (mF1) = 0.896, and mean Intersection over Union (mIoU) = 0.817) and Potsdam (OA = 0.890, K = 0.857, mF1 = 0.923, and mIoU = 0.860).
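A boundary attention loss of the kind mentioned can be sketched as per-pixel cross-entropy up-weighted on a boundary mask; the weighting scheme below is an illustrative assumption, not the paper's exact formulation.

```python
# Sketch: cross-entropy up-weighted on category-boundary pixels.
import torch
import torch.nn.functional as F

def boundary_weighted_ce(logits, target, boundary, w: float = 4.0):
    """logits: (B, C, H, W); target: (B, H, W) long; boundary: (B, H, W)
    in {0, 1}, 1 on category boundaries."""
    ce = F.cross_entropy(logits, target, reduction="none")  # (B, H, W)
    weights = 1.0 + (w - 1.0) * boundary.float()            # heavier on edges
    return (weights * ce).sum() / weights.sum()

logits = torch.randn(2, 6, 64, 64)
target = torch.randint(0, 6, (2, 64, 64))
boundary = (torch.rand(2, 64, 64) > 0.9).long()
print(boundary_weighted_ce(logits, target, boundary).item())
```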
- Research Article
- 10.1364/oe.533540
- Sep 12, 2024
- Optics express
The turbidity of water is crucial for the health of river and lake ecosystems, necessitating efficient monitoring for effective water management. Existing methods for studying the spatial and temporal distribution of water turbidity rely mostly on in situ measurements, and there is limited research on classifying water bodies by turbidity level. The main difficulty lies in determining the boundaries of water bodies at various turbidity levels, which makes accurate classification with traditional remote sensing image classification methods difficult. This paper proposes and validates an intelligent turbidity classification method based on deep learning using GaoFen-1 multispectral remote sensing imagery. An adaptive-threshold water extraction method based on the Normalized Difference Water Index (NDWI) is proposed to capture water boundaries more accurately and improve the extraction of nearshore water bodies. A semi-automatic semantic annotation method for water turbidity is introduced to reduce manual labeling costs, and mode filtering is applied to address edge noise, establishing a high-quality training sample dataset. After comparing the accuracy of various neural network models, DeepLab V3+ is selected for intelligent turbidity classification. The results show high accuracy, with mean intersection over union (mIoU), mean F1 score (mF1), and overall accuracy (OA) reaching 94.73%, 97.29%, and 97.54%, respectively. The proposed method and experiments demonstrate the feasibility of intelligently classifying water bodies with different turbidity levels using deep learning networks, providing a new approach for large-scale, efficient remote sensing monitoring of water turbidity.
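The NDWI-plus-adaptive-threshold step can be sketched as computing NDWI from the green and NIR bands and picking the cut-off with Otsu's method over the NDWI histogram (band names and the Otsu choice are illustrative assumptions).

```python
# Sketch: NDWI water index with an Otsu-style adaptive threshold.
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    return (green - nir) / (green + nir + 1e-8)

def otsu_threshold(values: np.ndarray, bins: int = 256) -> float:
    hist, edges = np.histogram(values.ravel(), bins=bins)
    hist = hist.astype(float)
    mids = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)                      # class-0 weight up to bin k
    w1 = np.cumsum(hist[::-1])[::-1]          # class-1 weight from bin k on
    m0 = np.cumsum(hist * mids) / np.maximum(w0, 1e-12)
    m1 = np.cumsum((hist * mids)[::-1])[::-1] / np.maximum(w1, 1e-12)
    # Between-class variance for a split between bin k and bin k+1.
    var_b = w0[:-1] * w1[1:] * (m0[:-1] - m1[1:]) ** 2
    return float(edges[np.argmax(var_b) + 1])

green, nir = np.random.rand(64, 64), np.random.rand(64, 64)
idx = ndwi(green, nir)
water_mask = idx > otsu_threshold(idx)
print(water_mask.mean())  # fraction of pixels classified as water
```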