Swin-MFA: A Multi-Modal Fusion Attention Network Based on Swin-Transformer for Low-Light Image Human Segmentation.

Xunpeng Yi, Shujiang Guo, Jingyi Wu, Cien Fan, Haonan Zhang, Yibo Wang

doi:10.3390/s22166229

Abstract

In recent years, image segmentation based on deep learning has been widely used in medical imaging, autonomous driving, monitoring and security. In monitoring and security, image segmentation is used to locate a person and separate them from the background so that their actions can be analyzed. However, low-illumination conditions pose a great challenge to traditional image-segmentation algorithms, and scenes with low light, or even no light at night, are common in monitoring and security. Given this background, this paper proposes a multi-modal fusion network based on an encoder-decoder structure. The encoder, which contains a two-branch Swin-Transformer backbone instead of a traditional convolutional neural network, fuses the RGB and depth features with a multiscale fusion attention block. The decoder is also built on the Swin-Transformer backbone and is connected to the encoder through several residual connections, which are shown to improve the accuracy of the network. Furthermore, this paper is the first to propose the low-light human segmentation (LLHS) dataset for portrait segmentation, which provides aligned depth and RGB images with fine annotation under low illuminance, captured by combining a traditional monocular camera with a depth camera that uses active structured light. The network is also tested at different levels of illumination. Experimental results show that the proposed network is robust for human segmentation in low-light environments with varying illumination. The mean Intersection over Union (mIoU), a metric often used to evaluate image-segmentation models, of Swin-MFA on the LLHS dataset is 81.0. This is better than the results of ACNet, 3DGNN, ESANet, RedNet and RFNet among mixed multi-modal networks at the same level of depth, and far ahead of segmentation algorithms that use only RGB features, so the approach has important practical significance.
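
For readers unfamiliar with the metric, the following is a minimal sketch of how mIoU can be computed for a two-class (person versus background) label map. It is an illustration only, not the authors' evaluation code: the function name `mean_iou`, the toy masks, and the choice to average per-class IoU over a single image are assumptions, and the abstract does not say how the reported 81.0 was accumulated over the dataset.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean IoU over classes for two equally sized 2-D label maps.

    Hypothetical helper for illustration; the paper's exact protocol
    is not described in the abstract.
    """
    ious = []
    for cls in range(num_classes):
        inter = union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred, in_target = (p == cls), (t == cls)
                inter += in_pred and in_target   # pixel is cls in both maps
                union += in_pred or in_target    # pixel is cls in either map
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 masks: 1 = person, 0 = background (made-up values)
pred = [[0, 1, 1],
        [0, 0, 1]]
target = [[0, 1, 1],
          [0, 1, 1]]
score = mean_iou(pred, target)
```

Here the background class has IoU 2/3 and the person class has IoU 3/4, so `score` is their average, 17/24. Multiplying by 100 gives the percentage-style scale on which a figure such as 81.0 is typically quoted.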

Journal: Sensors
Publication Date: Aug 19, 2022
License type: CC BY 4.0
