FANet: Feature Amplification Network for Semantic Segmentation in Cluttered Background

Abstract

Existing deep learning approaches overlook semantic cues that are crucial for semantic segmentation in complex scenarios such as cluttered backgrounds and translucent objects. To handle these challenges, we propose a feature amplification network (FANet) as a backbone network that incorporates semantic information through a novel feature enhancement module applied at multiple stages. To achieve this, we propose an adaptive feature enhancement (AFE) block that benefits from both a spatial context module (SCM) and a feature refinement module (FRM) in a parallel fashion. The SCM exploits larger kernels to increase the receptive field and handle scale variations in the scene, whereas the novel FRM generates semantic cues that capture both low-frequency and high-frequency regions for better segmentation. We perform experiments on the challenging real-world ZeroWaste-f [1] dataset, which contains cluttered backgrounds and translucent objects. Our experimental results demonstrate state-of-the-art performance compared to existing methods. The source code can be found at https://github.com/techmn/fanet.
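
As a rough illustration of the parallel design the abstract describes, below is a minimal PyTorch sketch of an AFE-style block: a large-kernel spatial context branch alongside a branch that separates and re-weights low- and high-frequency components. All layer choices, module internals, and the fusion scheme are assumptions for illustration, not the authors' exact architecture (see the released code at the link above for that).

```python
# Hedged sketch of an AFE-style block: SCM and FRM run in parallel and fuse.
import torch
import torch.nn as nn

class SpatialContextModule(nn.Module):
    """Large-kernel depthwise convolution to enlarge the receptive field."""
    def __init__(self, channels, kernel_size=7):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, kernel_size,
                            padding=kernel_size // 2, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pw(self.dw(x))

class FeatureRefinementModule(nn.Module):
    """Split features into low-frequency (smoothed) and high-frequency
    (residual) parts, then re-weight each before recombining."""
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AvgPool2d(3, stride=1, padding=1)  # low-pass proxy
        self.low_gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.high_gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        low = self.pool(x)   # low-frequency component (smooth regions)
        high = x - low       # high-frequency residual (edges, texture)
        return low * self.low_gate(low) + high * self.high_gate(high)

class AFEBlock(nn.Module):
    """Adaptive feature enhancement: SCM and FRM in parallel, then fuse."""
    def __init__(self, channels):
        super().__init__()
        self.scm = SpatialContextModule(channels)
        self.frm = FeatureRefinementModule(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        return x + self.fuse(torch.cat([self.scm(x), self.frm(x)], dim=1))

feats = torch.randn(1, 64, 32, 32)
out = AFEBlock(64)(feats)  # shape preserved: (1, 64, 32, 32)
```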

Similar Papers
  • Research Article
  • Cited by 57
  • 10.1109/tmm.2020.2991592
SAL: Selection and Attention Losses for Weakly Supervised Semantic Segmentation
  • May 13, 2020
  • IEEE Transactions on Multimedia
  • Lei Zhou + 3 more

Training a fully supervised semantic segmentation network requires a large amount of expensive, manually labeled pixel-level annotation. In this work, we focus on the semantic segmentation problem using only image-level supervision. An effective weakly supervised scheme is first employed to produce proxy annotations from image tags; the segmentation network is then retrained on the generated noisy proxy annotations. However, learning from noisy annotations is risky, as proxy annotations of poor quality may deteriorate the performance of the baseline segmentation and classification networks. To train the segmentation network on noisy annotations more effectively, two novel loss functions are proposed in this paper: the selection loss and the attention loss. First, the selection loss weights the proxy annotations based on a coarse-to-fine strategy for evaluating the quality of segmentation masks. Second, the attention loss, which takes the clean image tags as supervision, corrects classification errors caused by ambiguous pixel-level labels. Finally, we propose an end-to-end semantic segmentation network, SAL-Net, guided by these two losses. Extensive experiments on the PASCAL VOC 2012 dataset show that SAL-Net reaches state-of-the-art performance, with mean IoU (mIoU) of 62.5% and 66.6% on the test set using the VGG16 and ResNet101 networks as baselines, respectively, demonstrating the superiority of the proposed algorithm over eight representative weakly supervised segmentation methods. The code and models are available at https://github.com/zmbhou/SALTMM.
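
To make the loss design concrete, here is a hedged PyTorch sketch of a selection-style loss that down-weights noisy proxy annotations by a per-image quality score. The quality heuristic used here (prediction/proxy agreement) is an illustrative assumption, not the paper's coarse-to-fine criterion.

```python
# Sketch: quality-weighted pixel loss for training on noisy proxy annotations.
import torch
import torch.nn.functional as F

def selection_loss(logits, proxy_masks):
    """logits: (B, C, H, W); proxy_masks: (B, H, W) integer labels."""
    per_pixel = F.cross_entropy(logits, proxy_masks, reduction="none")  # (B, H, W)
    with torch.no_grad():
        pred = logits.argmax(dim=1)
        # quality proxy: fraction of pixels where prediction agrees with mask
        quality = (pred == proxy_masks).float().mean(dim=(1, 2))        # (B,)
    # noisy samples (low agreement) contribute less to the loss
    return (quality[:, None, None] * per_pixel).mean()

logits = torch.randn(2, 21, 64, 64, requires_grad=True)
masks = torch.randint(0, 21, (2, 64, 64))
selection_loss(logits, masks).backward()
```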

  • Conference Article
  • Cited by 141
  • 10.1109/icdar.2017.50
Multi-Scale Multi-Task FCN for Semantic Page Segmentation and Table Detection
  • Nov 1, 2017
  • Dafang He + 4 more

Page segmentation and table detection play an important role in understanding the structure of documents. We present a page segmentation algorithm that incorporates state-of-the-art deep learning methods for segmenting three types of document elements: text blocks, tables, and figures. We propose a multi-scale, multi-task fully convolutional neural network (FCN) for the tasks of semantic page segmentation and element contour detection. The semantic segmentation network predicts the probability of each of the three element classes at every pixel. The contour detection network predicts instance-level "edges" around each element occurrence. We propose a conditional random field (CRF) that uses features output by the semantic segmentation and contour networks to improve upon the semantic segmentation output. Given the semantic segmentation output, we also extract individual table instances from the page using heuristic rules and a verification network to remove false positives. We show that although we take only a page image as input, we produce results comparable to other methods that rely on PDF file information, heuristics, and hand-crafted features tailored to specific types of documents. Our approach learns representative features for page segmentation from real and synthetic training data. This learning-based property makes it more general than existing methods in terms of document types and element appearances. For example, our method reliably detects sparsely lined tables, which are hard for rule-based or heuristic methods.
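
A minimal sketch of the two-branch multi-task head described above, assuming a shared feature trunk; channel widths are illustrative assumptions, and the CRF combination step is omitted.

```python
# Sketch: shared features feed a semantic branch and a contour branch.
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    def __init__(self, in_channels, num_classes=3):  # text, table, figure
        super().__init__()
        self.seg = nn.Conv2d(in_channels, num_classes, 1)  # per-pixel class logits
        self.contour = nn.Conv2d(in_channels, 1, 1)        # instance-edge logits

    def forward(self, feats):
        return self.seg(feats), self.contour(feats)

feats = torch.randn(1, 256, 96, 96)  # features from a shared FCN trunk
seg_logits, contour_logits = MultiTaskHead(256)(feats)
```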

  • Research Article
  • Cited by 43
  • 10.1016/j.measurement.2023.113084
Improving RGB-D SLAM accuracy in dynamic environments based on semantic and geometric constraints
  • May 26, 2023
  • Measurement
  • Xiqi Wang + 3 more

  • Research Article
  • Cited by 6
  • 10.1016/j.engappai.2024.109294
Weakly supervised semantic segmentation by knowledge graph inference
  • Sep 11, 2024
  • Engineering Applications of Artificial Intelligence
  • Jia Zhang + 3 more

  • Research Article
  • Cited by 8
  • 10.1007/s00138-022-01276-z
Cross-validation of a semantic segmentation network for natural history collection specimens
  • Mar 21, 2022
  • Machine Vision and Applications
  • Abraham Nieva De La Hidalga + 10 more

Semantic segmentation has been proposed as a tool to accelerate the processing of natural history collection images. However, developing a flexible and resilient segmentation network requires an adaptation approach that allows processing different datasets with minimal training and validation. This paper presents a cross-validation approach designed to determine whether a semantic segmentation network possesses the flexibility required for application across different collections and institutions. The specific objectives of cross-validating the semantic segmentation network are to (a) evaluate the effectiveness of the network for segmenting image sets derived from collections different from the one on which the network was initially trained; and (b) test the adaptability of the segmentation network for use on other types of collections. Resilience to data variations from different institutions and portability of the network across different types of collections are required to confirm its general applicability. The proposed validation method is tested on the Natural History Museum semantic segmentation network, designed to process entomological microscope slides. The network is evaluated through a series of cross-validation experiments using data from two types of collections: microscope slides (from three institutions) and herbarium sheets (from seven institutions). The main contribution of this work is the method, software, and ground-truth sets created for this cross-validation, as they can be reused to test similar segmentation proposals in the context of digitizing natural history collections. Cross-validation of segmentation methods should be a required step in integrating such methods into image processing workflows for natural history collections.
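
Schematically, the leave-one-collection-out protocol described here can be expressed as a short loop; `train_model` and `evaluate` below are hypothetical placeholders supplied by the caller, not functions from the paper's released software.

```python
# Sketch: hold out one collection at a time, train on the rest, score on it.
def cross_validate(collections, train_model, evaluate):
    """collections: dict mapping collection name -> dataset.

    For each collection, train (or fine-tune) on all the others and score
    on the held-out one; returns a dict of per-collection scores.
    """
    scores = {}
    for held_out in collections:
        train_sets = [d for name, d in collections.items() if name != held_out]
        model = train_model(train_sets)
        scores[held_out] = evaluate(model, collections[held_out])
    return scores
```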

  • Research Article
  • Cited by 4
  • 10.2139/ssrn.4231956
A Unified Architecture of Semantic Segmentation and Hierarchical Generative Adversarial Networks for Expression Manipulation
  • Jan 1, 2022
  • SSRN Electronic Journal
  • Rumeysa Bodur + 2 more

Editing facial expressions by changing only what we want is a long-standing research problem in generative adversarial networks (GANs) for image manipulation. Most existing methods that rely only on a global generator tend to change unwanted attributes along with the target attributes. Recently, hierarchical networks consisting of both a global network dealing with the whole image and multiple local networks focusing on local parts have shown success. However, these methods extract local regions via bounding boxes centred around sparse facial key points, which are non-differentiable, inaccurate, and unrealistic. Hence, the solution becomes sub-optimal and introduces unwanted artefacts that degrade the overall quality of the synthetic images. Moreover, a recent study has shown a strong correlation between facial attributes and local semantic regions. To exploit this relationship, we design a unified architecture of semantic segmentation and hierarchical GANs. A unique advantage of our framework is that on the forward pass the semantic segmentation network conditions the generative model, and on the backward pass gradients from the hierarchical GANs are propagated back to the semantic segmentation network, making our framework an end-to-end differentiable architecture. This allows both architectures to benefit from each other. To demonstrate its advantages, we evaluate our method on two challenging facial expression translation benchmarks, AffectNet and RaFD, and a semantic segmentation benchmark, CelebAMask-HQ, across two popular architectures, BiSeNet and UNet. Our extensive quantitative and qualitative evaluations on both face semantic segmentation and face expression manipulation tasks validate the effectiveness of our work over existing state-of-the-art methods.
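
The end-to-end coupling described (segmentation output conditioning the generator on the forward pass, generator gradients reaching the segmentation network on the backward pass) can be sketched in a few lines of PyTorch; both networks below are toy stand-ins for illustration, not BiSeNet/UNet or the paper's hierarchical GAN.

```python
# Sketch: differentiable mask conditioning lets GAN gradients reach the
# segmentation network.
import torch
import torch.nn as nn

seg_net = nn.Conv2d(3, 19, 3, padding=1)        # toy "segmentation network"
generator = nn.Conv2d(3 + 19, 3, 3, padding=1)  # toy mask-conditioned generator

image = torch.randn(1, 3, 64, 64)
soft_masks = seg_net(image).softmax(dim=1)      # differentiable conditioning signal
fake = generator(torch.cat([image, soft_masks], dim=1))

loss = fake.abs().mean()                        # placeholder generator loss
loss.backward()                                 # gradients flow back through masks
assert seg_net.weight.grad is not None          # seg_net receives GAN gradients
```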

  • Research Article
  • Cited by 30
  • 10.1016/j.neucom.2018.01.022
Contour-aware network for semantic segmentation via adaptive depth
  • Feb 14, 2018
  • Neurocomputing
  • Zhiyu Jiang + 2 more

  • Research Article
  • Cited by 2
  • 10.1016/j.neucom.2021.04.119
Training and inference for integer-based semantic segmentation network
  • May 4, 2021
  • Neurocomputing
  • Jiayi Yang + 4 more

  • Research Article
  • Cited by 116
  • 10.7717/peerj-cs.607
Chest X-ray pneumothorax segmentation using U-Net with EfficientNet and ResNet architectures.
  • Jun 29, 2021
  • PeerJ. Computer science
  • Ayat Abedalla + 3 more

Medical imaging refers to visualization techniques that provide valuable information about the internal structures of the human body for clinical applications, diagnosis, treatment, and scientific research. Segmentation is one of the primary methods for analyzing and processing medical images; it helps doctors diagnose accurately by providing detailed information on the body part of interest. However, segmenting medical images faces several challenges: it requires trained medical experts and is time-consuming and error-prone. An automatic medical image segmentation system therefore appears necessary. Deep learning algorithms have recently shown outstanding performance for segmentation tasks, especially semantic segmentation networks that provide pixel-level image understanding. Since the introduction of the first fully convolutional network (FCN) for semantic image segmentation, several segmentation networks have been proposed on this basis. One of the state-of-the-art convolutional networks in the medical image field is U-Net. This paper presents a novel end-to-end semantic segmentation model for medical images, named Ens4B-UNet, which ensembles four U-Net architectures with pre-trained backbone networks. Ens4B-UNet builds on U-Net's success with several significant improvements: adapting powerful and robust convolutional neural networks (CNNs) as backbones for the U-Net encoders and using nearest-neighbor up-sampling in the decoders. Ens4B-UNet is designed as a weighted-average ensemble of four encoder-decoder segmentation models. The backbone networks of all ensembled models are pre-trained on the ImageNet dataset to exploit the benefit of transfer learning. To improve our models, we apply several training and prediction techniques, including stochastic weight averaging (SWA), data augmentation, test-time augmentation (TTA), and different types of optimal thresholds. We evaluate and test our models on the 2019 Pneumothorax Challenge dataset, which contains 12,047 training images with 12,954 masks and 3,205 test images. Our proposed segmentation network achieves a 0.8608 mean Dice similarity coefficient (DSC) on the test set, which is among the top one percent of systems in the Kaggle competition.
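
A hedged sketch of the weighted-average ensembling idea described for Ens4B-UNet: fuse per-pixel probabilities from several segmentation models, then threshold. The weights, threshold value, and toy models are assumptions for illustration.

```python
# Sketch: weighted-average ensemble of binary segmentation models.
import torch
import torch.nn as nn

def ensemble_predict(models, image, weights):
    """Weighted average of per-model sigmoid probabilities -> binary mask."""
    probs = [w * m(image).sigmoid() for m, w in zip(models, weights)]
    fused = torch.stack(probs).sum(dim=0) / sum(weights)
    return (fused > 0.5).float()  # binary pneumothorax mask

# Toy stand-ins for the four pre-trained U-Nets
models = [nn.Conv2d(1, 1, 3, padding=1) for _ in range(4)]
mask = ensemble_predict(models, torch.randn(1, 1, 128, 128), [1.0, 1.0, 2.0, 2.0])
```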

  • Book Chapter
  • Cited by 39
  • 10.1007/978-3-031-37703-7_19
NNV 2.0: The Neural Network Verification Tool
  • Jan 1, 2023
  • Diego Manzanas Lopez + 3 more

This manuscript presents the updated version of the Neural Network Verification (NNV) tool. NNV is a formal verification software tool for deep learning models and cyber-physical systems with neural network components. NNV was first introduced as a verification framework for feedforward and convolutional neural networks, as well as for neural network control systems. Since then, numerous works have made significant improvements in the verification of new deep learning models, as well as tackling some of the scalability issues that may arise when verifying complex models. In this new version of NNV, we introduce verification support for multiple deep learning models, including neural ordinary differential equations, semantic segmentation networks and recurrent neural networks, as well as a collection of reachability methods that aim to reduce the computation cost of reachability analysis of complex neural networks. We have also added direct support for standard input verification formats in the community such as VNNLIB (verification properties), and ONNX (neural networks) formats. We present a collection of experiments in which NNV verifies safety and robustness properties of feedforward, convolutional, semantic segmentation and recurrent neural networks, as well as neural ordinary differential equations and neural network control systems. Furthermore, we demonstrate the capabilities of NNV against a commercially available product in a collection of benchmarks from control systems, semantic segmentation, image classification, and time-series data.

  • Research Article
  • Cited by 13
  • 10.3390/app12157811
Real-Time Semantic Understanding and Segmentation of Urban Scenes for Vehicle Visual Sensors by Optimized DCNN Algorithm
  • Aug 3, 2022
  • Applied Sciences
  • Yanyi Li + 2 more

The modern urban environment is becoming more and more complex. Vehicle vision sensors increasingly rely on the semantic segmentation ability of deep learning networks to identify surrounding objects. The performance of a semantic segmentation network is essential: it directly affects the overall level of driving assistance technology in road environment perception. However, existing semantic segmentation networks have redundant structures, many parameters, and low operational efficiency. Therefore, to reduce network complexity and the number of parameters and thereby improve efficiency, this work studies a method for efficient image semantic segmentation using a deep convolutional neural network (DCNN), grounded in deep learning (DL) theory. First, the theoretical basis of the convolutional neural network (CNN) is briefly introduced, and real-time semantic segmentation of urban scenes based on DCNNs is presented in detail. Second, the atrous convolution algorithm and the multi-scale parallel atrous spatial pyramid model are introduced. On this basis, an Efficient Symmetric Network (ESNet), a real-time semantic segmentation model for autonomous driving scenarios, is proposed. The experimental results show that: (1) On the Cityscapes dataset, ESNet achieves 70.7% segmentation accuracy for the set of 19 semantic categories and 87.4% for the seven coarse grouping categories, improving on other algorithms to varying degrees. (2) On the CamVid dataset, compared with multiple lightweight real-time segmentation networks, the ESNet model has around 1.2 M parameters, a top speed of around 90 FPS, and a top mIoU of around 70%. For seven semantic categories, ESNet achieves the highest segmentation accuracy, around 98%. We thus find that ESNet significantly improves segmentation accuracy while maintaining fast forward-inference speed. Overall, this research not only provides technical support for the development of real-time semantic understanding and segmentation with DCNN algorithms but also contributes to the development of artificial intelligence technology.
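
For reference, here is a minimal PyTorch sketch of the multi-scale parallel atrous spatial pyramid the abstract builds on; the dilation rates and channel widths are common defaults assumed here, not ESNet's exact configuration.

```python
# Sketch: parallel atrous (dilated) convolutions at several rates, fused 1x1.
import torch
import torch.nn as nn

class AtrousSpatialPyramid(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        # padding == dilation keeps spatial size constant for 3x3 kernels
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.project = nn.Conv2d(len(rates) * out_ch, out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

y = AtrousSpatialPyramid(256, 64)(torch.randn(1, 256, 32, 32))  # (1, 64, 32, 32)
```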

  • Conference Article
  • Cited by 1
  • 10.1117/12.2623412
Spatially bound categorical attention network for semantic segmentation
  • Feb 16, 2022
  • Wei Li + 4 more

Semantic segmentation is a key step in image comprehension. Using convolutional networks alone for semantic segmentation makes it difficult to distinguish objects of the same class with large contour deviations, while higher-level features lose some of the detailed information. Networks such as ACFNet and DANet have introduced attention mechanisms that obtain rich contextual information to improve scene classification, but they do not combine global-scope and local-space class feature relationships to further improve the intra-class consistency and inter-class separability of features. To address this problem, a semantic segmentation network with spatially constrained categorical attention is proposed. It contains two submodules: one uses the category spatial distribution to introduce local spatial location information of features, and the other uses the global category average strength to introduce global strength information of category features. Given a suitable backbone network, the model takes the backbone's feature map, processes it through the two submodules (global category strength and local category space), and stacks the result onto the original features; a classification layer then performs the classification and up-samples to the input image size to complete pixel-level label prediction. Experimental results demonstrate that the proposed segmentation network achieves higher accuracy than existing segmentation networks.
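
In the spirit of the ACFNet-style categorical attention discussed above, here is a hedged sketch of class-center attention: per-class feature "centers" are computed from a coarse prediction, and each pixel is re-expressed over those centers. The details are illustrative assumptions, not the proposed network's exact submodules.

```python
# Sketch: attend each pixel to per-class feature centers.
import torch

def class_center_attention(feats, coarse_logits):
    """feats: (B, C, H, W); coarse_logits: (B, K, H, W) coarse class scores."""
    B, C, H, W = feats.shape
    K = coarse_logits.shape[1]
    attn = coarse_logits.softmax(dim=1).reshape(B, K, H * W)   # (B, K, HW)
    f = feats.reshape(B, C, H * W)                             # (B, C, HW)
    centers = torch.bmm(f, attn.transpose(1, 2))               # (B, C, K)
    # normalize each center by its total attention mass
    centers = centers / attn.sum(dim=2, keepdim=True).transpose(1, 2).clamp(min=1e-6)
    # each pixel re-expressed as a mixture of its class centers
    return torch.bmm(centers, attn).reshape(B, C, H, W)

out = class_center_attention(torch.randn(2, 64, 16, 16), torch.randn(2, 19, 16, 16))
```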

  • Research Article
  • Cited by 33
  • 10.1016/j.compag.2023.108507
A deep learning approach combining DeepLabV3+ and improved YOLOv5 to detect dairy cow mastitis
  • Dec 9, 2023
  • Computers and Electronics in Agriculture
  • Yanchao Wang + 3 more

  • Conference Article
  • Cited by 1
  • 10.1109/aicit55386.2022.9930225
Self-attentive Semantic Segmentation Model Based On Generative Adversarial Network
  • Sep 16, 2022
  • Hongchang Yang + 1 more

To address the problems that existing recognition networks rely on large amounts of labeled data, that the receptive field is limited to the local area of the convolution, and that contextual understanding is lacking, this paper proposes a self-attentive semantic segmentation method based on generative adversarial networks. The method builds a semantic segmentation network and a discriminator on a generative adversarial framework; the segmentation network uses ResNet101 as the backbone, connects the spatial pyramid pooling module of PSPNet, and adopts a cross-attention method to overcome the excessive parameter counts of classical attention models. The model was evaluated on the publicly available PASCAL VOC 2012 dataset. In the first semi-supervised experiments, without improving the segmentation network, the model reached mIoU values of 73.1%, 74.4%, and 75.1% with 1/8, 1/4, and 1/2 of the labels, respectively, which is 3.6%, 2.3%, and 1.3% higher than the control group. The mIoU reached 75.4% after improving the segmentation network, demonstrating the superiority and effectiveness of the model.
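
A toy sketch of the adversarial pairing described: a segmentation network produces label maps and a discriminator scores them, yielding a training signal that can also be applied to unlabeled images. The architectures below are placeholder assumptions, not ResNet101/PSPNet with cross-attention.

```python
# Sketch: discriminator on predicted label maps as an adversarial loss.
import torch
import torch.nn as nn

seg_net = nn.Conv2d(3, 21, 3, padding=1)                        # toy segmenter
disc = nn.Sequential(nn.Conv2d(21, 1, 4, stride=2, padding=1),  # toy discriminator
                     nn.Flatten(), nn.LazyLinear(1))

image = torch.randn(1, 3, 64, 64)
pred = seg_net(image).softmax(dim=1)        # soft label map, differentiable
adv_score = disc(pred)                      # discriminator score on prediction
# push predictions toward the "real ground-truth map" decision
adv_loss = nn.functional.binary_cross_entropy_with_logits(
    adv_score, torch.ones_like(adv_score))
adv_loss.backward()                         # trains the segmenter adversarially
```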

  • Research Article
  • Cited by 63
  • 10.1007/s10489-021-02446-8
Joint pyramid attention network for real-time semantic segmentation of urban scenes
  • May 6, 2021
  • Applied Intelligence
  • Xuegang Hu + 2 more

Semantic segmentation is an advanced research topic in computer vision and can be regarded as a fundamental technique for image understanding and analysis. However, most current semantic segmentation networks focus only on segmentation accuracy while ignoring the requirements for high processing speed and low computational complexity in mobile-terminal fields such as autonomous driving systems, drone applications, and fingerprint recognition systems. Because of its high computational cost, the current semantic segmentation task struggles to meet actual industrial needs. We propose a joint pyramid attention network (JPANet) for real-time semantic segmentation. First, we propose a joint feature pyramid (JFP) module, which combines multiple network stages to learn multi-scale feature representations with strong semantic information, improving pixel classification performance. Second, we build a spatial detail extraction (SDE) module to capture the shallow network's multi-level local features and make up for the geometric information lost in the down-sampling stage. Finally, we design a bilateral feature fusion (BFF) module, which properly integrates spatial and semantic information through a hybrid attention mechanism over the spatial and channel dimensions, making full use of the correspondence between high-level and low-level features. We conducted a series of experiments on two challenging urban road scene datasets (Cityscapes and CamVid) and achieved excellent results. On the Cityscapes dataset, for 512 × 1024 high-resolution images, our method achieves 71.62% mean Intersection over Union (mIoU) at 109.9 frames per second (FPS) on a single 1080Ti GPU.
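
As a rough illustration of bilateral fusion with hybrid attention in the spirit of the BFF module, here is a hedged PyTorch sketch; the exact gating design and the alignment of the two inputs are assumptions.

```python
# Sketch: fuse a shallow spatial-detail feature with a deep semantic feature
# using channel attention followed by spatial attention.
import torch
import torch.nn as nn

class BilateralFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, spatial_feat, semantic_feat):
        fused = spatial_feat + semantic_feat
        fused = fused * self.channel_gate(fused)  # re-weight channels
        return fused * self.spatial_gate(fused)   # re-weight spatial positions

low = torch.randn(1, 128, 64, 64)   # shallow, detail-rich features
high = torch.randn(1, 128, 64, 64)  # deep semantic features (assumed aligned)
out = BilateralFusion(128)(low, high)
```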
