Semantic Segmentation
- Research Article
1
- 10.24840/2183-6493_008.006_0010
- Nov 28, 2022
- U.Porto Journal of Engineering
In recent times, the computer vision community has seen remarkable growth in the field of scene understanding. With such a wide prevalence of images, the importance of this field is growing rapidly along with the technologies involved in it. Semantic Segmentation is an important step in scene understanding that requires assigning each pixel in an image to a pre-defined class; achieving 100% accuracy is a challenging task, making it an active research topic among researchers. In this paper, an extensive study and review of the existing Deep Learning (DL) based techniques used for Semantic Segmentation is carried out, along with a summary of the datasets and evaluation metrics used for it. The study involved the meticulous selection of relevant research papers in the field of interest through searches based on several defined keywords. The study begins with a general and broader focus on Semantic Segmentation as a problem and then narrows its focus to existing Deep Learning (DL) based approaches for this task. In addition, a summary of the traditional methods used for Semantic Segmentation is also presented. The contents of this study are organized to provide easy access to the relevant literature available on the problem of Semantic Segmentation, with a concentrated focus on DL-based methods. Since the problem of scene understanding is being vastly explored by the computer vision community, especially with the help of Semantic Segmentation, we believe that this study will benefit active researchers in reviewing and studying the existing state-of-the-art as well as advanced methods.
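The per-pixel class assignment described in the abstract above can be sketched in a few lines: given a map of per-pixel class scores (as a segmentation network would output), each pixel is assigned the class with the highest score. All names and data here are illustrative, not from any of the surveyed papers.

```python
# Minimal sketch of pixel-wise labeling: each pixel is assigned the
# class whose score is highest. Class names and scores are illustrative.
CLASSES = ["road", "car", "sky"]

def label_pixels(score_map):
    """score_map[y][x] is a list of per-class scores for that pixel."""
    return [
        [max(range(len(CLASSES)), key=lambda c: pixel[c]) for pixel in row]
        for row in score_map
    ]

scores = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.6, 0.3, 0.1], [0.2, 0.1, 0.7]],
]
labels = label_pixels(scores)
# labels[0][0] == 0 ("road"), labels[0][1] == 1 ("car")
```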
- Book Chapter
32
- 10.1016/b978-0-32-385787-1.00017-8
- Jan 1, 2022
- Deep Learning for Robot Perception and Cognition
Chapter 12 - Semantic scene segmentation for robotics
- Research Article
289
- 10.1007/s11633-017-1053-3
- Jan 18, 2017
- International Journal of Automation and Computing
Deep learning technology has shown impressive performance in various vision tasks such as image classification, object detection, and semantic segmentation. In particular, recent advances in deep learning techniques bring encouraging performance to fine-grained image classification, which aims to distinguish subordinate-level categories such as bird species or dog breeds. This task is extremely challenging due to high intra-class and low inter-class variance. In this paper, we review four types of deep learning-based fine-grained image classification approaches: general convolutional neural networks (CNNs), part detection-based, ensemble-of-networks-based, and visual attention-based approaches. Besides, deep learning-based semantic segmentation approaches are also covered, introducing both region-proposal-based and fully convolutional network-based methods.
- Research Article
- 10.46610/joidacs.2024.v01i01.004
- Apr 15, 2024
- Journal of Intelligent Data Analysis and Computational Statistics
Recent advancements in deep learning techniques tailored for autonomous driving in Indian road conditions are crucial for revolutionizing transportation systems. Indian roads present unique challenges, including unpredictable traffic patterns, diverse road infrastructures, and challenging weather conditions. Deep learning is pivotal in addressing these challenges, focusing on perception, decision-making, and control. Various architectures, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Deep Reinforcement Learning (DRL), are analyzed for their efficacy in autonomous driving. Sensor fusion techniques enhance perception capabilities, such as lidar, radar, and camera data integration. Additionally, advancements in semantic segmentation and object detection algorithms improve scene understanding and obstacle recognition. Collaborative efforts between academia, industry, and government agencies are essential for accelerating the deployment of autonomous driving technology in Indian road conditions. In conclusion, leveraging cutting-edge deep learning methodologies enables safe and efficient navigation of autonomous vehicles, ushering in a transformative future in transportation systems.
- Research Article
1
- 10.1007/s11263-022-01599-4
- Apr 28, 2022
- International Journal of Computer Vision
We introduce the first approach to solve the challenging problem of automatic 4D visual scene understanding for complex dynamic scenes with multiple interacting people from multi-view video. Our approach simultaneously estimates a detailed model that includes a per-pixel semantically and temporally coherent reconstruction, together with instance-level segmentation exploiting photo-consistency, semantic and motion information. We further leverage recent advances in 3D pose estimation to constrain the joint semantic instance segmentation and 4D temporally coherent reconstruction. This enables per person semantic instance segmentation of multiple interacting people in complex dynamic scenes. Extensive evaluation of the joint visual scene understanding framework against state-of-the-art methods on challenging indoor and outdoor sequences demonstrates a significant (approx 40%) improvement in semantic segmentation, reconstruction and scene flow accuracy. In addition to the evaluation on several indoor and outdoor scenes, the proposed joint 4D scene understanding framework is applied to challenging outdoor sports scenes in the wild captured with manually operated wide-baseline broadcast cameras.
- Conference Article
4
- 10.1109/icaccs51430.2021.9441924
- Mar 19, 2021
Semantic segmentation is a popular and difficult problem in computer vision that has drawn great attention from the community. It segments an input image into semantically meaningful regions; in other words, it predicts a class label for each pixel in the image. With the advent of depth cameras, which provide very useful, richly structured depth data, RGB-D images are not affected by illumination, and fusing RGB features with depth features can greatly improve the accuracy of semantic segmentation. Recently, deep learning has addressed semantic segmentation more effectively than traditional methods because of neural networks' power to automatically learn suitable feature representations. This survey provides background on several techniques for semantic segmentation of 2.5D RGB-D data in various applications, covering several freely available indoor and outdoor datasets.
- Research Article
9
- 10.1155/2022/6010912
- Mar 20, 2022
- Security and Communication Networks
Semantic segmentation has been a significant research topic for decades and has been employed in several applications. In recent years, research on semantic segmentation in computer vision has focused on different deep learning approaches aimed at achieving superior efficiency when analyzing aerial and remote-sensing images. The main aim of this review is to provide a clear algorithmic categorization and analysis of the diverse contributions to semantic segmentation of aerial images, along with comprehensive details of recent developments. In addition, the emerging deep learning methods have demonstrated much improved performance on several public datasets, and incredible efforts have been dedicated to advancing pixel-level accuracy. Hence, the datasets used in each contribution are studied, and the best performance measures achieved by the existing semantic segmentation models are evaluated. Thus, this survey can help researchers understand the development of semantic segmentation in a shorter time and simplify the understanding of its latest advancements, research gaps, and challenges, to be used as a reference for developing new semantic image segmentation models in the future.
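The pixel-level accuracy mentioned in the abstract above is the simplest of the evaluation metrics these surveys summarize: the fraction of pixels whose predicted label matches the ground truth. A minimal sketch, with illustrative label maps:

```python
# Sketch of pixel-level accuracy: fraction of pixels whose predicted
# label matches the ground-truth label. The label maps are illustrative.
def pixel_accuracy(pred, truth):
    total = correct = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            total += 1
            correct += (p == t)
    return correct / total

pred  = [[0, 1], [1, 2]]
truth = [[0, 1], [2, 2]]
print(pixel_accuracy(pred, truth))  # 0.75
```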
- Research Article
1
- 10.3390/app122412711
- Dec 11, 2022
- Applied Sciences
Research on image classification sparked the latest deep-learning boom, and many downstream tasks, including semantic segmentation, benefit from it. The state-of-the-art semantic segmentation models are all based on deep learning, yet they sometimes make semantic mistakes. In a semantic segmentation dataset with a small number of categories, images are often collected from a single scene, and there is a close semantic connection between any two categories. However, in a semantic segmentation dataset collected from multiple scenes, two categories may be irrelevant to each other. The probability of objects in one category appearing next to objects in other categories differs, which is the observation on which this paper is based. Semantic segmentation methods need to solve the two problems of localization and classification, and this paper is dedicated to correcting those clearly wrong classifications that contradict reality. Specifically, we first calculate the relevancy between different class pairs. Then, based on this knowledge, we infer the category of a connected component from its relationships with its surrounding connected components and correct the obviously wrong classifications made by a deep learning semantic segmentation model. Several well-performing deep learning models are evaluated on two challenging public datasets in the field of semantic image segmentation. Our proposed method improves the performance of UPerNet, OCRNet, and SETR from 40.7%, 43%, and 48.64% to 42.07%, 44.09%, and 49.09% mean IoU on the ADE20K validation set, and the performance of PSPNet, DeepLabV3, and OCRNet from 37.26%, 37.3%, and 39.5% to 38.93%, 38.95%, and 40.63% mean IoU on the COCO-Stuff dataset, which shows the effectiveness of the method.
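One plausible reading of the class-pair relevancy described above is a count of how often two different classes appear in adjacent pixels of a label map; the paper's exact definition may differ, so the sketch below only illustrates the general idea, with an illustrative label grid.

```python
from collections import Counter

# Hedged sketch: approximate class-pair "relevancy" by counting how often
# two different classes occur in horizontally or vertically adjacent
# pixels of a label map. The paper's actual definition may differ.
def adjacency_counts(label_map):
    counts = Counter()
    h, w = len(label_map), len(label_map[0])
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbors
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    a, b = label_map[y][x], label_map[ny][nx]
                    if a != b:
                        counts[tuple(sorted((a, b)))] += 1
    return counts

grid = [
    ["sky",  "sky"],
    ["road", "car"],
]
print(adjacency_counts(grid))
```

A high count for a pair such as ("car", "road") would mark the classes as relevant neighbors, while a zero count would flag a connected component of one class inside the other as suspect.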
- Research Article
8
- 10.3390/s22103898
- May 20, 2022
- Sensors
In this paper, we propose an activity detection system using a 24 × 32 resolution infrared array sensor placed on the ceiling. We first collect the data at different resolutions (i.e., 24 × 32, 12 × 16, and 6 × 8) and apply the advanced deep learning (DL) techniques of Super-Resolution (SR) and denoising to enhance the quality of the images. We then classify the images/sequences of images depending on the activities the subject is performing using a hybrid deep learning model combining a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM). We use data augmentation to improve the training of the neural networks by incorporating a wider variety of samples. The process of data augmentation is performed by a Conditional Generative Adversarial Network (CGAN). By enhancing the images using SR, removing the noise, and adding more training samples via data augmentation, our target is to improve the classification accuracy of the neural network. Through experiments, we show that employing these deep learning techniques to low-resolution noisy infrared images leads to a noticeable improvement in performance. The classification accuracy improved from 78.32% to 84.43% (for images with 6 × 8 resolution), and from 90.11% to 94.54% (for images with 12 × 16 resolution) when we used the CNN and CNN + LSTM networks, respectively.
- Conference Article
11
- 10.1109/iccv.2019.01052
- Oct 1, 2019
We introduce the first approach to solve the challenging problem of unsupervised 4D visual scene understanding for complex dynamic scenes with multiple interacting people from multi-view video. Our approach simultaneously estimates a detailed model that includes a per-pixel semantically and temporally coherent reconstruction, together with instance-level segmentation exploiting photo-consistency, semantic and motion information. We further leverage recent advances in 3D pose estimation to constrain the joint semantic instance segmentation and 4D temporally coherent reconstruction. This enables per person semantic instance segmentation of multiple interacting people in complex dynamic scenes. Extensive evaluation of the joint visual scene understanding framework against state-of-the-art methods on challenging indoor and outdoor sequences demonstrates a significant (approx 40%) improvement in semantic segmentation, reconstruction and scene flow accuracy.
- Book Chapter
- 10.1201/9781003277330-4
- May 23, 2022
The advancements in methods and techniques in the field of computer vision have enabled numerous applications based on the understanding and analysis of image data. Moreover, deep learning has brought a massive shift in image analysis, thereby attracting the attention of researchers worldwide. Many real-life application areas have used image segmentation techniques to identify different regions in an image and classify them into clusters depending on their similarity. Many conventional techniques, namely thresholding, k-means clustering, histogram-based segmentation, and edge detection algorithms, were applied for segmentation; these pre-existing methods were found to be less efficient because of the human intervention they require. But with the rise of deep learning, it is now considered the predominant method in image processing. In today's era, where computer vision is contemplating image segmentation for various applications, it is of utmost importance to have a detailed review of it. In the past decade, image segmentation has evolved considerably and is defined at two levels of granularity, namely semantic segmentation and instance segmentation. Furthermore, semantic segmentation segments unknown or new objects and classifies pixels that are semantically together. This approach can lay down the foundations for new models that improve prior computer vision methods, and it has become a powerful tool for the critical analysis of the different areas in given images. First, we elucidate the basic terms and some mandatory concepts related to this particular field for the better understanding of newcomers. Then, the chapter focuses on the different methods and network structures applied in semantic image segmentation for deep analysis of images in different applications, in contrast to conventional approaches. We also outline the strengths and weaknesses of this approach to give individuals a clearer perspective. Finally, we strive to disclose the challenges of semantic image segmentation processes in concurrence with deep learning and point out a set of promising future works.
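The conventional thresholding technique named in the chapter abstract above is the simplest of the pre-deep-learning segmentation methods: pixels brighter than a threshold go to the foreground, the rest to the background. A minimal sketch; the image and threshold value are illustrative.

```python
# Sketch of conventional threshold-based segmentation: pixels with
# intensity above the threshold become foreground (1), the rest
# background (0). The image and threshold are illustrative.
def threshold_segment(image, thresh):
    return [[1 if px > thresh else 0 for px in row] for row in image]

image = [
    [ 10, 200],
    [180,  30],
]
mask = threshold_segment(image, 128)
print(mask)  # [[0, 1], [1, 0]]
```

Picking the threshold by hand is exactly the human intervention the chapter cites as a weakness of these conventional methods.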
- Research Article
- 10.32604/cmes.2023.025193
- Jan 1, 2023
- Computer Modeling in Engineering & Sciences
There are two types of methods for image segmentation. One is traditional image processing methods, which are sensitive to details and boundaries yet fail to recognize semantic information. The other is deep learning methods, which can locate and identify different objects but whose boundary identification is not accurate enough. Neither can generate complete segmentation information on its own. In order to obtain accurate edge detection together with semantic information, an Adaptive Boundary and Semantic Composite Segmentation method (ABSCS) is proposed. This method can precisely perform semantic segmentation of individual objects in large-size aerial images with limited GPU resources. It includes adaptively dividing and modifying the aerial images with the proposed principles and methods, using the deep learning method to semantically segment and preprocess the small divided pieces, using three traditional methods to segment and preprocess original-size aerial images, adaptively selecting traditional results to modify the boundaries of individual objects in the deep learning results, and combining the results for different objects. Individual-object semantic segmentation experiments are conducted on the AeroScapes dataset, and their results are analyzed qualitatively and quantitatively. The experimental results demonstrate that the proposed method achieves more accurate object boundaries than the original deep learning method. This work also demonstrates the advantages of the proposed method in applications to point cloud semantic segmentation and image inpainting.
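The dividing step of ABSCS (splitting a large aerial image into small pieces so each fits limited GPU memory before deep segmentation) can be sketched as plain fixed-size tiling; the paper's adaptive division rules are more elaborate, so this only shows the basic idea with an illustrative 4x4 image.

```python
# Sketch of the dividing step: split a large 2D image into fixed-size
# tiles so each piece fits limited GPU memory. ABSCS's adaptive
# division rules are more elaborate; this shows only plain tiling.
def tile(image, tile_h, tile_w):
    tiles = []
    for y in range(0, len(image), tile_h):
        for x in range(0, len(image[0]), tile_w):
            tiles.append([row[x:x + tile_w] for row in image[y:y + tile_h]])
    return tiles

image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 image
pieces = tile(image, 2, 2)
print(len(pieces))  # 4 tiles of size 2x2
```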
- Research Article
- 10.4108/eetel.8433
- Feb 25, 2025
- EAI Endorsed Transactions on e-Learning
Semantic segmentation is a key research topic in the field of computer vision, aiming to assign each pixel to the corresponding category based on the semantic information in the image. This technology has significant application value in fields such as virtual reality and autonomous driving. With the rapid development of deep learning, particularly with the advent of FCN, image semantic segmentation has made substantial progress. Fully supervised learning, which trains deep learning models using labeled data, has demonstrated excellent performance in semantic segmentation tasks. This paper provides a comprehensive discussion and analysis of fully supervised semantic segmentation algorithms for 2D data in deep learning. First, it introduces the concept of semantic segmentation, its development, and its application scenarios. Next, it systematically reviews and categorizes current real-time semantic segmentation algorithms, analyzing the characteristics and limitations of each. Additionally, this paper presents a complete evaluation framework for real-time semantic segmentation, including relevant datasets and evaluation metrics. Based on this foundation, it identifies several challenges currently facing the field and suggests potential directions for future research. Through this summary and analysis, the paper aims to provide valuable insights for researchers conducting studies on image semantic segmentation.
- Research Article
- 10.3390/s25216576
- Oct 25, 2025
- Sensors (Basel, Switzerland)
Accurate segmentation of crops and weeds is essential for enhancing crop yield, optimizing herbicide usage, and mitigating environmental impacts. Traditional weed management practices, such as manual weeding or broad-spectrum herbicide application, are labor-intensive, environmentally harmful, and economically inefficient. In response, this study introduces a novel precision agriculture framework integrating Unmanned Aerial Vehicle (UAV)-based remote sensing with advanced deep learning techniques, combining Super-Resolution Reconstruction (SRR) and semantic segmentation. This study is the first to integrate UAV-based SRR and semantic segmentation for tobacco fields, systematically evaluate recent Transformer and Mamba-based models alongside traditional CNNs, and release an annotated dataset that not only ensures reproducibility but also provides a resource for the research community to develop and benchmark future models. Initially, SRR enhanced the resolution of low-quality UAV imagery, significantly improving detailed feature extraction. Subsequently, to identify the optimal segmentation model for the proposed framework, semantic segmentation models incorporating CNN, Transformer, and Mamba architectures were used to differentiate crops from weeds. Among evaluated SRR methods, RCAN achieved the optimal reconstruction performance, reaching a Peak Signal-to-Noise Ratio (PSNR) of 24.98 dB and a Structural Similarity Index (SSIM) of 69.48%. In semantic segmentation, the ensemble model integrating Transformer (DPT with DINOv2) and Mamba-based architectures achieved the highest mean Intersection over Union (mIoU) of 90.75%, demonstrating superior robustness across diverse field conditions. Additionally, comprehensive experiments quantified the impact of magnification factors, Gaussian blur, and Gaussian noise, identifying an optimal magnification factor of 4, proving that the method was robust to common environmental disturbances at optimal parameters. 
Overall, this research established an efficient, precise framework for crop cultivation management, offering valuable insights for precision agriculture and sustainable farming practices.
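The PSNR figure quoted above (24.98 dB for RCAN) follows the standard definition, 10·log10(MAX²/MSE); a minimal sketch assuming 8-bit images (MAX = 255), with tiny illustrative image patches:

```python
import math

# Standard PSNR for 8-bit images: 10 * log10(MAX^2 / MSE), MAX = 255.
# The small reference/test patches below are illustrative.
def psnr(ref, test, max_val=255.0):
    flat_ref = [px for row in ref for px in row]
    flat_test = [px for row in test for px in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_test)) / len(flat_ref)
    return float("inf") if mse == 0 else 10 * math.log10(max_val ** 2 / mse)

ref  = [[52, 55], [61, 59]]
test = [[54, 55], [61, 58]]
print(round(psnr(ref, test), 2))
```

Higher PSNR means the super-resolved image is closer to the reference; SSIM, the other metric quoted, additionally compares local structure rather than only pixel-wise error.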
- Conference Article
1
- 10.1109/icces48766.2020.9137966
- Jun 1, 2020
Semantic image segmentation is an emerging task in the field of automation, with applications ranging from autonomous driving to medical diagnosis. Semantic segmentation of an image means labeling each pixel in that image with a particular class. As an example, consider an outdoor street image containing different objects such as cars, roads, sky, trees, and pedestrians. After applying semantic segmentation, each pixel belonging to a car will have the label car, each pixel on the road will have the label road, and so on. A recent trend in performing semantic segmentation is to use Convolutional Neural Networks (CNNs), which have acted as a catalyst for segmentation. In this paper, a detailed discussion of various approaches for segmentation using CNNs is presented. Various datasets, their formats, and evaluation metrics are also discussed. All the approaches discussed are diverse and have their pros and cons. Finally, an application-specific semantic segmentation method using a genetic CNN algorithm for the classification task is proposed. The proposed method has shown improvement in the mIoU score when tested on the CamVid dataset and on a dataset created by combining two small object classification datasets, MNIST and CIFAR10.
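The mIoU score reported above is the standard mean Intersection over Union: per class, the ratio of correctly labeled pixels to the union of predicted and ground-truth pixels, averaged over the classes present. A minimal sketch with illustrative label maps:

```python
# Mean Intersection over Union (mIoU): per class,
# IoU = |pred ∩ truth| / |pred ∪ truth| over pixels;
# mIoU averages over classes present. Label maps are illustrative.
def mean_iou(pred, truth, num_classes):
    flat = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in flat if p == c and t == c)
        union = sum(1 for p, t in flat if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

pred  = [[0, 0], [1, 1]]
truth = [[0, 1], [1, 1]]
print(mean_iou(pred, truth, 2))  # (1/2 + 2/3) / 2
```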