Semantic segmentation is a fundamental, long-standing research problem. Depth images complement RGB (red-green-blue) images with rich geometric information, which enables more accurate semantic segmentation. However, RGB and depth images also contain redundant information, and handling this redundancy has become an important problem. Group convolutions, which split the filters of a layer into groups, are widely used because they can eliminate redundant information while reducing computational complexity and parameter count. Following a similar idea, we propose a feature grouping mechanism network (FGMNet) that uses an attention mechanism and contextual information extraction for indoor scene semantic segmentation. First, a pyramid feature grouping attention module and a feature augmentation module highlight the most useful information obtained by combining RGB and depth features. The enhanced features are then fed into a feature grouping contextual module. Extensive experiments on two well-known indoor scene semantic segmentation datasets, NYUDv2 and SUN RGB-D, indicate that FGMNet outperforms state-of-the-art RGB-D semantic segmentation methods.
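The parameter savings that motivate group convolution can be illustrated with a short sketch. This is a generic example of filter group convolution, not the FGMNet modules themselves, whose details the abstract does not give. The helper names `conv_params` and `grouped_pointwise_conv` are hypothetical, and the channel sizes (256 fused RGB-D channels, 3x3 kernels, 8 groups) are assumptions chosen only for illustration. The first function counts weights for a k x k convolution split into `groups` filter groups. The second implements a 1x1 group convolution in NumPy, so each output group is computed only from its matching input group.

```python
import numpy as np


def conv_params(c_in, c_out, k, groups=1, bias=False):
    """Weight count of a k x k 2D convolution split into `groups` filter groups.

    Each of the c_out filters sees only c_in / groups input channels,
    so the weight count shrinks by a factor of `groups`.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    n = k * k * (c_in // groups) * c_out
    return n + (c_out if bias else 0)


def grouped_pointwise_conv(x, weights):
    """1x1 group convolution (illustrative, hypothetical helper).

    x:       array of shape (C_in, H, W)
    weights: list of g arrays; weights[i] has shape (C_out / g, C_in / g)
    Output group i depends only on input channel group i.
    """
    g = len(weights)
    x_groups = np.split(x, g, axis=0)  # split channels into g equal groups
    outs = [np.einsum("oc,chw->ohw", w, xg) for w, xg in zip(weights, x_groups)]
    return np.concatenate(outs, axis=0)  # stack output groups along channels


if __name__ == "__main__":
    # Assumed sizes: 256 channels in and out, 3x3 kernels.
    print(conv_params(256, 256, 3))            # standard convolution
    print(conv_params(256, 256, 3, groups=8))  # 8 filter groups
```

With these assumed sizes, the standard convolution has 589,824 weights and the 8-group version has 73,728, an 8x reduction. Because the groups do not exchange information, architectures that use grouping typically add some cross-group mixing elsewhere, for example through attention or contextual modules.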