The progress of 3D instance segmentation techniques has made it essential for several applications, such as augmented reality, autonomous driving, and robotics. Traditional methods usually have challenges with complex indoor scenes made of multiple objects with different occlusions and orientations. In this work, the authors present an innovative model that integrates a new adaptive n-shifted shuffle (ANSS) attention mechanism with the Generalized Hough Transform (GHT) for robust 3D instance segmentation of indoor scenes. The proposed technique leverages the n-shifted sigmoid activation function, which improves the adaptive shuffle attention mechanism, permitting the network to dynamically focus on relevant features across various regions. A learnable shuffling pattern is produced through the proposed ANSS attention mechanism to spatially rearrange the relevant features, thus augmenting the model's ability to capture the object boundaries and their fine-grained details. The integration of GHT furnishes a vigorous framework to localize and detect objects in the 3D space, even when heavy noise and partial occlusions are present. The authors evaluate the proposed method on the challenging Stanford 3D Indoor Spaces Dataset (S3DIS), where it establishes its superiority over existing methods. The proposed approach achieves state-of-the-art performance in both mean Intersection over Union (IoU) and overall accuracy, showcasing its potential for practical deployment in real-world scenarios. These results illustrate that the integration of the ANSS and the GHT yields a robust solution for 3D instance segmentation tasks.
Read full abstract