Abstract

This paper introduces TrashInst, a fully convolutional, single-stage, anchor-free, real-time instance segmentation model designed for detecting floating litter in waterways. TrashInst combines a streamlined encoder-decoder architecture with an efficient channel attention mask head, preserving critical feature maps while minimizing redundant computation. Complemented by sparse instance activation maps, the approach performs well in both detection and segmentation even when few feature maps are available. The encoder processes multi-scale features from ResNet50 and generates comprehensive abstractions through the nested feature module. The Vortex Instance Activation Maps (VIAM) module then aggregates these multi-scale nested features and integrates them with the mask head outputs to yield precise instance masks. We also employ the focal Tversky loss as the training objective to counter class imbalance in the dataset. Our approach outperforms state-of-the-art real-time instance segmentation models, achieving a 34% accuracy improvement while running at 44 frames per second (FPS) on a single NVIDIA RTX 3090 GPU. Validation highlights the trade-off between speed and precision and the model's ability to handle objects of varying sizes in real time. In particular, TrashInst attains higher average precision (AP) on large and medium-sized objects, outperforming existing models by 4% to 22% in the large category and by 1.1% to 56% in the medium category on our dataset. For the benefit of the community, the data will be available at https://github.com/nassim12/TrashInst.
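
To illustrate the focal Tversky objective mentioned above, the following is a minimal sketch of the loss on a flattened binary mask, not the authors' implementation. The function name `focal_tversky_loss`, the default values of `alpha`, `beta`, and `gamma`, and the convention of raising (1 - TI) directly to the power `gamma` are all assumptions made for illustration; the abstract does not state the settings used in TrashInst.

```python
def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss for one binary mask (illustrative sketch).

    pred   -- flat sequence of predicted foreground probabilities in [0, 1]
    target -- flat sequence of ground-truth labels (0 or 1)
    alpha  -- weight on false negatives (assumed value)
    beta   -- weight on false positives (assumed value)
    gamma  -- focusing exponent applied to (1 - Tversky index) (assumed value)
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # Soft counts of true positives, false negatives, and false positives.
    tp = sum(p * t for p, t in zip(pred, target))
    fn = sum((1.0 - p) * t for p, t in zip(pred, target))
    fp = sum(p * (1.0 - t) for p, t in zip(pred, target))

    # Tversky index: reduces to the Dice coefficient when alpha = beta = 0.5.
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)

    # The exponent reshapes the loss; gamma < 1 emphasizes poorly segmented masks.
    return (1.0 - tversky) ** gamma


if __name__ == "__main__":
    target = [1, 1, 0, 0]
    print(focal_tversky_loss([1.0, 0.0, 0.0, 0.0], target))  # one missed foreground pixel
    print(focal_tversky_loss([1.0, 1.0, 1.0, 0.0], target))  # one false-alarm pixel
```

With `alpha > beta`, a missed foreground pixel costs more than a false alarm, which is one way such a loss can compensate for small litter instances being heavily outnumbered by background pixels.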
