Although transformers have made considerable progress in recent years, their large number of parameters, quadratic computational complexity, and high memory cost on long sequences make them difficult to train and deploy, especially in edge computing settings. Consequently, a large body of work has sought to improve the computational and memory efficiency of the original transformer architecture. However, many of these methods restrict the attention context to trade performance for cost, relying on prior knowledge of regularly ordered data. An efficient feature extractor for point clouds is therefore still needed, given their irregularity and large number of points. In this paper, we propose a novel skeleton decomposition-based self-attention (SD-SA) that imposes no limit on sequence length and scales favorably to long-sequence models. Exploiting the numerical low-rank nature of self-attention, we approximate it with the skeleton decomposition method while maintaining its effectiveness. We evaluate the proposed method on point cloud classification, segmentation, and detection on the ModelNet40, ShapeNet, and KITTI datasets, respectively. Our approach significantly improves the efficiency of the point cloud transformer and exceeds other efficient transformers on point cloud tasks in terms of speed at comparable performance.
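To make the core idea concrete, the following is a minimal NumPy sketch of a skeleton (CUR) decomposition applied to a softmax attention matrix. It is an illustration of the low-rank approximation only, not the paper's implementation: the full n x n matrix is formed explicitly to measure the approximation error, and the landmark rows/columns are chosen by uniform random sampling, which is an assumed selection scheme.

```python
import numpy as np

def softmax_rows(S):
    # Numerically stable row-wise softmax.
    S = S - S.max(axis=-1, keepdims=True)
    E = np.exp(S)
    return E / E.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d, k = 512, 64, 32                        # sequence length, head dim, number of landmarks

# Random queries, keys, values (illustrative data, not a trained model).
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
A = softmax_rows(Q @ K.T / np.sqrt(d))       # exact attention matrix (n x n)

# Skeleton (CUR) decomposition: A ~= C @ pinv(U) @ R,
# built from k sampled columns (C), k sampled rows (R), and their intersection (U).
idx = rng.choice(n, size=k, replace=False)   # landmark indices (uniform sampling, an assumption)
C = A[:, idx]                                # sampled columns (n x k)
R = A[idx, :]                                # sampled rows    (k x n)
U = A[np.ix_(idx, idx)]                      # intersection block (k x k)
A_hat = C @ np.linalg.pinv(U) @ R            # skeleton approximation of A

rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
out_exact, out_approx = A @ V, A_hat @ V     # attention outputs with exact vs. approximated A
print(f"relative Frobenius error of the skeleton approximation: {rel_err:.3e}")
```

An efficient variant would avoid materializing the full attention matrix and instead compute only the sampled blocks, which is where the cost savings over quadratic attention come from.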