Abstract

Self-attention is by definition a permutation-equivariant operation, and incorporating group-invariant positional encoding allows it to achieve equivariance to larger symmetry groups. We propose efficient group-equivariant transformers that eliminate the cubic complexity caused by group-invariant positional encoding. Instead of using explicit positional encoding, we replace the linear projections with group-equivariant convolutional projections, which serve both as a group mixer and as an implicit positional encoding. This reduces the group acted on by self-attention to the translation group while maintaining overall group equivariance, lowering computation and memory costs. A group-equivariant convolutional stem further improves performance. The proposed method outperforms existing group-equivariant transformers and CNNs on the rotated MNIST dataset, a standard benchmark for evaluating group-equivariant networks.
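
The sketch below is a minimal illustration of the idea, not the authors' implementation: the linear Q/K/V projections of self-attention are replaced by a p4 (90-degree rotation) lifting convolution, and the attention itself uses no explicit positional encoding. The class names P4LiftingConv and ConvProjectionAttention, the choice of the p4 group, and all hyperparameters are hypothetical choices made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class P4LiftingConv(nn.Module):
    """Lifting convolution for the p4 group (rotations by multiples of 90 degrees).
    Convolving with all four rotated copies of the filter makes the output
    equivariant to 90-degree rotations of the input (rotation permutes the
    spatial grid and cyclically shifts the group axis)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * (in_ch * k * k) ** -0.5)
        self.pad = k // 2

    def forward(self, x):                           # x: [B, C_in, H, W]
        outs = []
        for r in range(4):                          # rotate the filter by r * 90 degrees
            w = torch.rot90(self.weight, r, dims=(2, 3))
            outs.append(F.conv2d(x, w, padding=self.pad))
        return torch.stack(outs, dim=2)             # [B, C_out, 4, H, W]

class ConvProjectionAttention(nn.Module):
    """Self-attention whose Q/K/V projections are group-equivariant convolutions
    rather than per-token linear maps. The convolution acts as a group mixer and
    an implicit positional encoding, so the attention is plain softmax attention
    over all group-spatial tokens with no explicit positional encoding."""
    def __init__(self, in_ch, dim):
        super().__init__()
        self.dim = dim
        self.q = P4LiftingConv(in_ch, dim)
        self.k = P4LiftingConv(in_ch, dim)
        self.v = P4LiftingConv(in_ch, dim)

    def forward(self, x):                           # x: [B, C, H, W]
        B, _, H, W = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)   # each [B, dim, 4, H, W]
        # flatten the group and spatial axes into a single token axis
        def tokens(t):
            return t.flatten(2).transpose(1, 2)     # [B, 4*H*W, dim]
        q, k, v = tokens(q), tokens(k), tokens(v)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.dim ** 0.5, dim=-1)
        out = attn @ v                              # [B, 4*H*W, dim]
        return out.transpose(1, 2).reshape(B, self.dim, 4, H, W)

# Usage example on a rotated-MNIST-sized input
x = torch.randn(2, 1, 28, 28)
block = ConvProjectionAttention(in_ch=1, dim=16)
print(block(x).shape)                               # torch.Size([2, 16, 4, 28, 28])
```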
