Extracting high-order abstract patterns from complex, high-dimensional data is foundational to human cognition. Visual reasoning, which involves identifying abstract patterns embedded within composite images, is considered a core competency of machine intelligence. Traditional neuro-symbolic methods often infer unknown objects through data fitting, without fully exploiting the abstract patterns within composite images or the order sensitivity of visual sequences. This paper constructs a relation model with object-centric inductive biases that learns multi-granular rule embeddings at different levels end to end. Through a gating fusion module, the model incrementally integrates explicit representations of objects and abstract relationships. The model also incorporates an information-theoretic relational bottleneck that separates input perceptual information from the embeddings of abstract representations, restricting and differentiating feature processing so as to encourage relational comparisons and induce the extraction of abstract patterns. Furthermore, this paper bridges algebraic operations and machine reasoning through the relational bottleneck, extracting patterns shared across multiple visual objects by identifying invariant sequences within the relational bottleneck matrix. Experimental results on the I-RAVEN dataset demonstrate a total accuracy of 96.8%, surpassing state-of-the-art baseline methods and exceeding human performance (84.4%).
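The following is a minimal sketch, not the authors' implementation, of the two mechanisms named above: a relational bottleneck that exposes only pairwise similarities between object embeddings (hiding raw perceptual features from downstream layers), and a gating module that fuses object-level and relation-level summaries. PyTorch, the class names, and all dimensions are assumptions introduced for illustration.

```python
# Illustrative sketch only: module names, dimensions, and the PyTorch framing
# are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn


class RelationalBottleneck(nn.Module):
    """Map a set of object embeddings to a matrix of pairwise inner products,
    so downstream layers see only relations, not raw perceptual features."""

    def forward(self, objects: torch.Tensor) -> torch.Tensor:
        # objects: (batch, num_objects, dim)
        normed = torch.nn.functional.normalize(objects, dim=-1)
        # relation matrix: (batch, num_objects, num_objects)
        return normed @ normed.transpose(1, 2)


class GatedFusion(nn.Module):
    """Blend object-level and relation-level summaries with a learned sigmoid gate."""

    def __init__(self, obj_dim: int, num_objects: int, out_dim: int):
        super().__init__()
        rel_dim = num_objects * num_objects
        self.obj_proj = nn.Linear(obj_dim, out_dim)
        self.rel_proj = nn.Linear(rel_dim, out_dim)
        self.gate = nn.Linear(2 * out_dim, out_dim)

    def forward(self, objects: torch.Tensor, relations: torch.Tensor) -> torch.Tensor:
        obj_summary = self.obj_proj(objects.mean(dim=1))        # (batch, out_dim)
        rel_summary = self.rel_proj(relations.flatten(1))       # (batch, out_dim)
        g = torch.sigmoid(self.gate(torch.cat([obj_summary, rel_summary], dim=-1)))
        return g * obj_summary + (1.0 - g) * rel_summary        # gated mixture


if __name__ == "__main__":
    batch, num_objects, obj_dim = 4, 9, 64      # e.g. the 9 panels of a 3x3 RPM-style problem
    objects = torch.randn(batch, num_objects, obj_dim)
    relations = RelationalBottleneck()(objects)
    fused = GatedFusion(obj_dim, num_objects, out_dim=128)(objects, relations)
    print(relations.shape, fused.shape)         # (4, 9, 9) (4, 128)
```

In this reading, downstream reasoning operates on the relation matrix (and its gated fusion with object summaries) rather than on raw pixels, which is one plausible way to realize the "restricting and differentiating feature processing" described in the abstract.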