Abstract

This article does not describe a working system. Instead, it presents a single idea about representation that allows advances made by several different groups to be combined into an imaginary system called GLOM.1 The advances include transformers, neural fields, contrastive representation learning, distillation, and capsules. GLOM answers the question: How can a neural network with a fixed architecture parse an image into a part-whole hierarchy that has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.