An information theoretic score for learning hierarchical concepts.

Omid Madani

doi:10.3389/fncom.2023.1082502

Abstract

How do humans learn the regularities of their complex noisy world in a robust manner? There is ample evidence that much of this learning and development occurs in an unsupervised fashion via interactions with the environment. Both the structure of the world and that of the brain appear hierarchical in a number of ways, and structured hierarchical representations offer potential benefits for efficient learning and organization of knowledge, such as concepts (patterns) sharing parts (subpatterns), and for providing a foundation for symbolic computation and language. A major question arises: what drives the processes behind acquiring such hierarchical spatiotemporal concepts? We posit that the goal of advancing one's predictions is a major driver for learning such hierarchies and introduce an information-theoretic score that shows promise in guiding the processes, and, in particular, motivating the learner to build larger concepts. We have been exploring the challenges of building an integrated learning and developing system within the framework of prediction games, wherein concepts serve as (1) predictors, (2) targets of prediction, and (3) building blocks for future higher-level concepts. Our current implementation works on raw text: it begins at a low level, such as characters, which are the hardwired or primitive concepts, and grows its vocabulary of networked hierarchical concepts over time. Concepts are strings or n-grams in our current realization, but we hope to relax this limitation, e.g., to a larger subclass of finite automata. After an overview of the current system, we focus on the score, named CORE. CORE is based on comparing the prediction performance of the system with a simple baseline system that is limited to predicting with the primitives. CORE incorporates a tradeoff between how strongly a concept is predicted (or how well it fits its context, i.e., nearby predicted concepts) vs. how well it matches the (ground) "reality," i.e., the lowest level observations (the characters in the input episode). CORE is applicable to generative models such as probabilistic finite state machines (beyond strings). We highlight a few properties of CORE with examples. The learning is scalable and open-ended. For instance, thousands of concepts are learned after hundreds of thousands of episodes. We give examples of what is learned, and we also empirically compare with transformer neural networks and n-gram language models to situate the current implementation with respect to the state of the art and to further illustrate the similarities and differences with existing techniques. We touch on a variety of challenges and promising future directions in advancing the approach, in particular, the challenge of learning concepts with a more sophisticated structure.
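The comparison against a character-level baseline can be illustrated with a small sketch. This is not the paper's definition of CORE, which the abstract does not give; it is an assumed, simplified log-ratio score, and it leaves out the tradeoff with mismatches against the observed characters. The function names, the uniform character distribution, and the probabilities are all hypothetical.

```python
import math

def baseline_log_prob(text, char_probs):
    """Log2-probability of `text` under a baseline that predicts one
    character at a time, independently (an assumed baseline)."""
    return sum(math.log2(char_probs[ch]) for ch in text)

def gain_over_baseline(concept, predicted_prob, char_probs):
    """Bits gained by predicting `concept` as a whole with probability
    `predicted_prob`, relative to the character-level baseline.
    Illustrative only; not the CORE formula from the paper."""
    return math.log2(predicted_prob) - baseline_log_prob(concept, char_probs)

# Hypothetical baseline: four equally likely characters.
char_probs = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

# Predicting a single character no better than the baseline gains nothing.
single = gain_over_baseline("a", 0.25, char_probs)

# Predicting the two-character concept "ab" with probability 0.5 beats
# the baseline, which assigns it 0.25 * 0.25.
pair = gain_over_baseline("ab", 0.5, char_probs)

print(single, pair)
```

Under these assumptions, a longer concept that is predicted well earns a larger gain than a primitive predicted at the baseline rate, which is one way a score of this kind could reward building larger concepts.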

Journal: Frontiers in Computational Neuroscience	Publication Date: May 2, 2023
License type: CC BY 4.0
