Sample-efficient Learning Research Articles

Converging theories suggest that organisms learn and exploit probabilistic models of their environment. However, it remains unclear how such models can be learned in practice. The open-ended complexity of natural environments means that it is generally infeasible for organisms to model their environment comprehensively. Alternatively, action-oriented models attempt to encode a parsimonious representation of adaptive agent-environment interactions. One approach to learning action-oriented models is to learn online in the presence of goal-directed behaviours. This constrains an agent to behaviourally relevant trajectories, reducing the diversity of the data a model need account for. Unfortunately, this approach can cause models to prematurely converge to sub-optimal solutions, through a process we refer to as a bad bootstrap. Here, we exploit the normative framework of active inference to show that efficient action-oriented models can be learned by balancing goal-oriented and epistemic (information-seeking) behaviours in a principled manner. We illustrate our approach using a simple agent-based model of bacterial chemotaxis. We first demonstrate that learning via goal-directed behaviour indeed constrains models to behaviourally relevant aspects of the environment, but that this approach is prone to sub-optimal convergence. We then demonstrate that epistemic behaviours facilitate the construction of accurate and comprehensive models, but that these models are not tailored to any specific behavioural niche and are therefore less efficient in their use of data. Finally, we show that active inference agents learn models that are parsimonious, tailored to action, and which avoid bad bootstraps and sub-optimal convergence. Critically, our results indicate that models learned through active inference can support adaptive behaviour in spite of, and indeed because of, their departure from veridical representations of the environment.
Our approach provides a principled method for learning adaptive models from limited interactions with an environment, highlighting a route to sample-efficient learning algorithms.
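
The balance between goal-oriented and epistemic behaviour described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a discrete state and observation space, and it scores an action by the commonly used decomposition of expected free energy into extrinsic value (expected log preference over observations) and epistemic value (expected information gain about hidden states). The abstract itself concerns learning a model, so information gain about states is a simplified stand-in here. The names `expected_free_energy`, `q_states`, `likelihood` and `log_preferences` are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def expected_free_energy(q_states, likelihood, log_preferences):
    """Score one action; lower is better.

    q_states:        predicted state distribution under the action (assumed given)
    likelihood:      likelihood[s][o] = p(o | s)
    log_preferences: log prior preference for each observation
    Returns (G, extrinsic value, epistemic value).
    """
    n_states = len(q_states)
    n_obs = len(log_preferences)
    # Predicted observations: q(o) = sum_s q(s) p(o | s)
    q_obs = [
        sum(q_states[s] * likelihood[s][o] for s in range(n_states))
        for o in range(n_obs)
    ]
    # Goal-oriented term: expected log preference of predicted observations
    extrinsic = sum(q_obs[o] * log_preferences[o] for o in range(n_obs))
    # Information-seeking term: mutual information between states and observations,
    # computed as H[q(o)] - E_q(s) H[p(o | s)]
    ambiguity = sum(q_states[s] * entropy(likelihood[s]) for s in range(n_states))
    epistemic = entropy(q_obs) - ambiguity
    return -(extrinsic + epistemic), extrinsic, epistemic


# Illustrative numbers only: two states, two observations.
likelihood = [[0.9, 0.1], [0.1, 0.9]]
log_preferences = [math.log(0.8), math.log(0.2)]

# An action whose outcome state is already known offers no information gain.
g_certain, ext_certain, epi_certain = expected_free_energy(
    [1.0, 0.0], likelihood, log_preferences
)
# An action with an uncertain outcome state offers positive information gain.
g_uncertain, ext_uncertain, epi_uncertain = expected_free_energy(
    [0.5, 0.5], likelihood, log_preferences
)
```

In this toy setting the agent trades off the two terms within a single score, rather than switching between a purely goal-directed and a purely exploratory policy.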

With recent advances in multimodal interactive recommendation, users can express their preferences through natural language feedback on item images in order to find the items they want. However, existing systems either retrieve only one item or require the user to specify (e.g., by click or touch) the commented items from a list of recommendations in each interaction. As a result, users are not hands-free and the recommendations may be impractical. We propose a hands-free visual dialog recommender system that interactively recommends a list of items. At each step, the system shows a list of items with their visual appearance. The user can comment on the list in natural language to describe the further features they want. With these multimodal data, the system chooses another list of items to recommend. To understand the user's preference from these multimodal data, we develop neural network models that identify the described items in the list and predict the desired attributes. To achieve efficient interactive recommendation, we leverage the inferred user preference and develop a novel bandit algorithm. Specifically, to keep the system from exploring more than needed, the desired attributes are used to reduce the exploration space. More importantly, to achieve sample-efficient learning in this hands-free setting, we derive additional samples from the user's relative preference expressed in natural language and design a pairwise logistic loss for bandit learning. Our bandit model is jointly updated by the pairwise logistic loss on the additional samples derived from natural language feedback and by the traditional logistic loss. The empirical results show that, after a few user interactions, the probability of finding the desired items with our system is about 3 times as high as with traditional interactive recommenders.
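
The joint update described in this abstract, a traditional logistic loss combined with a pairwise logistic loss on preference pairs, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes a linear scoring model, plain gradient descent, and an equal weighting of the two losses by default, and it omits the bandit exploration step entirely. All function names and the `pair_weight` parameter are hypothetical.

```python
import math


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softplus(z):
    """log(1 + exp(z)); adequate for the small values used here."""
    return math.log1p(math.exp(z))


def pointwise_loss(w, x, y):
    """Traditional logistic loss for a label y in {0, 1}."""
    sign = 1.0 if y == 1 else -1.0
    return softplus(-sign * dot(w, x))


def pairwise_loss(w, x_preferred, x_other):
    """Pairwise logistic loss: -log sigmoid(score(preferred) - score(other))."""
    return softplus(-(dot(w, x_preferred) - dot(w, x_other)))


def joint_loss(w, labelled, pairs, pair_weight=1.0):
    """Sum of pointwise losses plus weighted pairwise losses (weight is an assumption)."""
    total = sum(pointwise_loss(w, x, y) for x, y in labelled)
    total += pair_weight * sum(pairwise_loss(w, a, b) for a, b in pairs)
    return total


def gradient_step(w, labelled, pairs, lr=0.1, pair_weight=1.0):
    """One gradient-descent step on the joint loss."""
    grad = [0.0] * len(w)
    for x, y in labelled:
        err = sigmoid(dot(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi
    for a, b in pairs:
        # d/dw of -log sigmoid(m) is -(1 - sigmoid(m)) * (a - b), with m the margin
        coeff = -(1.0 - sigmoid(dot(w, a) - dot(w, b)))
        for i in range(len(w)):
            grad[i] += pair_weight * coeff * (a[i] - b[i])
    return [wi - lr * gi for wi, gi in zip(w, grad)]


# Illustrative data only: one labelled item and one preference pair
# of the kind that might be derived from a comment such as "more like the first one".
labelled = [([1.0, 0.0], 1)]
pairs = [([1.0, 0.0], [0.0, 1.0])]
w0 = [0.0, 0.0]
loss_before = joint_loss(w0, labelled, pairs)
w1 = gradient_step(w0, labelled, pairs)
loss_after = joint_loss(w1, labelled, pairs)
```

The preference pairs add training signal without requiring the user to click on or label individual items, which is the source of the sample efficiency the abstract claims.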

Articles published on Sample-efficient Learning

Learning action-oriented models through active inference.

On the Role of Weight Sharing During Deep Option Learning

Towards Hands-Free Visual Dialog Interactive Recommendation

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

Adaptive Tuning Curve Widths Improve Sample Efficient Learning.

Distributed Structured Actor-Critic Reinforcement Learning for Universal Dialogue Management

Learning to Learn: Hierarchical Meta-Critic Networks

Sample Efficient Learning of Path Following and Obstacle Avoidance Behavior for Quadrotors

Autonomous exploration of motor skills by skill babbling
