Abstract

The sample-inefficiency problem in Artificial Intelligence refers to the inability of current Deep Reinforcement Learning models to optimize action policies within a small number of episodes. Recent studies have tried to overcome this limitation by adding memory systems and architectural biases to improve learning speed, such as in Episodic Reinforcement Learning. However, despite achieving incremental improvements, their performance is still not comparable to how humans learn behavioral policies. In this paper, we capitalize on the design principles of the Distributed Adaptive Control (DAC) theory of mind and brain to build a novel cognitive architecture (DAC-ML) that, by incorporating a hippocampus-inspired sequential memory system, can rapidly converge to effective action policies that maximize reward acquisition in a challenging foraging task.
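The abstract does not spell out how the sequential memory operates, but the general idea of a hippocampus-inspired sequential memory can be sketched as an episodic store that records (observation, action) couplets in order and, when a stored observation recurs, replays the action that previously led to reward. The following Python sketch is illustrative only; the class name, methods, and matching rule are assumptions made here, not the DAC-ML implementation.

    from collections import deque

    class SequentialEpisodicMemory:
        """Illustrative sketch of a sequential episodic memory (hypothetical
        names, not the paper's code): rewarded (observation, action) sequences
        are stored in order and replayed when a matching observation recurs."""

        def __init__(self, max_episodes=1000):
            self.episodes = deque(maxlen=max_episodes)  # rewarded episodes only
            self.current = []                           # couplets of the ongoing episode

        def store(self, observation, action):
            self.current.append((observation, action))

        def end_episode(self, rewarded):
            # Retain only the sequences that ended with reward acquisition.
            if rewarded and self.current:
                self.episodes.append(list(self.current))
            self.current = []

        def retrieve(self, observation):
            # Prefer the most recently stored rewarded sequence containing a match.
            for episode in reversed(self.episodes):
                for stored_obs, stored_action in episode:
                    if stored_obs == observation:
                        return stored_action
            return None  # no match: caller falls back to exploratory behaviour

The point of this sketch is that memory content is organized as ordered sequences rather than isolated transitions, so a single rewarded episode can already bias future action selection, which is the kind of rapid policy convergence the abstract refers to.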

Highlights

  • With the advent of Deep Reinforcement Learning (DRL), the last decade has seen several historic milestones in the field of Artificial Intelligence (AI), in terms of both scientific and societal impact

  • We present DAC-Machine Learning (DAC-ML), a novel cognitive architecture based on the organizational principles of the Distributed Adaptive Control theory that can efficiently maximize reward acquisition in a challenging foraging task

  • In contrast with classical RL, where the policy is directly updated by the learning algorithm, in DAC-ML policy learning is achieved through the interaction of the different components of its layered architecture (an illustrative sketch of this contrast follows below)
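As a rough illustration of this contrast, the Python sketch below places a directly queried learned policy (classical RL) next to an action-selection routine in which behaviour emerges from the interaction of several components, loosely following DAC's layered organization (a reactive layer of reflexes, an adaptive layer of learned associations, and the sequential memory sketched above). All component names and the arbitration order are assumptions for illustration, not the paper's implementation.

    import random

    def classical_rl_action(q_table, state, epsilon=0.1):
        # Classical RL: one learned policy object (here, a tabular Q-function)
        # is updated by the learning algorithm and queried directly.
        if random.random() < epsilon:
            return random.choice(list(q_table[state]))
        return max(q_table[state], key=q_table[state].get)

    def layered_action(state, reactive_layer, adaptive_layer, memory):
        # Layered control (illustrative only): no single policy is read out;
        # the action emerges from the interaction of several components.
        reflex = reactive_layer(state)
        if reflex is not None:        # low-level sensorimotor reflex pre-empts the rest
            return reflex
        recalled = memory.retrieve(state)
        if recalled is not None:      # replay of a previously rewarded sequence
            return recalled
        return adaptive_layer(state)  # learned stimulus-action association as default

Here learning never modifies layered_action itself: improving behaviour means changing what the memory and the adaptive layer contain, which is one way to read the claim that policy learning arises from the interaction of the architecture's components.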


