Reinforcement-based Program Induction in a Neural Virtual Machine

Garrett E. Katz, Khushboo Gupta, James A. Reggia

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a neural virtual machine that can be trained to perform algorithmic tasks. Rather than combining a neural controller with non-neural memory storage as has been done in the past, this architecture is purely neural and emulates tape-based memory via fast associative weights (one-step learning). Here we formally define the architecture, and then extend the system to learn programs using recurrent policy gradient reinforcement learning based on examples of program inputs labeled with corresponding output targets, which are compared against actual output to generate a sparse reward signal. We describe the policy gradient training procedure used, and report its empirical performance on a number of small-scale list processing tasks, such as finding the maximum list element, filtering out certain elements, and reversing the order of the elements. These results show that program induction via reinforcement learning is possible using sparse rewards and solely neural computations.
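The idea of emulating tape-based memory with fast associative weights can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the architecture defined in the paper: the class name `FastWeightTape`, the use of orthogonal Hadamard codes for addresses and symbols, and the delta-rule update are all hypothetical choices. The sketch only shows how a single weight update can store or overwrite one tape cell without disturbing the others.

```python
# Illustrative sketch only: a one-step associative memory used as a tape.
# The encoding (Hadamard codes) and the delta-rule update are assumptions,
# not the learning rule or layer layout from the paper.

def hadamard(n):
    """Return n orthogonal +/-1 rows of length n (n must be a power of two)."""
    rows = [[1]]
    while len(rows) < n:
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    return rows


class FastWeightTape:
    def __init__(self, size):
        # The same code book is used for addresses (keys) and symbols (values).
        self.n = size
        self.codes = hadamard(size)
        self.W = [[0.0] * size for _ in range(size)]

    def _recall(self, key):
        # Plain matrix-vector product W @ key.
        return [sum(w * k for w, k in zip(row, key)) for row in self.W]

    def write(self, address, symbol):
        # One-step update: W += (target - W @ key) key^T / (key . key).
        # Because the keys are orthogonal, other addresses are unaffected.
        key = self.codes[address]
        target = self.codes[symbol]
        current = self._recall(key)
        for i in range(self.n):
            delta = (target[i] - current[i]) / self.n
            for j in range(self.n):
                self.W[i][j] += delta * key[j]

    def read(self, address):
        # Decode the recalled pattern to the closest symbol code.
        out = self._recall(self.codes[address])
        scores = [sum(o * c for o, c in zip(out, code)) for code in self.codes]
        return scores.index(max(scores))


# Usage: write two cells, then overwrite the first one.
tape = FastWeightTape(8)
tape.write(0, 3)
tape.write(1, 5)
tape.write(0, 6)
print(tape.read(0), tape.read(1))
```

With orthogonal keys, each write changes the recalled value only at the addressed cell, which is what lets a weight matrix stand in for a tape in this toy setting.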

Original language: English (US)
Title of host publication: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728169262
State: Published - Jul 2020
Event: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Virtual, Glasgow, United Kingdom
Duration: Jul 19 2020 to Jul 24 2020

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2020 International Joint Conference on Neural Networks, IJCNN 2020
Country: United Kingdom
City: Virtual, Glasgow
Period: 7/19/20 to 7/24/20

Keywords

  • Fast Weights
  • Neural Networks
  • Policy Gradient
  • Program Induction

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
