Reinforcement-based Program Induction in a Neural Virtual Machine

Garrett E. Katz, Khushboo Gupta, James A. Reggia

Research output: Chapter in Book/Entry/Poem › Conference contribution

2 Scopus citations


We present a neural virtual machine that can be trained to perform algorithmic tasks. Rather than combining a neural controller with non-neural memory storage as has been done in the past, this architecture is purely neural and emulates tape-based memory via fast associative weights (one-step learning). Here we formally define the architecture, and then extend the system to learn programs using recurrent policy gradient reinforcement learning based on examples of program inputs labeled with corresponding output targets, which are compared against actual output to generate a sparse reward signal. We describe the policy gradient training procedure used, and report its empirical performance on a number of small-scale list processing tasks, such as finding the maximum list element, filtering out certain elements, and reversing the order of the elements. These results show that program induction via reinforcement learning is possible using sparse rewards and solely neural computations.
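The idea of emulating tape-based memory with fast associative weights can be illustrated with a minimal sketch. It is not the architecture defined in the paper: the class name `FastWeightTape`, the helper `hadamard`, the use of orthogonal ±1 address patterns, and the error-correcting outer-product update are all assumptions chosen so that a single weight update stores or overwrites one cell exactly.

```python
# Illustrative sketch only: a one-step ("fast weight") associative memory that
# behaves like a small tape. Encodings and update rule are assumptions made
# for this example, not the formulation used in the paper.

def hadamard(n):
    """Return an n x n Sylvester Hadamard matrix (n a power of two), entries +1/-1."""
    rows = [[1]]
    while len(rows) < n:
        # H_2n = [[H, H], [H, -H]]; the rows stay mutually orthogonal.
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    return rows


class FastWeightTape:
    """Tape cells stored in a weight matrix W, addressed by orthogonal key patterns."""

    def __init__(self, num_cells, value_dim):
        self.n = num_cells                      # must be a power of two here
        self.keys = hadamard(num_cells)         # one +1/-1 key pattern per tape address
        self.W = [[0.0] * num_cells for _ in range(value_dim)]

    def _recall(self, key):
        # Linear read-out: W @ key
        return [sum(w * x for w, x in zip(row, key)) for row in self.W]

    def read(self, address):
        """Return the stored +1/-1 pattern (unwritten cells read as all +1)."""
        raw = self._recall(self.keys[address])
        return [1 if a >= 0 else -1 for a in raw]

    def write(self, address, value):
        """One-step update: W += (value - W @ key) key^T / (key . key)."""
        key = self.keys[address]
        current = self._recall(key)
        for i, row in enumerate(self.W):
            err = (value[i] - current[i]) / self.n   # key . key == n for +1/-1 keys
            for j in range(self.n):
                row[j] += err * key[j]
```

Because the key patterns are mutually orthogonal, writing to one address leaves the contents of every other address unchanged, which is what lets a weight matrix stand in for a tape. With non-orthogonal or noisy keys the same rule would introduce interference between cells.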

Original language: English (US)
Title of host publication: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728169262
State: Published - Jul 2020
Event: 2020 International Joint Conference on Neural Networks, IJCNN 2020 - Virtual, Glasgow, United Kingdom
Duration: Jul 19 2020 to Jul 24 2020

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks


Conference: 2020 International Joint Conference on Neural Networks, IJCNN 2020
Country/Territory: United Kingdom
City: Virtual, Glasgow


  • Fast Weights
  • Neural Networks
  • Policy Gradient
  • Program Induction

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

