We present a neural virtual machine that can be trained to perform algorithmic tasks. Rather than coupling a neural controller with non-neural memory storage, as in prior work, the architecture is purely neural and emulates tape-based memory via fast associative weights (one-step learning). We formally define the architecture and then extend the system to learn programs with recurrent policy gradient reinforcement learning: training examples consist of program inputs labeled with corresponding output targets, which are compared against the actual outputs to generate a sparse reward signal. We describe the policy gradient training procedure and report its empirical performance on a number of small-scale list processing tasks, such as finding the maximum list element, filtering out certain elements, and reversing the order of the elements. These results show that program induction via reinforcement learning is possible using sparse rewards and purely neural computation.
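To make the fast-weight memory mechanism concrete, the following is a minimal sketch, not the paper's exact formulation: an associative memory stored in a weight matrix that is written in a single step via a Hebbian outer-product update and read by a matrix-vector product. The function names and the choice of orthonormal keys are illustrative assumptions.

```python
import numpy as np

def write(W, key, value, beta=1.0):
    # One-step associative write: bind `value` to `key` in the
    # fast weights via a Hebbian outer-product update.
    return W + beta * np.outer(value, key)

def read(W, key):
    # Content-based read: recovers the value bound to `key`
    # (exactly, when the keys are orthonormal).
    return W @ key

d = 4
W = np.zeros((d, d))
keys = np.eye(d)  # orthonormal keys -> exact recall (assumption for clarity)
v0 = np.array([1.0, 2.0, 3.0, 4.0])
v1 = np.array([-1.0, 0.5, 0.0, 2.0])
W = write(W, keys[0], v0)
W = write(W, keys[1], v1)
```

Because each write is a single additive update, such a memory can emulate tape-like storage within the network's weights rather than in an external, non-neural store.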