making neural programming architectures generalize via recursion

10 February 2017

notes on (Cai et al., 2017).

summary

understanding: 8/10
code: N/A

pretty cool idea: make neural nets learn recursive programs by changing the training traces for the neural programmer-interpreter (npi) architecture. with recursive traces, perfect generalization can be proven.

npi

first, how does npi work? the main architecture is an lstm with things sticking in and out.

input to the lstm unit is (e, p, a)

environment e
program p
program arguments a

output from the lstm unit is (r, p2, a2)

return probability r
next program p2
next program arguments a2

this architecture lets the neural net choose the next program and arguments that drive the program in the right direction (similar to inference compilation). the program itself doesn’t need to be differentiable.
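to make the (e, p, a) -> (r, p2, a2) loop concrete, here is a minimal sketch of an interpreter loop around such a core. all names here (`run`, `scripted_core`, `PRIMITIVES`, `INC`, `INC_TWICE`, the threshold `alpha`) are made up for illustration, and a hand-scripted function stands in for the trained lstm, so this shows only the control flow, not the paper's implementation.

```python
# hypothetical names throughout; scripted_core stands in for a trained lstm core.

def inc(env):
    """Primitive action: mutate the environment directly."""
    env["x"] += 1

PRIMITIVES = {"INC": inc}  # programs that act on the environment instead of calling the core

def scripted_core(env, program, args, state):
    """Map (e, p, a) plus the core state to (r, p2, a2) plus the new state."""
    step = 0 if state is None else state
    if program == "INC_TWICE":
        r = 1.0 if step == 1 else 0.0  # signal return together with the second INC
        return r, "INC", (), step + 1
    raise ValueError(f"unknown program: {program}")

def run(core, env, program, args, alpha=0.5, trace=None):
    """Call the core repeatedly until the return probability r reaches alpha."""
    trace = [] if trace is None else trace
    state = None  # fresh core state on every call (an assumption of this sketch)
    r = 0.0
    while r < alpha:
        r, p2, a2, state = core(env, program, args, state)
        trace.append((p2, a2))
        if p2 in PRIMITIVES:
            PRIMITIVES[p2](env, *a2)              # primitive: act on the environment
        else:
            run(core, env, p2, a2, alpha, trace)  # subprogram: nested call
    return trace

env = {"x": 0}
trace = run(scripted_core, env, "INC_TWICE", ())
```

the primitives are ordinary python functions, which is the sense in which the program itself doesn't need to be differentiable: the core only picks what to call next.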

changes from npi

use different traces! for example, for grade-school addition, instead of the flat trace add = (add1, lshift, add1, lshift, add1, lshift, …), use the recursive trace add = (add1, lshift, add)
when a program is called recursively, reset the lstm state.
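a small sketch of the difference between the two kinds of traces, using the program names from the example above. the helper names (`flat_trace`, `recursive_trace`) and the empty base case are assumptions made for illustration; the paper's actual traces carry more detail.

```python
def flat_trace(n_columns):
    """Non-recursive trace: one long sequence whose length grows with the input."""
    trace = []
    for _ in range(n_columns):
        trace += ["add1", "lshift"]
    return trace

def recursive_trace(n_columns):
    """Recursive trace: each add call does one column, then calls add again."""
    if n_columns == 0:
        return []  # assumed base case: no columns left, return immediately
    return ["add1", "lshift", ("add", recursive_trace(n_columns - 1))]

def longest_call(trace):
    """Length of the longest single call in a nested trace."""
    nested = [longest_call(step[1]) for step in trace if isinstance(step, tuple)]
    return max([len(trace)] + nested)
```

for a 100-column input, `len(flat_trace(100))` is 200, while `longest_call(recursive_trace(100))` is 3. every recursive call sees the same short pattern no matter how long the input is.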

experiments

grade-school addition
bubble sort
topological sort
quicksort

all of these obtain perfect generalization.

proofs

it suffices to prove correctness on the base cases and the reduction rules. this is proven for all 4 tasks above.
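a rough sketch of why checking base cases and reduction rules is enough, using grade-school addition. a hand-written `add_column` stands in for the learned behaviour, and all function names are hypothetical, so this shows only the shape of the argument, not the paper's verification procedure.

```python
def add_column(a_digit, b_digit, carry):
    """One column of grade-school addition: returns (output digit, new carry)."""
    total = a_digit + b_digit + carry
    return total % 10, total // 10

def add(a_digits, b_digits, carry=0):
    """Recursive addition over equal-length, little-endian digit lists."""
    if not a_digits:  # base case: no columns left
        return [carry] if carry else []
    digit, carry = add_column(a_digits[0], b_digits[0], carry)
    return [digit] + add(a_digits[1:], b_digits[1:], carry)  # reduction rule

def to_digits(n, width):
    """Little-endian decimal digits of n, zero-padded to `width`."""
    return [(n // 10**i) % 10 for i in range(width)]

def from_digits(digits):
    return sum(d * 10**i for i, d in enumerate(digits))

# the reduction step only ever sees (digit, digit, carry), so it can be checked exhaustively
for a in range(10):
    for b in range(10):
        for c in range(2):
            digit, carry = add_column(a, b, c)
            assert 10 * carry + digit == a + b + c
            assert 0 <= digit <= 9 and carry in (0, 1)

# base case: nothing left to add, so only a pending carry is written out
assert add([], [], 0) == [] and add([], [], 1) == [1]
```

there are only 200 column cases plus the base case, and once they all pass, induction on the number of columns covers inputs of any length.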

references

Cai, J., Shin, R., & Song, D. (2017). Making Neural Programming Architectures Generalize via Recursion. International Conference on Learning Representations (ICLR).

@inproceedings{cai2017making,
  title = {Making Neural Programming Architectures Generalize via Recursion},
  author = {Cai, Jonathon and Shin, Richard and Song, Dawn},
  year = {2017},
  booktitle = {International Conference on Learning Representations (ICLR)}
}
