# improved semantic representations from tree-structured long short-term memory networks

13 February 2017

notes on (Tai et al., 2015).

## summary

understanding: 7/10
code: https://github.com/stanfordnlp/treelstm

the tree-lstm paper. proposes an architecture in which lstms are no longer strictly sequential: a node's hidden state is composed from arbitrarily many child units rather than from a single previous unit. two variants are proposed:

• child-sum tree-lstm
• n-ary tree-lstm

## normal lstm

\begin{align} i_t &= \sigma\left(W^{(i)} x_t + U^{(i)} h_{t - 1} + b^{(i)}\right) \\
f_t &= \sigma\left(W^{(f)} x_t + U^{(f)} h_{t - 1} + b^{(f)}\right) \\
o_t &= \sigma\left(W^{(o)} x_t + U^{(o)} h_{t - 1} + b^{(o)}\right) \\
u_t &= \tanh\left(W^{(u)} x_t + U^{(u)} h_{t - 1} + b^{(u)}\right) \\
c_t &= i_t \odot u_t + f_t \odot c_{t - 1} \\
h_t &= o_t \odot \tanh(c_t) \end{align}
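a minimal numpy sketch of one step of the equations above (my own naming and shapes, not code from the paper's repo):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One standard LSTM step. W, U, b are dicts keyed by gate: 'i', 'f', 'o', 'u'."""
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate
    u = np.tanh(W['u'] @ x_t + U['u'] @ h_prev + b['u'])   # candidate update
    c = i * u + f * c_prev                                 # new cell state
    h = o * np.tanh(c)                                     # new hidden state
    return h, c
```

note the single recurrent input $$h_{t-1}$$; the tree variants below replace it with an aggregate over children.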

## child-sum tree-lstm

given a tree, let $$C(j)$$ denote the set of children of node $$j$$: \begin{align} \tilde h_j &= \sum_{k \in C(j)} h_k \\
i_j &= \sigma\left(W^{(i)} x_j + U^{(i)} \tilde h_j + b^{(i)}\right) \\
f_{jk} &= \sigma\left(W^{(f)} x_j + U^{(f)} h_k + b^{(f)}\right) \\
o_j &= \sigma\left(W^{(o)} x_j + U^{(o)} \tilde h_j + b^{(o)}\right) \\
u_j &= \tanh\left(W^{(u)} x_j + U^{(u)} \tilde h_j + b^{(u)}\right) \\
c_j &= i_j \odot u_j + \sum_{k \in C(j)} f_{jk} \odot c_k \\
h_j &= o_j \odot \tanh(c_j) \end{align}
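a minimal numpy sketch of one child-sum node composition (my own naming, not the authors' code). the key asymmetry: the $$i$$, $$o$$, $$u$$ gates see the summed child state $$\tilde h_j$$, while there is one forget gate $$f_{jk}$$ per child $$k$$, conditioned on that child's own hidden state $$h_k$$:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def child_sum_node(x_j, child_h, child_c, W, U, b):
    """Compose node j from its children.

    child_h, child_c: lists of child hidden / cell states (empty at leaves).
    W, U, b: dicts of parameters keyed by gate: 'i', 'f', 'o', 'u'.
    """
    m = b['i'].shape[0]
    h_tilde = np.sum(child_h, axis=0) if child_h else np.zeros(m)
    i = sigmoid(W['i'] @ x_j + U['i'] @ h_tilde + b['i'])
    o = sigmoid(W['o'] @ x_j + U['o'] @ h_tilde + b['o'])
    u = np.tanh(W['u'] @ x_j + U['u'] @ h_tilde + b['u'])
    # one forget gate per child, conditioned on that child's own hidden state
    f = [sigmoid(W['f'] @ x_j + U['f'] @ h_k + b['f']) for h_k in child_h]
    c = i * u + sum(f_k * c_k for f_k, c_k in zip(f, child_c))
    h = o * np.tanh(c)
    return h, c
```

since $$\tilde h_j$$ is an unordered sum, this variant suits trees with unbounded or unordered branching (e.g. dependency parses).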

## n-ary tree-lstm

each node in the tree must have at most $$N$$ children, and the children are ordered. for a node $$j$$, write its $$k$$th child as $$jk$$. \begin{align} i_j &= \sigma\left(W^{(i)} x_j + \sum_{\ell = 1}^N U_{\ell}^{(i)} h_{j\ell} + b^{(i)}\right) \\
f_{jk} &= \sigma\left(W^{(f)} x_j + \sum_{\ell = 1}^N U_{k\ell}^{(f)} h_{j\ell} + b^{(f)}\right) \\
o_j &= \sigma\left(W^{(o)} x_j + \sum_{\ell = 1}^N U_{\ell}^{(o)} h_{j\ell} + b^{(o)}\right) \\
u_j &= \tanh\left(W^{(u)} x_j + \sum_{\ell = 1}^N U_{\ell}^{(u)} h_{j\ell} + b^{(u)}\right) \\
c_j &= i_j \odot u_j + \sum_{\ell = 1}^N f_{j\ell} \odot c_{j\ell} \\
h_j &= o_j \odot \tanh(c_j) \end{align}
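a numpy sketch of one n-ary node composition (my own naming). unlike the child-sum variant, each child position $$\ell$$ gets its own matrix $$U_\ell$$, and the $$k$$th forget gate gets a full set of matrices $$U_{k\ell}^{(f)}$$ over all children:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nary_node(x_j, child_h, child_c, W, U, Uf, b):
    """Compose node j from exactly N ordered children.

    child_h, child_c: length-N lists of child hidden / cell states.
    U['i'], U['o'], U['u']: length-N lists of matrices, one per child slot.
    Uf[k][l]: matrix relating the k-th forget gate to child l's hidden state.
    """
    N = len(child_h)

    def gate(name, act):
        s = W[name] @ x_j + b[name]
        for l in range(N):
            s = s + U[name][l] @ child_h[l]
        return act(s)

    i = gate('i', sigmoid)
    o = gate('o', sigmoid)
    u = gate('u', np.tanh)
    # the k-th forget gate sees every child's state through its own matrices
    f = [sigmoid(W['f'] @ x_j
                 + sum(Uf[k][l] @ child_h[l] for l in range(N))
                 + b['f'])
         for k in range(N)]
    c = i * u + sum(f[k] * child_c[k] for k in range(N))
    h = o * np.tanh(c)
    return h, c
```

the per-position parameters let the unit treat, say, left and right children of a binarized constituency parse differently, at the cost of fixing the branching factor.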

## experiments

didn’t look into these in much detail, but there were two experiments:

• tree-lstm classification
• goal: classify a subset of nodes in a tree. e.g.: the label for a node in a parse tree corresponds to a property of the phrase spanned by that node.
• semantic relatedness of sentence pairs
• goal: given two sentences, output a similarity score in $$[1, K]$$, where $$K$$ means very similar and 1 means very dissimilar.

## discussion

hypotheses:

• structure mitigates problem of preserving state over long sequences of words.
• tree-lstms are able to encode semantically-useful structural information in the sentence representations that they compose.

1. Tai, K. S., Socher, R., & Manning, C. D. (2015). Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. Association for Computational Linguistics (ACL).
@inproceedings{tai2015improved,
  title     = {Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks},
  author    = {Tai, Kai Sheng and Socher, Richard and Manning, Christopher D.},
  booktitle = {Association for Computational Linguistics (ACL)},
  year      = {2015}
}

