Two Views of Sequential Monte Carlo

18 November 2022

This note is a result of discussions with Feras Saad in 2022 when he was a visiting researcher in Google.

Sequential Monte Carlo (SMC) is a general framework using which we can understand many inference algorithms such as particle filtering, resample-move versions of particle filtering, annealed importance sampling and more. Despite its simplicity, SMC is applicable to a wide class of models—universal probabilistic programs.

There are at least two ways to think about and formalize a general recipe for SMC. One based on a sequence increasing spaces (Doucet and Johansen, 2008; Section 3.5) and the other based on “SMC samplers” (SMCS) (Del Moral et al., 2006). They are equivalent in that we can derive one from the other and vice versa. While the increasing-spaces formulation is more intuitive as a generalization of particle filtering, the SMCS formulation exposes more flexibility for inference design. For a long time, this was not obvious to me, mainly due to the fact that the standard proof of correctness of the SMCS formulation relies on viewing it as a special case of the increasing-spaces formulation (see below).

Increasing-spaces formulation

We have a sequence of unnormalized target densities \(\{\gamma_t(z_{1:t})\}_{t = 1}^T\) with normalizing constants \(Z_t = \int \gamma_t(z_{1:t}) \,\mathrm dz_{1:t}\). We want to estimate the normalizing constants \(Z_t\) and expectations with respect to the (normalized) target distributions \(\pi_t(z_{1:t}) = \gamma_t(z_{1:t}) / Z_t\). To do this, we design a sequence of proposal distributions \(\{q_t(z_t \given z_{1:t - 1})\}_{t = 1}^T\).

Assuming a particle budget \(K\) and resampling at every step, the increasing-spaces SMC procedure is

Sample \(z_1^k \sim q_1(z_1)\)
Weigh \(w_1^k = \frac{\gamma_1(z_1^k)}{q_1(z_1^k)}\)
For \(t = 2, \dotsc, T\)
- Sample ancestral indices \(a_{t - 1}^{1:K}\) based on weights \(w_{t - 1}^{1:K}\)
- Set \(z_{1:t - 1}^k \leftarrow z_{1:t - 1}^{a_{t - 1}^k}\)
- Propose \(z_t^k \sim q_t(z_t \given z_{1:t - 1}^k)\)
- Weigh \(w_t^k = \frac{\gamma_t(z_{1:t}^k)}{\gamma_{t - 1}(z_{1:t - 1}^k) q_t(z_t^k \given z_{1:t - 1}^k)}\)

At every timestep, we can estimate the normalizing constant and expectations as \begin{align} Z_t &\approx \prod_{\tau = 1}^t \left(\frac{1}{K} \sum_{k = 1}^K w_\tau^k \right), \\
\E_{\pi_t(z_{1:t})}[f(z_{1:t})] &\approx \frac{1}{\sum_{k = 1}^K w_t^k} \sum_{k = 1}^K w_t^k f(z_{1:t}^k). \end{align}

SMCS formulation

We have a sequence of unnormalized target densities \(\{\gamma_t(z_t)\}_{t = 1}^T\) with normalizing constants \(Z_t = \int \gamma_t(z_t) \,\mathrm dz_t\). We want to estimate the normalizing constants \(Z_t\) and expectations with respect to the (normalized) target distributions \(\pi_t(z_t) = \gamma_t(z_t) / Z_t\). To do this, we design a sequence of forward kernels \(\{q_t(z_t \given z_{t - 1})\}_{t = 1}^T\) and a sequence of reverse kernels \(\{r_{t - 1}(z_{t - 1} \given z_t\}_{t = 2}^T\).

Assuming a particle budget \(K\) and resampling at every step, the SMCS procedure is

Sample \(z_1^k \sim q_1(z_1)\)
Weigh \(w_1^k = \frac{\gamma_1(z_1^k)}{q_1(z_1^k)}\)
For \(t = 2, \dotsc, T\)
- Sample ancestral indices \(a_{t - 1}^{1:K}\) based on weights \(w_{t - 1}^{1:K}\)
- Set \(z_{t - 1}^k \leftarrow z_{t - 1}^{a_{t - 1}^k}\)
- Propose \(z_t^k \sim q_t(z_t \given z_{t - 1}^k)\)
- Weigh \(w_t^k = \frac{\gamma_t(z_t^k) r_{t - 1}(z_{t - 1}^k \given z_t^k)}{\gamma_{t - 1}(z_{t - 1}^k) q_t(z_t^k \given z_{t - 1}^k)}\)

At every timestep, we can estimate the normalizing constant and expectations as \begin{align} Z_t &\approx \prod_{\tau = 1}^t \left(\frac{1}{K} \sum_{k = 1}^K w_\tau^k \right), \\
\E_{\pi_t(z_t)}[f(z_t)] &\approx \frac{1}{\sum_{k = 1}^K w_k} \sum_{k = 1}^K w_k f(z_t^k). \end{align}

SMCS formulation as a special case of the increasing-spaces formulation

The standard way to justify the SMCS formulation is to view it as a special case of the increasing-spaces formulation. To do this, we set \begin{align} \gamma_t^{\text{IS}}(z_{1:t}) &= \gamma_t^{\text{SMCS}}(z_t) \prod_{\tau = 1}^{t - 1} r_\tau(z_\tau \given z_{\tau + 1}) \\
q_t^{\text{IS}}(z_t \given z_{1:t - 1}) &= q_t^{\text{SMCS}}(z_t \given z_{t - 1}) \end{align} which implies \begin{align} Z_t^{\text{IS}} &= \int \gamma_t^{\text{IS}}(z_{1:t}) \,\mathrm dz_{1:t} \\
&= \int \gamma_t^{\text{SMCS}}(z_t) \prod_{\tau = 1}^{t - 1} r_\tau(z_\tau \given z_{\tau + 1}) \,\mathrm dz_{1:t} \\
&= \int \gamma_t^{\text{SMCS}}(z_t) \,\mathrm dz_t \\
&= Z_t^{\text{SMCS}}, \\
w_t^{\text{IS}} &=\frac{\gamma_t^{\text{IS}}(z_{1:t})}{\gamma_{t - 1}^{\text{IS}}(z_{1:t - 1}) q_t^{\text{IS}}(z_t \given z_{1:t - 1})} \\
&= \frac{\gamma_t^{\text{SMCS}}(z_t) \prod_{\tau = 1}^{t - 1} r_\tau(z_\tau \given z_{\tau + 1})}{\left(\gamma_{t - 1}^{\text{SMCS}}(z_{t - 1}) \prod_{\tau = 1}^{t - 2} r_\tau(z_\tau \given z_{\tau + 1}) \right) q_t^{\text{SMCS}}(z_t \given z_{t - 1})} \\
&= \frac{\gamma_t^{\text{SMCS}}(z_t) r_{t - 1}(z_{t - 1} \given z_t)}{\gamma_{t - 1}^{\text{SMCS}}(z_{t - 1}) q_t^{\text{SMCS}}(z_t \given z_{t - 1})} \\
&= w_t^{\text{SMCS}}. \end{align}

Increasing-spaces formulation as a special case of SMCS formulation

To view the increasing-spaces formulation as a special case of the SMCS formulation, we need to view a sequence \(z_{1:t}\) as a different object from \(z_{1:t - 1}\) rather than as a result of concatenating \(z_t\). We will distinguish \(z_{1:t}^{(t)}\) from \(z_{1:t - 1}^{(t - 1)}\) using the “\((t)\)”/”\((t - 1)\)” superscript. Setting \begin{align} z_t^{\text{SMCS}} &= z_{1:t}^{(t), \text{IS}} \\
\gamma_t^{\text{SMCS}}(z_t^{\text{SMCS}}) &= \gamma_t^{\text{IS}}(z_{1:t}^{(t), \text{IS}}) \end{align} implies \begin{align} Z_t^{\text{SMCS}} &= \int \gamma_t^{\text{SMCS}}(z_t^{\text{SMCS}}) \,\mathrm dz_t^{\text{SMCS}} \\
&= \int \gamma_t^{\text{IS}}(z_{1:t}^{(t), \text{IS}}) \,\mathrm dz_{1:t}^{(t), \text{IS}} \\
&= Z_t^{\text{IS}}. \end{align}

To enforce \(z_{1:t - 1}^{(t), \text{IS}} = z_{1:t - 1}^{(t - 1), \text{IS}}\), we include copy kernels in both forward and reverse kernels: \begin{align} q_t^{\text{SMCS}}(z_t^{\text{SMCS}} | z_{t - 1}^{\text{SMCS}}) &= \delta_{z_{1:t - 1}^{(t - 1), \text{IS}}}(z_{1:t - 1}^{(t), \text{IS}}) q_t^{\text{IS}}(z_t^{(t), \text{IS}} | z_{1:t - 1}^{(t - 1), \text{IS}}), \\
r_{t - 1}^{\text{SMCS}}(z_{t - 1}^{\text{SMCS}} | z_t^{\text{SMCS}}) &= \delta_{z_{1:t - 1}^{(t), \text{IS}}}(z_{1:t - 1}^{(t - 1), \text{IS}}). \end{align} The “propose” step in SMCS will therefore only propose \(z_t^{(t), \text{IS}}\) given \(z_{1:t - 1}^{(t - 1), \text{IS}}\) as is done in the “propose” step in the increasing-spaces formulation. The weights also match: \begin{align} w_t^{\text{SMCS}} &= \frac{\gamma_t^{\text{SMCS}}(z_t^{\text{SMCS}}) r_{t - 1}^{\text{SMCS}}(z_{t - 1}^{\text{SMCS}} | z_t^{\text{SMCS}})}{\gamma_{t - 1}^{\text{SMCS}}(z_{t - 1}^{\text{SMCS}}) q_t^{\text{SMCS}}(z_t^{\text{SMCS}} | z_{t - 1}^{\text{SMCS}})} \\
&= \frac{\gamma_t^{\text{IS}}(z_{1:t}^{(t), \text{IS}}) \delta_{z_{1:t - 1}^{(t), \text{IS}}}(z_{1:t - 1}^{(t - 1), \text{IS}})}{\gamma_{t - 1}^{\text{IS}}(z_{1:t - 1}^{(t - 1), \text{IS}}) \delta_{z_{1:t - 1}^{(t - 1), \text{IS}}}(z_{1:t - 1}^{(t), \text{IS}}) q_t^{\text{IS}}(z_t^{(t), \text{IS}} | z_{1:t - 1}^{(t - 1), \text{IS}})} \\
&= \frac{\gamma_t^{\text{IS}}(z_{1:t}^{(t), \text{IS}})}{\gamma_{t - 1}^{\text{IS}}(z_{1:t - 1}^{(t - 1), \text{IS}}) q_t^{\text{IS}}(z_t^{(t), \text{IS}} | z_{1:t - 1}^{(t - 1), \text{IS}})} \\
&= w_t^{\text{IS}}. \end{align}

Tradeoffs of the two formulations

Mathematically, these formulations are equivalent. In practice, the SMCS formulation makes it easier to think about the design of a wider range of algorithms. This is because in the IS-first view, we are only free to choose the unnormalized targets and the proposal for the next step, given previous steps. To be able to propose changes to the whole sequence like we would do in SMCS, we would have to make the full SMCS state be part of the per-step states of the IS-view. Moreover, we would need to design the reverse kernels implicitly via the unnormalized target densities.

The SMCS-first view is more natural for problems where \(z_t\)s are from arbitrary spaces like inference in probabilistic programs. The IS-first view also can also be used for inference in probabilistic programs (like in Anglican) but it strongly encourages/forces a decomposition of the program into a sequence of increasing spaces which might be sub-optimal.
The SMCS-first view makes it easier to think about and design SMC algorithms with MCMC moves:
- A natural pattern inspired by AIS is to have the forward proposal be an MCMC move, the corresponding reverse proposal be its time reversal, and have the weight be the ratio of the unnormalized densities.
- Del Moral et al., 2006; Section 3.3 discusses the optimality of forward and reverse kernels and why the AIS-style reverse kernels are not optimal
The SMCS-first view can be used to justify various extensions of the particle filter
- E.g. the “move” step in Resample-Move (Doucet and Johansen, 2008; Section 4.4) can be interpreted as an MCMC move after the resampling step that targets an unnormalized density which is identical to the unnormalized density from before resampling. The weights stay constant.
The IS-first view could be useful when thinking about designing proposals for filtering problems (Doucet and Johansen, 2008; Section 4.1).
The IS-first view lets SMCS inherit results from the IS-formulation such as the unbiasedness of the normalizing constant estimator (e.g. https://www.tuananhle.co.uk/notes/smc-evidence-unbiasedness.html) and the asymptotic variance of the normalizing constant estimator and the posterior mean estimators (e.g. (Doucet and Johansen, 2008; Section 3.6)). That’s how it’s done in the original SMCS paper. I’m sure it is possible to have a SMCS-first proof of these results.

[back]