A Better Proof of Unbiasedness of the Sequential Monte Carlo Based Normalizing Constant Estimator

22 June 2023

This note is a result of discussions with Feras Saad in 2022 when he was a visiting researcher in Google.

The standard proof of unbiasedness of the sequential Monte Carlo (SMC) normalizing constant estimator always seemed like magic to me. We introduce auxiliary variables, painstakingly push symbols around and voila, the normalizing constant estimator is unbiased.

In this note, we present a proof of the unbiasedness of the normalizing constant estimator based on proper weights. I think this proof is much better because

it matches more with our intuitions about why SMC works,
it can be used for proving unbiasedness of different variants of SMC, e.g. with optional resampling,
the proof technique can be used for particle-based methods, like annealed importance sampling, and more.

As explored in Two Views of SMC, the increasing spaces and the SMC sampler (SMCS) formulations of SMC are equivalent. Here, we focus on proving unbiasedness of the SMCS-based normalizing constant estimator which will automatically imply unbiasedness of the increasing spaces normalizing constant estimator (as is used in the standard proof).

We first introduce SMCS, then we introduce proper weights, and finally, we prove unbiasedness of the SMCS normalizing constant estimator using proper weights.

Sequential Monte Carlo Sampler

We have a sequence of unnormalized target densities \(\{\gamma_t(z_t)\}_{t = 1}^T\) with normalizing constants \(Z_t = \int \gamma_t(z_t) \,\mathrm dz_t\). We want to estimate the normalizing constants \(Z_t\) and expectations with respect to the (normalized) target distributions \(\pi_t(z_t) = \gamma_t(z_t) / Z_t\). To do this, we design a sequence of forward kernels \(\{q_t(z_t \given z_{t - 1})\}_{t = 1}^T\) and a sequence of reverse kernels \(\{r_{t - 1}(z_{t - 1} \given z_t\}_{t = 2}^T\).

Assuming a particle budget \(K\) and resampling at every step, the SMCS procedure is

Sample \(z_1^k \sim q_1(z_1)\)
Weigh \(w_1^k = \frac{\gamma_1(z_1^k)}{q_1(z_1^k)}\)
For \(t = 2, \dotsc, T\)
1. Sample ancestral indices \(a_{t - 1}^{1:K}\) based on weights \(w_{t - 1}^{1:K}\)
2. Set \(z_{t - 1}^k \leftarrow z_{t - 1}^{a_{t - 1}^k}\)
3. Propose \(z_t^k \sim q_t(z_t \given z_{t - 1}^k)\)
4. Weigh \(w_t^k = \frac{\gamma_t(z_t^k) r_{t - 1}(z_{t - 1}^k \given z_t^k)}{\gamma_{t - 1}(z_{t - 1}^k) q_t(z_t^k \given z_{t - 1}^k)}\)

At every timestep, we can estimate the normalizing constant and expectations as \begin{align} Z_t &\approx \hat Z_t = \prod_{\tau = 1}^t \left(\frac{1}{K} \sum_{k = 1}^K w_\tau^k \right), \label{eq:Z} \\
\E_{\pi_t(z_t)}[f(z_t)] &\approx \hat I_t = \frac{1}{\sum_{k = 1}^K w_t^k} \sum_{k = 1}^K w_t^k f(z_t^k). \label{eq:I} \end{align}

Proper Weights

Given an unnormalized density \(\gamma(z)\), with a corresponding normalizing constant \(Z_\pi = \int \gamma(z) \,\mathrm dz\) and normalized density \(\pi(z) = \gamma(z) / Z_\pi\), random variables \(z, w\) are said to be properly weighted with respect to \(\gamma(z)\) if, for any function \(f\), \begin{align} \E[w f(z)] = Z_\pi \E_\pi[f(z)]. \label{eq:proper-weight} \end{align}

Given a properly weighted sample with respect to \(\gamma\), \((z, w)\), we can estimate \(Z_\pi\) by setting \(f(z) = 1\) \begin{align} \hat Z_\pi = \frac{1}{N} \sum_{n = 1}^N w_n. \label{eq:proper-weight-Z-est} \end{align} This estimator is unbiased because it is a Monte Carlo estimator of the left hand side of \eqref{eq:proper-weight} whose right hand side is \(Z_\pi\).

We can also estimate \(I = \E_\pi[f(z)]\) as \begin{align} \hat I = \frac{\frac{1}{N}\sum_{n = 1}^N w_n f(z_n)}{\frac{1}{N} \sum_{n = 1}^N w_n}. \label{eq:proper-weight-I-est} \end{align} Since the numerator and the denominator are unbiased Monte Carlo estimators of \(Z_\pi \E_\pi[f(z)]\) and \(Z_\pi\) respectively, their ratio is a consistent estimator of \(I\) and we can use the delta method to establish its asymptotic variance, like in importance sampling.

For the purposes of this note, we will focus on two operations that preserve the proper weighting property: resampling and the SMCS move.

Resampling preserves proper weighting. Given a set of \(K\) weighted particles \(\{(z_k, w_k)\}_{k = 1}^K\), multinomial resampling

Samples \(a_k \sim \mathrm{Categorical}\left(\left(\frac{w_i}{\sum_{j = 1}^K w_j}\right)_{i = 1}^K\right)\) and
Sets \(z_k' \leftarrow z_{a_k}\).

The resampling weight is defined as \begin{align} w_k’ = \frac{1}{K} \sum_{i = 1}^K w_i, \label{eq:post-resample-weight} \end{align} that is, a constant across all \(k = 1, \dotsc, K\).

If \((z_k, w_k)\) is properly weighted with respect to \(\gamma\), so is \((z_k', w_k')\).

Proof. To show that \begin{align} \E[w_k’ f(z_k’)] = Z \E_{\pi(z)}[f(z)] \end{align} where \(Z\) is \(\gamma\)’s normalizing constant and \(\pi(z) = \gamma(z) / Z\) is the corresponding normalized target density: \begin{align} \mathrm{LHS} &= \E\left[\sum_{a_{1:K} \in \{1,\dotsc,K\}^K} p(a_{1:K} \given w_{1:K}) w_k’ f(z_{a_k}) \right] & \text{(} p(a_{1:K} \given w_{1:K}) \text{ is the probability of resampling indices)} \\
&= \E\left[w_k’ \sum_{a_{1:K} \in \{1,\dotsc,K\}^K} p(a_{1:K} \given w_{1:K}) f(z_{a_k}) \right] &\text{(} w_k’ \text{ is independent of } a_{1:K} \text{)}\\
&= \E\left[w_k’ \sum_{a_{1:K} \in \{1,\dotsc,K\}^K} \left(\prod_{i = 1}^K p(a_i \given w_{1:K})\right) f(z_{a_k}) \right] &\text{(indices are sampled independently)}\\
&= \E\left[w_k’ \sum_{a_k = 1}^K p(a_k \given w_{1:K}) f(z_{a_k}) \right] &\text{(summing over } a_i \text{s where } i \neq k \text{ equals one)}\\
&= \E\left[w_k’ \sum_{a_k = 1}^K \frac{w_{a_k}}{\sum_{j = 1}^K w_j} f(z_{a_k}) \right] \\
&= \E\left[w_k’ \sum_{i = 1}^K \frac{w_{i}}{\sum_{j = 1}^K w_j} f(z_{i}) \right] \\
&= \E\left[\left(\frac{1}{K} \sum_{i = 1}^K w_i\right) \frac{1}{\sum_{j = 1}^K w_j} \sum_{i = 1}^K w_i f(z_{i}) \right] & \text{(sub in \eqref{eq:post-resample-weight})}\\
&= \E\left[\frac{1}{K} \sum_{i = 1}^K w_i f(z_{i}) \right] \\
&= \frac{1}{K} \sum_{i = 1}^K \E\left[ w_i f(z_{i}) \right] \\
&= \E\left[ w_i f(z_{i}) \right] \\
&= \mathrm{RHS} & \text{(since } (z_i, w_i) \text{ is properly weighted with respect to } \gamma \text{)}. \end{align}

SMCS move preserves proper weighting. The SMCS move operation assumes we have a properly weighted sample \((z, w)\) with respect to \(\gamma\), an unnormalized target which we want to move to \(\gamma'\), and forward and reverse kernels \(q(z' | z)\) and \(r(z | z')\). Given \(z\) the SMCS move

Samples \(z' \sim q(z' \given z)\) and
Weighs \(w' = \frac{\gamma'(z') r(z \given z')}{\gamma(z) q(z' \given z)} w\).

The sample \((z', w')\) is properly weighted with respect to \(\gamma'\).

Proof. To show that \begin{align} \E[w’ f(z’)] = Z’ \E_{\pi’(z’)}[f(z’)] \end{align} where \(Z'\) is \(\gamma\)’s normalizing constant and \(\pi'(z') = \gamma'(z') / Z'\) is the corresponding normalized target density: \begin{align} \mathrm{LHS} &= \E\left[\frac{\gamma’(z’) r(z \given z’)}{\gamma(z) q(z’ \given z)} w f(z’) \right] &\text{(sub in for } w’ \text{)} \\
&= \E\left[\int q(z’ \given z) \frac{\gamma’(z’) r(z \given z’)}{\gamma(z) q(z’ \given z)} w f(z’) \,\mathrm dz’\right] &\text{(make expectation under } z’ \text{ explicit)} \\
&= \E\left[\int \frac{\gamma’(z’) r(z \given z’)}{\gamma(z)} w f(z’) \,\mathrm dz’\right] \\
&= \E\left[w \int \frac{\gamma’(z’) r(z \given z’)}{\gamma(z)} f(z’) \,\mathrm dz’\right] &\text{(} w \text{ doesn’t depend on } z’ \text{)} \\
&= \E\left[w g(z)\right] &\text{(treat the integral as a function of } z \text{, } g(z) \text{)} \\
&= Z_\pi \E_{\pi(z)}[g(z)] &\text{(apply the proper weighting property of } (z, w) \text{)} \\
&= Z_\pi \E_{\pi(z)}\left[\int \frac{\gamma’(z’) r(z \given z’)}{\gamma(z)} f(z’) \,\mathrm dz’\right] &\text{(sub the integral back in)} \\
&= \int Z_\pi \pi(z) \int \frac{\gamma’(z’) r(z \given z’)}{\gamma(z)} f(z’) \,\mathrm dz’ \,\mathrm dz &\text{(write the expectation as an integral)} \\
&= \int \int \gamma’(z’) r(z \given z’) f(z’) \,\mathrm dz’ \,\mathrm dz &\text{(} \gamma(z) = Z_\pi \pi(z) \text{)} \\
&= \int \gamma’(z’) f(z’) \,\mathrm dz’ &\text{(} r(z \given z’) \text{ is normalized)} \\
&= \mathrm{RHS}. \end{align}

The Proof

To prove unbiasedness of \eqref{eq:Z}, we establish proper weighting invariants in the SMCS procedure above:

Just before the for loop (after line 2), \((z_1^k, w_1^k)\) is properly weighted with respect to \(\gamma_1\).

This is because \begin{align} \E[w_1^k f(z_1^k)] = \int q_1(z_1^k) \frac{\gamma_1(z_1^k)}{q_1(z_1^k)} f(z_1^k) \,\mathrm d\,z_1^k = Z_1 \E_{\pi(z_1^k)}[f(z_1^k)]. \end{align}
At the start of the for loop body (before line 3.1), \((z_{t - 1}^k, \hat Z_{t - 2} w_{t - 1}^k)\) is properly weighted with respect to \(\gamma_{t - 1}\), where we define \(\hat Z_0 = 1\) and \(\hat Z_{1:T}\) as in \eqref{eq:Z}.

At the first iteration where \(t = 2\), this is true since this is just saying that \((z_1^k, w_1^k)\) is properly weighted with respect to \(\gamma_1\).
After resampling (before line 3.3), \((z_{t - 1}^k, \hat Z_{t - 1})\) is properly weighted with respect to \(\gamma_{t - 1}\).

Without any change to the procedure, assume that ancestral indices in line 3.1 are sampled based on weights \((\hat Z_{t - 2} w_{t - 1}^k)_{k = 1}^K\) instead of \(w_{t - 1}^{1:K}\). This is valid since \(\hat Z_{t - 2}\) is a constant with respect to \(k\) and resampling normalizes the weights anyway.

From the invariant before resampling, \((z_{t - 1}^k, \hat Z_{t - 2} w_{t - 1}^k)\) (with \(z_{t - 1}^k\) taking on the value before being overriden in line 3.2) is properly weighted with respect to \(\gamma_{t - 1}\).

Since resampling preserves proper weighting, \((z_{t - 1}^k, w_k')\) (with \(z_{t - 1}^k\) taking on the value after being overriden in line 3.2) is also properly weighted with respect to \(\gamma_{t - 1}\) with the post-resample weight \(w_k'\) being the mean of the pre-resample weights as in \eqref{eq:post-resample-weight} \begin{align} w_k’ &= \frac{1}{K}\sum_{i = 1}^K \hat Z_{t - 2} w_{t - 1}^i \\
&= \hat Z_{t - 2} \left(\frac{1}{K} \sum_{i = 1}^K w_{t - 1}^i\right) \\
&= \hat Z_{t - 1}. \end{align}
After the move operation (after line 3.4), \((z_t^k, \hat Z_{t - 1} w_t^k)\) is properly weighted with respect to \(\gamma_t\).

This follows directly from the fact that the SMCS move preserves proper weighting. At the next iteration of the for loop, the invariant at the start of the for loop body is satisfied.

Since at the end of every for loop body, \((z_t^k, \hat Z_{t - 1} w_t^k)\) is properly weighted with respect to \(\gamma_t\), following \eqref{eq:proper-weight-Z-est}, we can estimate \(Z_t\) unbiasedly as \(\hat Z_t\) in \eqref{eq:Z}: \begin{align} \frac{1}{K} \sum_{k = 1}^K \hat Z_{t - 1} w_t^k = \hat Z_{t - 1} \frac{1}{K} \sum_{k = 1}^K w_t^k = \hat Z_t. \end{align}

Following \eqref{eq:proper-weight-I-est}, we can estimate \(\E_{\pi_t(z_t)}[f(z_t)]\) as \(\hat I_t\) in \eqref{eq:I} \begin{align} \frac{\frac{1}{K} \sum_{k = 1}^K \hat Z_{t - 1} w_t^k f(z_t^k)}{\frac{1}{K} \sum_{k = 1}^K \hat Z_{t - 1} w_t^k} = \frac{\frac{1}{K} \sum_{k = 1}^K w_t^k f(z_t^k)}{\frac{1}{K} \sum_{k = 1}^K w_t^k} = \hat I_t. \end{align}

[back]

Tuan Anh Le

A Better Proof of Unbiasedness of the Sequential Monte Carlo Based Normalizing Constant Estimator

Sequential Monte Carlo Sampler

Proper Weights

The Proof