Tuan Anh Le

A Better Proof of Unbiasedness of the Sequential Monte Carlo Based Normalizing Constant Estimator

22 June 2023

The standard proof of unbiasedness of the sequential Monte Carlo (SMC) normalizing constant estimator always seemed like magic to me. We introduce auxiliary variables, painstakingly push symbols around and voila, the normalizing constant estimator is unbiased.

In this note, we present a proof of the unbiasedness of the normalizing constant estimator based on proper weights. I think this proof is much better because

As explored in Two Views of SMC, the increasing spaces and the SMC sampler (SMCS) formulations of SMC are equivalent. Here, we focus on proving unbiasedness of the SMCS-based normalizing constant estimator which will automatically imply unbiasedness of the increasing spaces normalizing constant estimator (as is used in the standard proof).

We first introduce SMCS, then we introduce proper weights, and finally, we prove unbiasedness of the SMCS normalizing constant estimator using proper weights.

Sequential Monte Carlo Sampler

We have a sequence of unnormalized target densities \(\{\gamma_t(z_t)\}_{t = 1}^T\) with normalizing constants \(Z_t = \int \gamma_t(z_t) \,\mathrm dz_t\). We want to estimate the normalizing constants \(Z_t\) and expectations with respect to the (normalized) target distributions \(\pi_t(z_t) = \gamma_t(z_t) / Z_t\). To do this, we design a sequence of forward kernels \(\{q_t(z_t \given z_{t - 1})\}_{t = 1}^T\) and a sequence of reverse kernels \(\{r_{t - 1}(z_{t - 1} \given z_t\}_{t = 2}^T\).

Assuming a particle budget \(K\) and resampling at every step, the SMCS procedure is

  1. Sample \(z_1^k \sim q_1(z_1)\)
  2. Weigh \(w_1^k = \frac{\gamma_1(z_1^k)}{q_1(z_1^k)}\)
  3. For \(t = 2, \dotsc, T\)
    1. Sample ancestral indices \(a_{t - 1}^{1:K}\) based on weights \(w_{t - 1}^{1:K}\)
    2. Set \(z_{t - 1}^k \leftarrow z_{t - 1}^{a_{t - 1}^k}\)
    3. Propose \(z_t^k \sim q_t(z_t \given z_{t - 1}^k)\)
    4. Weigh \(w_t^k = \frac{\gamma_t(z_t^k) r_{t - 1}(z_{t - 1}^k \given z_t^k)}{\gamma_{t - 1}(z_{t - 1}^k) q_t(z_t^k \given z_{t - 1}^k)}\)

At every timestep, we can estimate the normalizing constant and expectations as \begin{align} Z_t &\approx \hat Z_t = \prod_{\tau = 1}^t \left(\frac{1}{K} \sum_{k = 1}^K w_\tau^k \right), \label{eq:Z} \\
\E_{\pi_t(z_t)}[f(z_t)] &\approx \hat I_t = \frac{1}{\sum_{k = 1}^K w_t^k} \sum_{k = 1}^K w_t^k f(z_t^k). \label{eq:I} \end{align}

Proper Weights

Given an unnormalized density \(\gamma(z)\), with a corresponding normalizing constant \(Z_\pi = \int \gamma(z) \,\mathrm dz\) and normalized density \(\pi(z) = \gamma(z) / Z_\pi\), random variables \(z, w\) are said to be properly weighted with respect to \(\gamma(z)\) if, for any function \(f\), \begin{align} \E[w f(z)] = Z_\pi \E_\pi[f(z)]. \label{eq:proper-weight} \end{align}

Given a properly weighted sample with respect to \(\gamma\), \((z, w)\), we can estimate \(Z_\pi\) by setting \(f(z) = 1\) \begin{align} \hat Z_\pi = \frac{1}{N} \sum_{n = 1}^N w_n. \label{eq:proper-weight-Z-est} \end{align} This estimator is unbiased because it is a Monte Carlo estimator of the left hand side of \eqref{eq:proper-weight} whose right hand side is \(Z_\pi\).

We can also estimate \(I = \E_\pi[f(z)]\) as \begin{align} \hat I = \frac{\frac{1}{N}\sum_{n = 1}^N w_n f(z_n)}{\frac{1}{N} \sum_{n = 1}^N w_n}. \label{eq:proper-weight-I-est} \end{align} Since the numerator and the denominator are unbiased Monte Carlo estimators of \(Z_\pi \E_\pi[f(z)]\) and \(Z_\pi\) respectively, their ratio is a consistent estimator of \(I\) and we can use the delta method to establish its asymptotic variance, like in importance sampling.

For the purposes of this note, we will focus on two operations that preserve the proper weighting property: resampling and the SMCS move.

Resampling preserves proper weighting. Given a set of \(K\) weighted particles \(\{(z_k, w_k)\}_{k = 1}^K\), multinomial resampling

  1. Samples \(a_k \sim \mathrm{Categorical}\left(\left(\frac{w_i}{\sum_{j = 1}^K w_j}\right)_{i = 1}^K\right)\) and
  2. Sets \(z_k' \leftarrow z_{a_k}\).

The resampling weight is defined as \begin{align} w_k’ = \frac{1}{K} \sum_{i = 1}^K w_i, \label{eq:post-resample-weight} \end{align} that is, a constant across all \(k = 1, \dotsc, K\).

If \((z_k, w_k)\) is properly weighted with respect to \(\gamma\), so is \((z_k', w_k')\).

Proof. To show that \begin{align} \E[w_k’ f(z_k’)] = Z \E_{\pi(z)}[f(z)] \end{align} where \(Z\) is \(\gamma\)’s normalizing constant and \(\pi(z) = \gamma(z) / Z\) is the corresponding normalized target density: \begin{align} \mathrm{LHS} &= \E\left[\sum_{a_{1:K} \in \{1,\dotsc,K\}^K} p(a_{1:K} \given w_{1:K}) w_k’ f(z_{a_k}) \right] & \text{(} p(a_{1:K} \given w_{1:K}) \text{ is the probability of resampling indices)} \\
&= \E\left[w_k’ \sum_{a_{1:K} \in \{1,\dotsc,K\}^K} p(a_{1:K} \given w_{1:K}) f(z_{a_k}) \right] &\text{(} w_k’ \text{ is independent of } a_{1:K} \text{)}\\
&= \E\left[w_k’ \sum_{a_{1:K} \in \{1,\dotsc,K\}^K} \left(\prod_{i = 1}^K p(a_i \given w_{1:K})\right) f(z_{a_k}) \right] &\text{(indices are sampled independently)}\\
&= \E\left[w_k’ \sum_{a_k = 1}^K p(a_k \given w_{1:K}) f(z_{a_k}) \right] &\text{(summing over } a_i \text{s where } i \neq k \text{ equals one)}\\
&= \E\left[w_k’ \sum_{a_k = 1}^K \frac{w_{a_k}}{\sum_{j = 1}^K w_j} f(z_{a_k}) \right] \\
&= \E\left[w_k’ \sum_{i = 1}^K \frac{w_{i}}{\sum_{j = 1}^K w_j} f(z_{i}) \right] \\
&= \E\left[\left(\frac{1}{K} \sum_{i = 1}^K w_i\right) \frac{1}{\sum_{j = 1}^K w_j} \sum_{i = 1}^K w_i f(z_{i}) \right] & \text{(sub in \eqref{eq:post-resample-weight})}\\
&= \E\left[\frac{1}{K} \sum_{i = 1}^K w_i f(z_{i}) \right] \\
&= \frac{1}{K} \sum_{i = 1}^K \E\left[ w_i f(z_{i}) \right] \\
&= \E\left[ w_i f(z_{i}) \right] \\
&= \mathrm{RHS} & \text{(since } (z_i, w_i) \text{ is properly weighted with respect to } \gamma \text{)}. \end{align}

SMCS move preserves proper weighting. The SMCS move operation assumes we have a properly weighted sample \((z, w)\) with respect to \(\gamma\), an unnormalized target which we want to move to \(\gamma'\), and forward and reverse kernels \(q(z' | z)\) and \(r(z | z')\). Given \(z\) the SMCS move

  1. Samples \(z' \sim q(z' \given z)\) and
  2. Weighs \(w' = \frac{\gamma'(z') r(z \given z')}{\gamma(z) q(z' \given z)} w\).

The sample \((z', w')\) is properly weighted with respect to \(\gamma'\).

Proof. To show that \begin{align} \E[w’ f(z’)] = Z’ \E_{\pi’(z’)}[f(z’)] \end{align} where \(Z'\) is \(\gamma\)’s normalizing constant and \(\pi'(z') = \gamma'(z') / Z'\) is the corresponding normalized target density: \begin{align} \mathrm{LHS} &= \E\left[\frac{\gamma’(z’) r(z \given z’)}{\gamma(z) q(z’ \given z)} w f(z’) \right] &\text{(sub in for } w’ \text{)} \\
&= \E\left[\int q(z’ \given z) \frac{\gamma’(z’) r(z \given z’)}{\gamma(z) q(z’ \given z)} w f(z’) \,\mathrm dz’\right] &\text{(make expectation under } z’ \text{ explicit)} \\
&= \E\left[\int \frac{\gamma’(z’) r(z \given z’)}{\gamma(z)} w f(z’) \,\mathrm dz’\right] \\
&= \E\left[w \int \frac{\gamma’(z’) r(z \given z’)}{\gamma(z)} f(z’) \,\mathrm dz’\right] &\text{(} w \text{ doesn’t depend on } z’ \text{)} \\
&= \E\left[w g(z)\right] &\text{(treat the integral as a function of } z \text{, } g(z) \text{)} \\
&= Z_\pi \E_{\pi(z)}[g(z)] &\text{(apply the proper weighting property of } (z, w) \text{)} \\
&= Z_\pi \E_{\pi(z)}\left[\int \frac{\gamma’(z’) r(z \given z’)}{\gamma(z)} f(z’) \,\mathrm dz’\right] &\text{(sub the integral back in)} \\
&= \int Z_\pi \pi(z) \int \frac{\gamma’(z’) r(z \given z’)}{\gamma(z)} f(z’) \,\mathrm dz’ \,\mathrm dz &\text{(write the expectation as an integral)} \\
&= \int \int \gamma’(z’) r(z \given z’) f(z’) \,\mathrm dz’ \,\mathrm dz &\text{(} \gamma(z) = Z_\pi \pi(z) \text{)} \\
&= \int \gamma’(z’) f(z’) \,\mathrm dz’ &\text{(} r(z \given z’) \text{ is normalized)} \\
&= \mathrm{RHS}. \end{align}

The Proof

To prove unbiasedness of \eqref{eq:Z}, we establish proper weighting invariants in the SMCS procedure above:

Since at the end of every for loop body, \((z_t^k, \hat Z_{t - 1} w_t^k)\) is properly weighted with respect to \(\gamma_t\), following \eqref{eq:proper-weight-Z-est}, we can estimate \(Z_t\) unbiasedly as \(\hat Z_t\) in \eqref{eq:Z}: \begin{align} \frac{1}{K} \sum_{k = 1}^K \hat Z_{t - 1} w_t^k = \hat Z_{t - 1} \frac{1}{K} \sum_{k = 1}^K w_t^k = \hat Z_t. \end{align}

Following \eqref{eq:proper-weight-I-est}, we can estimate \(\E_{\pi_t(z_t)}[f(z_t)]\) as \(\hat I_t\) in \eqref{eq:I} \begin{align} \frac{\frac{1}{K} \sum_{k = 1}^K \hat Z_{t - 1} w_t^k f(z_t^k)}{\frac{1}{K} \sum_{k = 1}^K \hat Z_{t - 1} w_t^k} = \frac{\frac{1}{K} \sum_{k = 1}^K w_t^k f(z_t^k)}{\frac{1}{K} \sum_{k = 1}^K w_t^k} = \hat I_t. \end{align}