Tuan Anh Le

Reversible jump Markov chain Monte Carlo

11 November 2022

Reversible jump Markov chain Monte Carlo (RJMCMC) (Green, 1995) is a fascinating algorithm that, in principle, lets us sample from a target distribution over spaces with varying dimensionality. The core ideas in RJMCMC can be extended to sample from the space of traces of universal probabilistic programs (Cusumano-Towner et al., 2020), which is useful if we want to infer structure. For example, we can infer compositional Gaussian process kernels (Duvenaud et al., 2013; Saad et al., 2019), a line of work also known as the Automatic Statistician.

The purpose of this note is to convince myself that the algorithm is correct (that is, it converges), without discussing the speed of convergence. In particular, why do a Jacobian and move probabilities appear in a more-or-less Metropolis-Hastings-like acceptance ratio? In trying to understand RJMCMC, I’ve drawn on a number of existing resources.

1 The algorithm

1.1 The target distribution

Let’s say we want to sample from a target distribution with an unnormalized density \(\pi(x)\) where \(x = (k, \theta_k)\) comprises a model indicator \(k \in \mathcal K \subseteq \mathbb N\) and an \(n_k\)-dimensional model parameter \(\theta_k \in \mathcal X_k \subseteq \mathbb R^{n_k}\). As a result, the space of \(x\)’s, \(\mathcal X = \bigcup_{k \in \mathcal K} \{k\} \times \mathcal X_k\), is transdimensional.
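To have something concrete to play with, here is a minimal sketch in Python of a toy transdimensional target (my own example, not one from the references above): \(\mathcal K = \{1, 2\}\), model \(k\) has \(k\) independent standard normal parameters, and each model carries unnormalized weight \(1/2\).

```python
import numpy as np
from scipy.stats import norm

# Toy transdimensional target: model k has theta in R^k, for k in {1, 2}.
# pi(k, theta_k) = 0.5 * prod_i N(theta_i; 0, 1), so after normalizing,
# both models have probability 1/2.
def log_pi(k, theta):
    """Unnormalized log density log pi(x) for x = (k, theta_k)."""
    if k not in (1, 2) or len(theta) != k:
        return -np.inf
    return np.log(0.5) + norm.logpdf(theta).sum()
```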

1.2 RJMCMC ingredients

RJMCMC requires defining a set of move types \(\mathcal M\). A move type \(m = (\{(k_1, g_{m, k_1 \to k_2}, h_{m, k_1 \to k_2}), (k_2, g_{m, k_2 \to k_1}, h_{m, k_2 \to k_1})\}, d_m) \in \mathcal M\) connects two models \(k_1\) and \(k_2\): \(g_{m, k_1 \to k_2}(u)\) is the density of an auxiliary variable \(u \in \mathcal U_{k_1}\) used when moving from \(k_1\) to \(k_2\), and \(h_{m, k_1 \to k_2}\) is a differentiable bijection mapping \((\theta_{k_1}, u)\) to \((\theta_{k_2}, \bar u)\), where \(\bar u\) is the auxiliary variable of the reverse move. Symmetrically, \(g_{m, k_2 \to k_1}\) and \(h_{m, k_2 \to k_1} = h_{m, k_1 \to k_2}^{-1}\) play the same roles in the reverse direction, and \(d_m = n_{k_1} + \dim(u) = n_{k_2} + \dim(\bar u)\) is the common dimension on which the bijections operate (“dimension matching”). When the move type and direction are clear from context, we abbreviate the forward components as \(g_m, h_m\) and the reverse components as \(\bar g_m, \bar h_m\).

We also define a probability of choosing a move \(j(m \given x)\) (“j” for “jump”) which is zero if the move is invalid: given \(x = (k, \theta_{k})\) and a move \(m\) between \(k_1\) and \(k_2\), \begin{align} j(m \given x) \begin{cases} \geq 0 & \text{if } k \in \{k_1, k_2\} \\
= 0 & \text{otherwise.} \end{cases} \end{align} Further, note that there can be more than one move type between \(k_1\) and \(k_2\).
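Continuing the toy example, here is one possible set of move types, sketched in Python: a split/merge-style jump between models 1 and 2, and a random-walk move within each model. Both are my own choices; the general recipe doesn’t prescribe them.

```python
# Move "A", a jump between k1 = 1 and k2 = 2:
#   1 -> 2: draw u ~ N(0, 1), h(theta, u) = ((theta - u, theta + u), ()),
#           i.e. u_bar is empty; d_m = 1 + 1 = 2 + 0 = 2.
#   2 -> 1: h_bar(theta_bar) = (((theta_bar_1 + theta_bar_2) / 2,),
#           ((theta_bar_2 - theta_bar_1) / 2,)), the inverse of h.
LOG_JAC_A = np.log(2.0)  # |det d(theta_bar, u_bar)/d(theta, u)| = |det [[1, -1], [1, 1]]| = 2

# Moves ("B", k), random walks within model k (a move between k and itself):
#   h(theta, u) = (theta + u, -u) with u ~ N(0, 0.5^2 I); the Jacobian
#   determinant is 1 in absolute value and g is symmetric, so everything
#   except pi cancels in the acceptance ratio.

def log_j(move, k):
    """log j(m | x): from model k, propose the jump and the within-model walk
    with probability 1/2 each; any other move is invalid and gets 0."""
    return np.log(0.5) if move in ("A", ("B", k)) else -np.inf
```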

1.3 The transition process \(T(x' \given x)\)

Given that we are in state \(x = (k, \theta_k)\), the RJMCMC transition process \(T(x' \given x)\) for going to \(x' = (k', \theta'_{k'})\) proceeds as follows. First, sample a move type \(m \sim j(\cdot \given x)\); say \(m\) is a move between \(k\) and \(\bar k\), and write \(g_m, h_m\) for its forward components and \(\bar g_m, \bar h_m\) for its reverse components. Then, sample an auxiliary variable \(u \sim g_m\) and form the proposal \(\bar x = (\bar k, \bar \theta_{\bar k})\), where \((\bar \theta_{\bar k}, \bar u) = h_m(\theta_k, u)\). Finally, set \(x' = \bar x\) with probability \(\alpha_{m, k}(\theta_k, u)\), and \(x' = x\) otherwise.

The acceptance probability is a Metropolis-Hastings-like term \begin{align} \alpha_{m, k}(\theta_k, u) = \min\left(1, \frac{\pi(\bar x)j(m \given \bar x) \bar g_m(\bar u)}{\pi(x) j(m \given x) g_m(u)} \left| \frac{\partial (\bar \theta_{\bar k}, \bar u)}{\partial (\theta_k, u)} \right|\right), \label{acceptance_prob} \end{align} where the \(|\cdot|\) term is the absolute value of the determinant of the Jacobian of \(h_m\) evaluated at \((\theta_k, u)\).
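For the toy example, one draw from \(T(x' \given x)\) might look as follows (a sketch, building on the snippets above). The acceptance probability is computed in log space, and the \(j\) terms are written out even though they cancel here.

```python
def rjmcmc_step(k, theta, rng):
    """One draw x' ~ T(x' | x) for the toy target."""
    move = "A" if rng.uniform() < 0.5 else ("B", k)  # sample m ~ j(. | x)
    if move != "A":  # random walk within model k: h(theta, u) = (theta + u, -u)
        theta_new = theta + 0.5 * rng.normal(size=k)
        log_alpha = log_pi(k, theta_new) - log_pi(k, theta)  # j, g, |J| cancel
        return (k, theta_new) if np.log(rng.uniform()) < log_alpha else (k, theta)
    if k == 1:  # jump 1 -> 2: split theta into (theta - u, theta + u)
        u = rng.normal()  # u ~ g_m
        theta_bar = np.array([theta[0] - u, theta[0] + u])
        # log of [pi(x_bar) j(m|x_bar) g_bar(u_bar) / (pi(x) j(m|x) g(u))] |J|;
        # g_bar of the empty u_bar contributes log 1 = 0.
        log_alpha = (log_pi(2, theta_bar) + log_j("A", 2)
                     - log_pi(1, theta) - log_j("A", 1)
                     - norm.logpdf(u) + LOG_JAC_A)
        return (2, theta_bar) if np.log(rng.uniform()) < log_alpha else (1, theta)
    # jump 2 -> 1: deterministic merge to the midpoint; u_bar is empty, and
    # u below is what the reverse (split) move would have drawn.
    theta_new = np.array([(theta[0] + theta[1]) / 2.0])
    u = (theta[1] - theta[0]) / 2.0
    log_alpha = (log_pi(1, theta_new) + log_j("A", 1) + norm.logpdf(u)
                 - log_pi(2, theta) - log_j("A", 2) - LOG_JAC_A)
    return (1, theta_new) if np.log(rng.uniform()) < log_alpha else (2, theta)
```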

1.4 RJMCMC

The full algorithm then initializes \(x_0\) arbitrarily and repeatedly samples \(x_{t + 1} \sim T(\cdot \given x_t)\) for \(t = 0, 1, 2, \dotsc\) to yield a Markov chain \((x_0, x_1, x_2, \dotsc)\) that can be used to approximate \(\pi(x)\).
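In code, a minimal driver for the chain, continuing the sketches above:

```python
def rjmcmc(n_steps, seed=0):
    """Simulate the chain (x_0, x_1, ...) starting from x_0 = (1, [0.0])."""
    rng = np.random.default_rng(seed)
    k, theta = 1, np.array([0.0])
    chain = [(k, theta)]
    for _ in range(n_steps):
        k, theta = rjmcmc_step(k, theta, rng)
        chain.append((k, theta))
    return chain
```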

2 The transition process is stationary

All I want to convince myself of is that the transition process \(T(x' \given x)\) described above leaves the target stationary: \begin{align} \int T(x' \given x) \pi(x) \,\mathrm dx = \pi(x'). \label{stationarity} \end{align} If this is true, then, along with some other conditions, we can essentially use \((x_0, x_1, x_2, \dotsc)\) to approximate \(\pi(x)\).

For stationarity \eqref{stationarity}, it is sufficient to show that \(T(x' \given x)\) satisfies detailed balance: \begin{align} T(x' \given x)\pi(x) = T(x \given x')\pi(x'). \label{db} \end{align} To see this, we can substitute \eqref{db} into the LHS of \eqref{stationarity} to get the RHS.
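Spelling this substitution out: \begin{align} \int T(x' \given x) \pi(x) \,\mathrm dx = \int T(x \given x') \pi(x') \,\mathrm dx = \pi(x') \int T(x \given x') \,\mathrm dx = \pi(x'), \end{align} where the last equality holds because \(T(\cdot \given x')\) is a probability density in its first argument and therefore integrates to one.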

For RJMCMC, we actually need to use the integrated detailed balance condition: \begin{align} \forall A, B \subseteq \mathcal X, \quad \iint_{\substack{x \in A \\ x' \in B}} T(x' \given x)\pi(x) \,\mathrm dx \,\mathrm dx' = \iint_{\substack{x \in A \\ x' \in B}} T(x \given x')\pi(x') \,\mathrm dx \,\mathrm dx' . \label{integrated_db} \end{align} (Strictly, \(A, B\) need to be Borel subsets of \(\mathcal X\), but this doesn’t materially change the argument.) The integrated detailed balance condition \eqref{integrated_db} implies the detailed balance condition \eqref{db}. I can’t formally prove it but I’m willing to believe it: given densities \(p(z)\) and \(q(z)\), it’s intuitive that if \(\int_A p(z) \,\mathrm dz = \int_A q(z) \,\mathrm dz\) for all \(A\) then \(p(z) = q(z)\) (almost everywhere).

2.1 Deriving the acceptance probability \eqref{acceptance_prob}

It turns out that for the integrated detailed balance condition to be true, it is sufficient that for all \(m = (\{(k, g_m, h_m), (\bar k, \bar g_m, \bar h_m)\}, d_m) \in \mathcal M\) and \(C \subseteq \mathcal X_k, D \subseteq \mathcal X_{\bar k}\) (again, \(C, D\) are Borel subsets), \begin{align} \iint_{\substack{\theta_k \in C \\ \bar \theta_{\bar k} \in D}} \pi(x) j(m \given x) g_m(u) \alpha_{m, k}(\theta_k, u) \,\mathrm du \,\mathrm d\theta_k = \iint_{\substack{\theta_k \in C \\ \bar \theta_{\bar k} \in D}} \pi(\bar x) j(m \given \bar x) \bar g_m(\bar u) \alpha_{m, \bar k}(\bar \theta_{\bar k}, \bar u) \,\mathrm d\bar u \,\mathrm d\bar \theta_{\bar k} . \label{rjmcmc_integrated_db} \end{align} Call this the integrated detailed balance for RJMCMC. We will verify that \eqref{rjmcmc_integrated_db} implies integrated detailed balance \eqref{integrated_db} later. For now, let’s assume it’s true and see whether we can convince ourselves that the acceptance probability in \eqref{acceptance_prob} makes \eqref{rjmcmc_integrated_db} hold. This is where we get the Jacobian term from, which I think is the most interesting part of the algorithm.

We are going to apply the change of variable rule for integration to the RHS of \eqref{rjmcmc_integrated_db} to get an integral over \((\theta_k, u)\) like in the LHS. Then, we’re going to choose the acceptance probability \(\alpha_{m, k}(\theta_k, u)\) so that the integrands on both sides are the same, which will yield the expression in \eqref{acceptance_prob}.

Applying the change of variable rule

Since the integration on the LHS of \eqref{rjmcmc_integrated_db} is over \((\theta_k, u)\), the set we’re integrating over is actually \begin{align} L = \{ (\theta_k, u): \theta_k \in C, \bar \theta_{\bar k} = h_m(\theta_k, u)_1 \in D \}. \end{align}

Similarly the set of \((\bar \theta_{\bar k}, \bar u)\) we’re integrating over in the RHS is actually \begin{align} R = \{ (\bar \theta_{\bar k}, \bar u): \theta_k = \bar h_m(\bar \theta_{\bar k}, \bar u)_1 \in C, \bar \theta_{\bar k} \in D \}. \end{align}

First, note that \(R = h_m(L)\): \begin{align} h_m(L) &= \{ h_m(\theta_k, u): \theta_k \in C, h_m(\theta_k, u)_1 \in D \} \\
&= \{ (\bar \theta_{\bar k}, \bar u): h_m^{-1}(\bar \theta_{\bar k}, \bar u)_1 \in C, h_m(h_m^{-1}(\bar \theta_{\bar k}, \bar u))_1 \in D \} \\
&= \{ (\bar \theta_{\bar k}, \bar u): \bar h_m(\bar \theta_{\bar k}, \bar u)_1 \in C, \bar \theta_{\bar k} \in D \} \\
&= R. \end{align}

Calling the integrand in the RHS of \eqref{rjmcmc_integrated_db} \(f(\bar \theta_{\bar k}, \bar u)\), we can rewrite the RHS of \eqref{rjmcmc_integrated_db} using the change of variable rule as \begin{align} \text{RHS of \eqref{rjmcmc_integrated_db}} &= \iint_{h_m(L)} f(\bar \theta_{\bar k}, \bar u) \,\mathrm d\bar u \mathrm d\bar \theta_{\bar k} \\
&= \iint_{L} f(h_m(\theta_k, u)) \left| \frac{\partial (\bar \theta_{\bar k}, \bar u)}{\partial (\theta_k, u)} \right| \,\mathrm d u \mathrm d \theta_k \\
&= \iint_{L} \pi(\bar x) j(m \given \bar x) \bar g_m(\bar u) \alpha_{m, \bar k}(\bar \theta_{\bar k}, \bar u) \left| \frac{\partial (\bar \theta_{\bar k}, \bar u)}{\partial (\theta_k, u)} \right| \,\mathrm d u \mathrm d \theta_k. \end{align} This is where the Jacobian comes into play!
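As a concrete instance, take the split-style jump from the toy example above, which maps \((\theta, u) \in \mathbb R \times \mathbb R\) to \(\bar \theta = (\theta - u, \theta + u)\) with an empty \(\bar u\). Then \begin{align} \left| \frac{\partial (\bar \theta_{\bar k}, \bar u)}{\partial (\theta_k, u)} \right| = \left| \det \begin{pmatrix} \frac{\partial \bar\theta_1}{\partial \theta} & \frac{\partial \bar\theta_1}{\partial u} \\ \frac{\partial \bar\theta_2}{\partial \theta} & \frac{\partial \bar\theta_2}{\partial u} \end{pmatrix} \right| = \left| \det \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \right| = 2, \end{align} which is the factor of \(2\) used in the Python sketches above.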

Choosing the acceptance probability

Now, both the LHS and the RHS of \eqref{rjmcmc_integrated_db} are integrals with respect to \((\theta_k, u)\) over the same set, \(L\). Therefore, to make \eqref{rjmcmc_integrated_db} hold, it is sufficient to choose the acceptance probability \(\alpha_{m, k}(\theta_k, u)\) to make the integrands the same: \begin{align} \pi(x) j(m \given x) g_m(u) \alpha_{m, k}(\theta_k, u) = \pi(\bar x) j(m \given \bar x) \bar g_m(\bar u) \alpha_{m, \bar k}(\bar \theta_{\bar k}, \bar u) \left| \frac{\partial (\bar \theta_{\bar k}, \bar u)}{\partial (\theta_k, u)} \right|. \label{final_condition} \end{align}

If we substitute \(\alpha_{m, k}(\theta_k, u)\) from \eqref{acceptance_prob} and the equivalent term for going in the reverse direction \begin{align} \alpha_{m, \bar k}(\bar \theta_{\bar k}, \bar u) = \min\left(1, \frac{\pi(x)j(m \given x) g_m(u)}{\pi(\bar x) j(m \given \bar x) \bar g_m(\bar u)} \left| \frac{\partial (\theta_k, u)}{\partial (\bar \theta_{\bar k}, \bar u)} \right|\right), \label{acceptance_prob_reverse} \end{align} into \eqref{final_condition}, we get \begin{align} \text{LHS of \eqref{final_condition}} &= \pi(x) j(m \given x) g_m(u) \min\left(1, \frac{\pi(\bar x)j(m \given \bar x) \bar g_m(\bar u)}{\pi(x) j(m \given x) g_m(u)} \left| \frac{\partial (\bar \theta_{\bar k}, \bar u)}{\partial (\theta_k, u)} \right|\right) \\
&= \min\left(\pi(x) j(m \given x) g_m(u), \pi(\bar x)j(m \given \bar x) \bar g_m(\bar u) \left| \frac{\partial (\bar \theta_{\bar k}, \bar u)}{\partial (\theta_k, u)} \right|\right) \\
&= \min\left(\pi(\bar x) j(m \given \bar x) \bar g_m(\bar u) \left| \frac{\partial (\bar \theta_{\bar k}, \bar u)}{\partial (\theta_k, u)} \right|, \pi(x)j(m \given x) g_m(u) \right) \\
&= \min\left(\pi(\bar x) j(m \given \bar x) \bar g_m(\bar u) \left| \frac{\partial (\bar \theta_{\bar k}, \bar u)}{\partial (\theta_k, u)} \right|, \pi(x)j(m \given x) g_m(u) \cdot \left| \frac{\partial (\theta_k, u)}{\partial (\bar \theta_{\bar k}, \bar u)} \right| \cdot \left| \frac{\partial (\bar \theta_{\bar k}, \bar u)}{\partial (\theta_k, u)} \right|\right) \\
&= \pi(\bar x) j(m \given \bar x) \bar g_m(\bar u) \min\left(1, \frac{\pi(x)j(m \given x) g_m(u)}{\pi(\bar x) j(m \given \bar x) \bar g_m(\bar u)} \left| \frac{\partial (\theta_k, u)}{\partial (\bar \theta_{\bar k}, \bar u)} \right|\right) \left| \frac{\partial (\bar \theta_{\bar k}, \bar u)}{\partial (\theta_k, u)} \right| \\
&= \text{RHS of \eqref{final_condition}}. \end{align} Here, the fourth line uses the fact that \(\left| \frac{\partial (\theta_k, u)}{\partial (\bar \theta_{\bar k}, \bar u)} \right| \cdot \left| \frac{\partial (\bar \theta_{\bar k}, \bar u)}{\partial (\theta_k, u)} \right| = 1\), which holds because \(\bar h_m = h_m^{-1}\).

Therefore, the acceptance probability in \eqref{acceptance_prob} makes the RJMCMC transition process stationary.
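As a quick empirical sanity check of this on the toy example (long-run frequencies, not a proof):

```python
chain = rjmcmc(200_000)
ks = np.array([k for k, _ in chain])
print("P(k = 1)           ~", (ks == 1).mean())  # should be close to 0.5
theta1 = np.array([th[0] for k, th in chain if k == 1])
print("E[theta | k = 1]   ~", theta1.mean())     # should be close to 0.0
print("Var[theta | k = 1] ~", theta1.var())      # should be close to 1.0
```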

2.2 Verifying that integrated detailed balance for RJMCMC \eqref{rjmcmc_integrated_db} implies integrated detailed balance \eqref{integrated_db}

Our reasoning above relies on the fact that the integrated detailed balance for RJMCMC \eqref{rjmcmc_integrated_db} implies integrated detailed balance \eqref{integrated_db}. To show this, we’re going to start with the LHS of \eqref{integrated_db} \begin{align} P(A, B) = \iint_{\substack{x \in A \\ x' \in B}} T(x' \given x)\pi(x) \,\mathrm dx \,\mathrm dx' \end{align} and evaluate the integrals as much as possible. This expression represents the probability that \(x \in A\) and \(x' \in B\) if \(x \sim \pi(x)\) and \(x' \sim T(x' \given x)\). The RHS of \eqref{integrated_db} is then just the same expression with \(A\) and \(B\) swapped, \(P(B, A)\). Then, we can verify that \eqref{rjmcmc_integrated_db} implies \eqref{integrated_db}.

Transition from \(x\) to \(x'\)

The transition process \(T(x' \given x)\) can be obtained by marginalizing out all the internal randomness, comprising \(m\) (this is where the jump probabilities in the acceptance ratio come into play!), \(u\), and the implicit acceptance indicator: \begin{align} T(x' \given x) &= \sum_{m \in \mathcal M_k} j(m \given x) \int_{u \in \mathcal U_k} g_m(u) \left[\alpha_{m, k}(\theta_k, u) \delta_{\bar x}(x') + (1 - \alpha_{m, k}(\theta_k, u)) \delta_x(x')\right] \,\mathrm du. \label{full_transition} \end{align} Here, we have assumed that \(x = (k, \theta_k)\) and \(\mathcal M_k\) is the set of move types that let us move from \(k\). That is, each \(m = (\{(k, g_m, h_m), (\bar k, \bar g_m, \bar h_m)\}, d_m) \in \mathcal M_k\) is a move between \(k\) and \(\bar k\) for some \(\bar k\). Hence \(\bar x\) here refers to \((\bar k, \bar \theta_{\bar k})\) where \((\bar \theta_{\bar k}, \bar u) = h_m(\theta_k, u)\). If \(m\) doesn’t let us move from \(k\), \(j(m \given x) = 0\) and the summand vanishes.

Transition from \(x\) to \(B\)

Let’s evaluate the integral over \(x'\) \begin{align} T(B \given x) := \int_{x' \in B} T(x' \given x) \,\mathrm dx' \end{align} which represents the probability that \(x' \in B\) if we simulate \(x' \sim T(x' \given x)\) from \(x\). The only terms in \eqref{full_transition} which involve \(x'\) are the deltas inside the square brackets, so we can “push the integral” all the way in. The integral with the first delta evaluates to \begin{align} \int_{x' \in B} \delta_{\bar x}(x') \,\mathrm dx' = \int_{x' \in \mathcal X} \mathbb 1(x' \in B) \delta_{\bar x}(x') \,\mathrm dx' = \mathbb 1(\bar x \in B) \end{align} and similarly, the integral with the second delta evaluates to \(\mathbb 1(x \in B)\). Since \(x = (k, \theta_k)\) and \(\bar x = (\bar k, \bar \theta_{\bar k})\), we can rewrite \(\mathbb 1(x \in B) = \mathbb 1(\theta_k \in B_k)\) and \(\mathbb 1(\bar x \in B) = \mathbb 1(\bar \theta_{\bar k} \in B_{\bar k})\), where \(B_k := \{\theta: (k, \theta) \in B\} \subseteq \mathcal X_k\) and similarly for \(B_{\bar k}\). Therefore \begin{align} T(B \given x) &= \sum_{m \in \mathcal M_k} j(m \given x) \left[ \int_{u \in \mathcal U_k} g_m(u) \alpha_{m, k}(\theta_k, u) \mathbb 1(\bar \theta_{\bar k} \in B_{\bar k}) \,\mathrm du + \mathbb 1(\theta_k \in B_k) \int_{u \in \mathcal U_k} g_m(u) (1 - \alpha_{m, k}(\theta_k, u)) \,\mathrm du \right] \\
&= \sum_{m \in \mathcal M_k} j(m \given x) \left[ \int_{\bar \theta_{\bar k} \in B_{\bar k}} g_m(u) \alpha_{m, k}(\theta_k, u) \,\mathrm du + \mathbb 1(\theta_k \in B_k) \int_{u \in \mathcal U_k} g_m(u) (1 - \alpha_{m, k}(\theta_k, u)) \,\mathrm du \right], \label{transition_to_B} \end{align} where the set we’re integrating over in the first integral is really \(\{u: h_m(\theta_k, u)_1 \in B_{\bar k}\}\), for which \(\bar \theta_{\bar k} \in B_{\bar k}\) is only a shorthand.

Intuitively, this is a mixture over jumps: the first term in the square brackets represents the probability of proposing to move to \(B\) and accepting, while the second term represents the probability of already being in \(B\) and the proposal being rejected.

Probability \(P(A, B)\)

Now, we evaluate the integral over \(x\) \begin{align} P(A, B) = \int_{x \in A} \pi(x) T(B \given x) \,\mathrm dx. \end{align}

To do so, we rewrite the integral as a sum over \(k \in \mathcal K\) and an integral over \(\theta_k \in A_k\) (where, like above, we define \(A_k := \{\theta: (k, \theta) \in A\}\)): \begin{align} P(A, B) = \sum_{k \in \mathcal K} \int_{\theta_k \in A_k} \pi(x) \sum_{m \in \mathcal M_k} j(m \given x) \left[ \cdots \right] \,\mathrm d\theta_k, \end{align} where the expression inside the square brackets is the same as the one in \eqref{transition_to_B}. Note that inside \(\sum_{k \in \mathcal K} \int_{\theta_k \in A_k}\), the variables \(k\) and \(\theta_k\) are “in scope”, so expressions like \(\pi(x)\) with \(x = (k, \theta_k)\) make sense.

We can decompose \(P(A, B)\) into two terms \begin{align} P(A, B) = P_1(A, B) + P_2(A, B) \end{align} corresponding to the two terms inside the square brackets: \begin{align} P_1(A, B) &= \sum_{k \in \mathcal K} \int_{\theta_k \in A_k} \pi(x) \sum_{m \in \mathcal M_k} j(m \given x) \int_{\bar \theta_{\bar k} \in B_{\bar k}} g_m(u) \alpha_{m, k}(\theta_k, u) \,\mathrm du \,\mathrm d\theta_k \\
&= \sum_{k \in \mathcal K} \sum_{m \in \mathcal M_k} \iint_{\substack{\theta_k \in A_k \\ \bar \theta_{\bar k} \in B_{\bar k}}} \pi(x) j(m \given x) g_m(u) \alpha_{m, k}(\theta_k, u) \,\mathrm du \,\mathrm d\theta_k \\
&= \sum_{k \in \mathcal K} \sum_{m \in \mathcal M_k} \iint_{\substack{\theta_k \in A_k \\ u \in \{u: h_m(\theta_k, u)_1 \in B_{\bar k} \}}} \pi(x) j(m \given x) g_m(u) \alpha_{m, k}(\theta_k, u) \,\mathrm du \,\mathrm d\theta_k, \label{P1} \\
P_2(A, B) &= \sum_{k \in \mathcal K} \int_{\theta_k \in A_k} \pi(x) \sum_{m \in \mathcal M_k} j(m \given x) \left[ \mathbb 1(\theta_k \in B_k) \int_{u \in \mathcal U_k} g_m(u) (1 - \alpha_{m, k}(\theta_k, u)) \,\mathrm du \right] \,\mathrm d\theta_k \\
&= \sum_{k \in \mathcal K} \sum_{m \in \mathcal M_k} \iint_{\substack{\theta_k \in A_k \cap B_k \\ u \in \mathcal U_k}} \pi(x) j(m \given x) g_m(u) (1 - \alpha_{m, k}(\theta_k, u)) \,\mathrm du \,\mathrm d\theta_k. \label{P2} \end{align}

Verifying that \eqref{rjmcmc_integrated_db} implies \eqref{integrated_db}

We now have expressions for both the LHS and the RHS of \eqref{integrated_db}: \begin{align} \text{LHS of \eqref{integrated_db}} &= P(A, B) = P_1(A, B) + P_2(A, B) \\
\text{RHS of \eqref{integrated_db}} &= P(B, A) = P_1(B, A) + P_2(B, A). \end{align} Since \(A\) and \(B\) appear in \(P_2(A, B)\) \eqref{P2} only through \(A_k \cap B_k\), we have \(P_2(A, B) = P_2(B, A)\). Therefore, it remains to show that \(P_1(A, B) = P_1(B, A)\).

For every summand \((m, k)\) in the double sum in \(P_1(A, B)\) \eqref{P1}, there is a one-to-one correspondence with a summand \((m, \bar k)\) in \(P_1(B, A)\), where \(m = (\{(k, g_m, h_m), (\bar k, \bar g_m, \bar h_m)\}, d_m)\) is the same move type but \(\bar k\) is the other model in \(m\). Via a similar derivation to the one above, the \((m, \bar k)\) summand of \(P_1(B, A)\) can be shown to be \begin{align} \iint_{\substack{\bar \theta_{\bar k} \in B_{\bar k} \\ \bar u \in \{\bar u: \bar h_m(\bar \theta_{\bar k}, \bar u)_1 \in A_k \}}} \pi(\bar x) j(m \given \bar x) \bar g_m(\bar u) \alpha_{m, \bar k}(\bar \theta_{\bar k}, \bar u) \,\mathrm d\bar u \,\mathrm d\bar \theta_{\bar k}, \label{P1_summand} \end{align} where \(\bar x = (\bar k, \bar \theta_{\bar k})\). We only need the \((m, k)\) summand of \(P_1(A, B)\) to be equal to \eqref{P1_summand}, which is exactly the integrated detailed balance for RJMCMC \eqref{rjmcmc_integrated_db} with \(C = A_k\) and \(D = B_{\bar k}\). Hence the integrated detailed balance for RJMCMC \eqref{rjmcmc_integrated_db} implies integrated detailed balance \eqref{integrated_db}.
