Tuan Anh Le

Gaussian unknown mean

04 September 2016

Consider the following generative model for \(N\) \(D\)-dimensional data points \(x_{1:N}\): \begin{align} \mu &\sim \mathrm{Normal}(\mu_0, \Sigma_0), \\
x_n \mid \mu &\sim \mathrm{Normal}(\mu, \Sigma), && n = 1, \dotsc, N, \end{align} where \(\mu_0 \in \mathbb R^D, \Sigma_0 \in \mathbb R^{D \times D}\) are the prior mean and covariances and \(\Sigma \in \mathbb R^{D \times D}\) is the data covariance.

The joint

Then the joint density of \(x_{1:N}\) and \(\mu\) can be expressed as \begin{align} p(x_{1:N}, \mu) &= p(\mu) p(x_{1:N} \mid \mu) \\
&= \frac{1}{Z_1} \exp\left(\frac{1}{2}(\mu - \mu_0)^T \Sigma_0^{-1} (\mu - \mu_0)\right) \cdot \prod_{n = 1}^N \frac{1}{Z_2} \exp\left(\frac{1}{2}(x_n - \mu)^T \Sigma^{-1} (x_n - \mu)\right) \\
&= \frac{1}{Z_1 Z_2} \exp\left(\frac{1}{2}\left(\mu^T \Sigma_0^{-1} \mu - 2\mu^T\Sigma_0^{-1}\mu_0 + \mu_0^T\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N x_n^T \Sigma^{-1} x_n - 2x_n^T\Sigma^{-1}\mu + \mu^T\Sigma^{-1}\mu\right)\right) \\
&= \frac{1}{Z_1 Z_2 Z_3} \exp\left(\frac{1}{2}\left(\mu^T(\Sigma_0^{-1} + N\Sigma^{-1})\mu - 2\mu^T\left(\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \Sigma^{-1}x_n\right) \right)\right), \end{align} where \(Z_1, Z_2\) are normalisation constants of the prior and likelihood normal densities, and \(Z_3\) is a constant with respect to \(\mu\).

The posterior

The posterior density of \(\mu\) has the form of \begin{align} p(\mu \mid x_{1:N}) &= \frac{p(x_{1:N}, \mu)}{p(x_{1:N})} \\
&= \frac{1}{Z_4} p(x_{1:N}, \mu) \\
&= \frac{1}{Z_1 Z_2 Z_3 Z_4} \exp\left(\frac{1}{2}\left(\mu^T(\Sigma_0^{-1} + N\Sigma^{-1})\mu - 2\mu^T\left(\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \Sigma^{-1}x_n\right) \right)\right). \label{eq:gaussian/quadratic} \end{align} We can see that this term has the form \(\exp(\mu^T A \mu + \mu^T B + C)\) for some \(A \in \mathbb R^{D \times D}\) positive semidefinite, \(B \in \mathbb R^D\), and \(C \in \mathbb R\). It must also integrate to one, i.e. \(\int p(\mu \mid x_{1:N}) \,\mathrm d\mu = 1\). Since this is a known form of a normal distribution, we can fit the parameters of the presupposed posterior normal density with posterior mean \(\mu_N\) and posterior covariance \(\Sigma_N\) with quantities in \eqref{eq:gaussian/quadratic} as follows: \begin{align} p(\mu \mid x_{1:N}) &= \mathrm{Normal}(\mu; \mu_N, \Sigma_N) \\
&= \frac{1}{Z_5} \exp\left(\frac{1}{2}\left(\mu^T \Sigma_N^{-1} \mu - 2\mu^T\Sigma_N^{-1}\mu_N + \mu_N^T\Sigma_N^{-1}\mu_N\right)\right) \\
\implies \Sigma_N &= (\Sigma_0^{-1} + N\Sigma^{-1})^{-1} \\
\mu_N &= \Sigma_N \left(\Sigma_0^{-1}\mu_0 + \sum_{n = 1}^N \Sigma^{-1}x_n\right). \end{align}

The marginal likelihood

The marginal likelihood of this model is \begin{align} p(x_{1:N}) &= \mathrm{Normal}\left( x_{1:N}; \begin{bmatrix} \mu_0 \\
\vdots \\
\mu_0 \end{bmatrix}, \begin{bmatrix} \Sigma & & \\
& \ddots & \\
& & \Sigma \end{bmatrix} + \begin{bmatrix} \Sigma_0 & \cdots & \Sigma_0 \\
\vdots & \ddots & \vdots \\
\Sigma_0 & \cdots & \Sigma_0 \end{bmatrix} \right). \end{align} This can be derived using the Gaussian identity in equation 5 in Michael Osborneā€™s note where we use the following substitutions: \begin{align} \color{blue}{x} &\leftarrow \mu, \\
\color{blue}{y} &\leftarrow x_{1:N}, \\
\color{blue}{\mu} &\leftarrow \mu_0, \\
\color{blue}{A} &\leftarrow \Sigma_0, \\
\color{blue}{M} &\leftarrow \begin{bmatrix} I \\
\vdots \\
I \end{bmatrix}, \\
\color{blue}{c} &\leftarrow 0, \\
\color{blue}{L} &\leftarrow \begin{bmatrix} \Sigma & & \\
& \ddots & \\
& & \Sigma \end{bmatrix}. \end{align}