
Gaussian Process Update Equations
When you start to learn about Gaussian processes, you come across the update equations fairly early on. The update equations are intimidating to look at, and are typically dismissed as trivial to derive (for example, Rasmussen and Williams simply point you towards a statistics book from the 1930s, which is neither available online nor in our university library…). I had a go at the derivation, and promptly realised it wasn’t trivial at all from a cold start.
Fortuitously, at the PEN emulators workshop I recently attended, there was an introductory lecture from Jochen Voss, where he went through a univariate case, and then gave us the structure of the derivation for the full multivariate case. Luckily, this gave me the push I needed to try the derivation again, so I went away and filled in the gaps.
So here it is, in all its glory: the derivation of the update equations for Gaussian processes.
The overall endgame of Gaussian process regression is to write down a conditional distribution

$$p(\mathbf{y}_* \mid \mathbf{y}),$$

where $\mathbf{y}$ is the vector of observed outputs at the training inputs $X$, and $\mathbf{y}_*$ is the vector of unknown outputs at the test inputs $X_*$. Since we have a constant set of data, $p(\mathbf{y})$ is just a fixed normalising constant, and so

$$p(\mathbf{y}_* \mid \mathbf{y}) = \frac{p(\mathbf{y}, \mathbf{y}_*)}{p(\mathbf{y})} \propto p(\mathbf{y}, \mathbf{y}_*).$$
The underlying assumption in Gaussian process regression is that outputs are jointly Gaussian distributed, so that

$$\begin{pmatrix} \mathbf{y} \\ \mathbf{y}_* \end{pmatrix} \sim \mathcal{N}\!\left(\mathbf{0},\; \Sigma\right), \qquad \Sigma = \begin{pmatrix} K & K_*^\top \\ K_* & K_{**} \end{pmatrix},$$

where $K = k(X, X)$ is the covariance matrix between the training inputs, where $K_* = k(X_*, X)$ is the covariance matrix between the test and training inputs, and where $K_{**} = k(X_*, X_*)$ is the covariance matrix between the test inputs, all built from the covariance (kernel) function $k$. Writing the precision matrix $\Sigma^{-1}$ in block form as

$$\Sigma^{-1} = \begin{pmatrix} A & B \\ B^\top & C \end{pmatrix},$$

the exponent of the joint density expands to

$$p(\mathbf{y}, \mathbf{y}_*) \propto \exp\!\left(-\tfrac{1}{2}\left(\mathbf{y}^\top A \mathbf{y} + \mathbf{y}^\top B \mathbf{y}_* + \mathbf{y}_*^\top B^\top \mathbf{y} + \mathbf{y}_*^\top C \mathbf{y}_*\right)\right).$$
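Before carrying on with the algebra, it may help to see the block structure concretely. Here is a minimal NumPy sketch of how $\Sigma$ might be assembled; the squared-exponential kernel and all the names here are my own illustrative choices, not anything prescribed by the derivation:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sqdist = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * sqdist / lengthscale**2)

X = np.array([-1.0, 0.0, 1.0])       # training inputs
X_star = np.array([0.5, 2.0])        # test inputs

K = rbf_kernel(X, X)                 # K    = k(X, X)
K_star = rbf_kernel(X_star, X)       # K_*  = k(X_*, X)
K_ss = rbf_kernel(X_star, X_star)    # K_** = k(X_*, X_*)

# Joint covariance of (y, y_*), in the same block layout as above
Sigma = np.block([[K, K_star.T],
                  [K_star, K_ss]])
```

Any positive-definite kernel would do here; the only thing the derivation relies on is the block layout of $\Sigma$.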
However, the only thing that isn’t a constant here is $\mathbf{y}_*$, so the $\mathbf{y}^\top A \mathbf{y}$ term can be absorbed into the constant of proportionality:

$$p(\mathbf{y}_* \mid \mathbf{y}) \propto \exp\!\left(-\tfrac{1}{2}\left(\mathbf{y}^\top B \mathbf{y}_* + \mathbf{y}_*^\top B^\top \mathbf{y} + \mathbf{y}_*^\top C \mathbf{y}_*\right)\right).$$
If we take the transpose of the middle term (it is a scalar, and so equal to its own transpose), we can group the terms together a bit more:

$$p(\mathbf{y}_* \mid \mathbf{y}) \propto \exp\!\left(-\tfrac{1}{2}\left(\mathbf{y}_*^\top C \mathbf{y}_* + 2\,\mathbf{y}_*^\top B^\top \mathbf{y}\right)\right).$$
Now, in general, a multivariate Gaussian with mean $\boldsymbol{\mu}$ and covariance $S$ has the form

$$p(\mathbf{z}) \propto \exp\!\left(-\tfrac{1}{2}(\mathbf{z} - \boldsymbol{\mu})^\top S^{-1} (\mathbf{z} - \boldsymbol{\mu})\right).$$

If we remember that covariance matrices are symmetric, we can expand, drop some constant terms (those not involving $\mathbf{z}$) and then rearrange this to

$$p(\mathbf{z}) \propto \exp\!\left(-\tfrac{1}{2}\left(\mathbf{z}^\top S^{-1} \mathbf{z} - 2\,\mathbf{z}^\top S^{-1} \boldsymbol{\mu}\right)\right).$$
Comparing this with our expression for $p(\mathbf{y}_* \mid \mathbf{y})$, we can therefore see that both the conditional mean and the conditional covariance can be read off by matching terms: the quadratic terms give $S^{-1} = C$, and the linear terms give $S^{-1}\boldsymbol{\mu} = -B^\top \mathbf{y}$, so

$$\operatorname{Cov}[\mathbf{y}_* \mid \mathbf{y}] = C^{-1}, \qquad \mathbb{E}[\mathbf{y}_* \mid \mathbf{y}] = -C^{-1} B^\top \mathbf{y}.$$
The expressions for $B$ and $C$ in terms of the blocks of $\Sigma$ come from the standard formula for the inverse of a partitioned matrix (via the Schur complement of $K$): writing $S_c = K_{**} - K_* K^{-1} K_*^\top$, we have

$$C = S_c^{-1}, \qquad B = -K^{-1} K_*^\top S_c^{-1}.$$
We can rearrange this a little bit. Substituting into the expression for the conditional mean, and using the fact that $S_c$ (and hence $S_c^{-1}$) is symmetric,

$$\mathbb{E}[\mathbf{y}_* \mid \mathbf{y}] = -C^{-1} B^\top \mathbf{y} = S_c \, S_c^{-1} K_* K^{-1} \mathbf{y} = K_* K^{-1} \mathbf{y}.$$
We know that $C^{-1}$ is just the Schur complement itself, so

$$\operatorname{Cov}[\mathbf{y}_* \mid \mathbf{y}] = C^{-1} = K_{**} - K_* K^{-1} K_*^\top.$$
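If you don’t entirely trust the block-inverse bookkeeping (I didn’t, at first), it’s easy to check numerically. A quick sketch of my own, using an arbitrary positive-definite $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2                                   # sizes of y and y_*
A_ = rng.standard_normal((n + m, n + m))
Sigma = A_ @ A_.T + (n + m) * np.eye(n + m)   # arbitrary positive-definite covariance

K = Sigma[:n, :n]                             # training block
K_star = Sigma[n:, :n]                        # test/training cross block
K_ss = Sigma[n:, n:]                          # test block

P = np.linalg.inv(Sigma)                      # precision matrix
B, C = P[:n, n:], P[n:, n:]                   # the blocks used above

schur = K_ss - K_star @ np.linalg.inv(K) @ K_star.T
print(np.allclose(np.linalg.inv(C), schur))                             # True
print(np.allclose(-np.linalg.inv(C) @ B.T, K_star @ np.linalg.inv(K)))  # True
```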
And, so, in conclusion we know that

$$\mathbf{y}_* \mid \mathbf{y} \sim \mathcal{N}\!\left(K_* K^{-1} \mathbf{y},\; K_{**} - K_* K^{-1} K_*^\top\right),$$

which is exactly the pair of update equations quoted in the textbooks.
So, not quite as trivial as the textbooks claim!
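For completeness, here is a minimal sketch of the update equations in use, on toy data of my own (noise-free, with a small diagonal jitter added to keep the solve stable):

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sqdist = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * sqdist / lengthscale**2)

X = np.array([-2.0, -1.0, 0.0, 1.0])
y = np.sin(X)                                    # toy noise-free observations
X_star = np.linspace(-3.0, 3.0, 7)

K = rbf_kernel(X, X) + 1e-10 * np.eye(len(X))    # jitter for numerical stability
K_star = rbf_kernel(X_star, X)
K_ss = rbf_kernel(X_star, X_star)

# The update equations: mu = K_* K^{-1} y,  cov = K_** - K_* K^{-1} K_*^T
mu = K_star @ np.linalg.solve(K, y)
cov = K_ss - K_star @ np.linalg.solve(K, K_star.T)

print(mu)                                        # posterior mean at the test inputs
print(np.sqrt(np.clip(np.diag(cov), 0.0, None))) # posterior standard deviations
```

Using `np.linalg.solve` rather than forming $K^{-1}$ explicitly is the usual numerically sensible way to apply these equations in practice.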