Stein Variational Gradient Descent
I created some introductory slides about the Stein Variational Gradient Descent paper by Q. Liu and D. Wang and wanted to share some of the main insights:
General setting
In probabilistic machine learning, we are often interested in the posterior density of the parameters \(\boldsymbol{\theta}\) given observed data \(\boldsymbol{X}\), which by Bayes' rule is
\[p(\boldsymbol{\theta}) = \frac{p(\boldsymbol{X} \vert \boldsymbol{\theta})p_0(\boldsymbol{\theta})}{p(\boldsymbol{X})}\] with smooth likelihood \(p(\boldsymbol{X}\vert \boldsymbol{\theta})\), smooth prior \(p_0(\boldsymbol{\theta})\) and marginal likelihood \(p(\boldsymbol{X})\). In general, the marginal likelihood \(p(\boldsymbol{X})\) (and hence the posterior \(p(\boldsymbol{\theta})\)) is analytically intractable. This is, e.g., the case for most Bayesian neural networks.
Score gradient
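Assuming that the joint density is sufficiently smooth, we can still obtain an analytical expression for the gradient of the log posterior density, the so-called score gradient. The marginal likelihood \(p(\boldsymbol{X})\) does not depend on \(\boldsymbol{\theta}\), so its gradient vanishes and only the unnormalized joint density remains: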
\[\nabla_{\boldsymbol{\theta}} \log p(\boldsymbol{\theta}) = \nabla_{\boldsymbol{\theta}}\log [p(\boldsymbol{X} \vert \boldsymbol{\theta})p_0(\boldsymbol{\theta})] - \nabla_{\boldsymbol{\theta}}\log p(\boldsymbol{X}) = \nabla_{\boldsymbol{\theta}}\log [p(\boldsymbol{X} \vert \boldsymbol{\theta})p_0(\boldsymbol{\theta})].\]
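To make the cancellation concrete, here is a minimal sketch on a toy model that is not part of the slides: a one-dimensional Gaussian likelihood with known standard deviation and a Gaussian prior on the mean. The model, the data `XS`, and all function names are assumptions chosen for illustration. This conjugate case is one of the few where the normalized posterior is known in closed form, so the score computed from the unnormalized log joint (here via a central finite difference) can be checked against the score of the exact posterior.

```python
import math

# Hypothetical toy data, assumed for illustration only.
XS = [1.2, 0.7, 1.9, 1.1]


def log_joint(theta, xs, sigma=1.0, mu0=0.0, tau0=2.0):
    """Unnormalized log posterior: log p(X | theta) + log p_0(theta)."""
    log_lik = sum(
        -0.5 * ((x - theta) / sigma) ** 2
        - math.log(sigma)
        - 0.5 * math.log(2.0 * math.pi)
        for x in xs
    )
    log_prior = (
        -0.5 * ((theta - mu0) / tau0) ** 2
        - math.log(tau0)
        - 0.5 * math.log(2.0 * math.pi)
    )
    return log_lik + log_prior


def score_from_joint(theta, xs, eps=1e-5):
    """Score via a central finite difference of the log joint.

    The marginal likelihood p(X) is never evaluated.
    """
    upper = log_joint(theta + eps, xs)
    lower = log_joint(theta - eps, xs)
    return (upper - lower) / (2.0 * eps)


def score_exact_posterior(theta, xs, sigma=1.0, mu0=0.0, tau0=2.0):
    """Score of the exact (normalized) Gaussian posterior of this conjugate model."""
    precision = 1.0 / tau0**2 + len(xs) / sigma**2
    mean = (mu0 / tau0**2 + sum(xs) / sigma**2) / precision
    return -(theta - mean) * precision


if __name__ == "__main__":
    for theta in (-1.0, 0.3, 2.5):
        print(theta, score_from_joint(theta, XS), score_exact_posterior(theta, XS))
```

Both columns should agree up to finite-difference error, even though `score_from_joint` only ever touches the likelihood and the prior. In practice one would typically replace the finite difference with automatic differentiation, but the point is the same: the score is available whenever the log joint can be evaluated.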