Lemma 2: Contraction of ULA Bias
Under strong log-concavity, the Wasserstein distance between the biased stationary distribution of ULA and $\nu$ is bounded by a term of order $\sqrt{\epsilon}$, where $\epsilon$ is the discretization step size.
Lemma 3: Fixed Point Condition for FB Algorithm
If the covariances $\Sigma_k$ and $\Sigma$ commute, then the covariance recursion of the FB algorithm has a unique fixed point, namely $\Sigma$, so $\Sigma_k \rightarrow \Sigma$ as $k \rightarrow \infty$.
Worked out Example
For the FB algorithm, consider Example 8 from Appendix G, which involves sampling from an Ornstein-Uhlenbeck (OU) process with Gaussian data. Let the target distribution be $\nu=\mathcal{N}(\mu, \Sigma)$ and initialize with $\rho_0=\mathcal{N}\left(\mu_0, \Sigma_0\right)$, setting $\Sigma_0=I$ so that $\Sigma_0$ commutes with $\Sigma$. Each iterate $\rho_k$ then remains Gaussian with mean $\mu_k$ and covariance $\Sigma_k$.
Since $\nu$ is Gaussian, $$ p_\nu(x)=\frac{1}{(2 \pi)^{d / 2}|\Sigma|^{1 / 2}} \exp \left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right), $$ so $$ -\log p_\nu(x)=\frac{d}{2} \log (2 \pi)+\frac{1}{2} \log |\Sigma|+\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu). $$ Ignoring constants, the negative log-density of $\nu$ is $$ f(x) = \frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu), $$ with gradient $$ \nabla f(x) = \Sigma^{-1}(x - \mu). $$
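As a quick sanity check, the potential $f$ and its gradient can be written directly in NumPy. This is a minimal sketch; `make_gaussian_potential` is an illustrative helper name, not something taken from the source.

```python
import numpy as np

def make_gaussian_potential(mu, Sigma):
    """Return f(x) = 0.5 (x - mu)^T Sigma^{-1} (x - mu) and its gradient."""
    Sigma_inv = np.linalg.inv(Sigma)

    def f(x):
        diff = x - mu
        return 0.5 * diff @ Sigma_inv @ diff

    def grad_f(x):
        # Gradient of the quadratic potential: Sigma^{-1} (x - mu).
        return Sigma_inv @ (x - mu)

    return f, grad_f

# Example usage with a small diagonal covariance.
f, grad_f = make_gaussian_potential(np.zeros(2), np.diag([1.0, 2.0]))
```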
Forward step
The FB algorithm applies a gradient descent step on $f(x)=\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)$. With step size $\epsilon$, the forward step is $$ x_{k+\frac{1}{2}} = x_k - \epsilon \nabla f(x_k) = x_k - \epsilon \Sigma^{-1}(x_k - \mu), $$ which can be rewritten as $$ x_{k+\frac{1}{2}} = \mu + (I - \epsilon \Sigma^{-1})(x_k - \mu). $$ Since $x_k \sim \mathcal{N}(\mu_k, \Sigma_k)$, the forward update for the mean is $$ \mu_{k+\frac{1}{2}} = \mu + (I - \epsilon \Sigma^{-1})(\mu_k - \mu), $$ and, because the map is affine, the covariance after the forward step is $\Sigma_{k+\frac{1}{2}} = (I - \epsilon \Sigma^{-1})\,\Sigma_k\,(I - \epsilon \Sigma^{-1}) = \Sigma_k (I - \epsilon \Sigma^{-1})^2$ when $\Sigma_k$ and $\Sigma$ commute.
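Because the forward map is affine in $x$, it can be pushed through the Gaussian $\rho_k$ in closed form. Below is a minimal NumPy sketch of this step; the function name `forward_step` and its arguments are illustrative, not taken from the paper.

```python
import numpy as np

def forward_step(mu_k, Sigma_k, mu, Sigma_inv, eps):
    """Gradient-descent (forward) step x -> x - eps * Sigma^{-1} (x - mu),
    pushed through the Gaussian N(mu_k, Sigma_k)."""
    A = np.eye(len(mu)) - eps * Sigma_inv   # I - eps * Sigma^{-1}
    mu_half = mu + A @ (mu_k - mu)          # mean of rho_{k+1/2}
    Sigma_half = A @ Sigma_k @ A.T          # covariance of rho_{k+1/2}
    return mu_half, Sigma_half
```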
Backward step
The backward step applies a proximal (implicit) update to the iterate, requiring that the resulting $x_{k+1} \sim \mathcal{N}\left(\mu_{k+1}, \Sigma_{k+1}\right)$ satisfies $$ x_{k+1}=\mu_{k+1}+\left(I-\epsilon \Sigma_{k+1}^{-1}\right)^{-1}\left(x_{k+\frac{1}{2}}-\mu_{k+1}\right). $$
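Given $\mu_{k+1}$ and $\Sigma_{k+1}$ (how $\Sigma_{k+1}$ is determined is described next), the backward map is again affine and can be applied directly. A minimal sketch, with the illustrative name `backward_map`:

```python
import numpy as np

def backward_map(x_half, mu_next, Sigma_next, eps):
    """Implicit (proximal) map x_{k+1/2} -> x_{k+1}:
    x_{k+1} = mu_{k+1} + (I - eps * Sigma_{k+1}^{-1})^{-1} (x_{k+1/2} - mu_{k+1}).
    Assumes eps < lambda_min(Sigma_next) so the matrix B is invertible."""
    d = len(mu_next)
    B = np.eye(d) - eps * np.linalg.inv(Sigma_next)   # I - eps * Sigma_{k+1}^{-1}
    return mu_next + np.linalg.solve(B, x_half - mu_next)
```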
Mean and covariance update
To ensure $x_{k+1} \sim \mathcal{N}(\mu_{k+1}, \Sigma_{k+1})$, matching the covariance on both sides of the backward relation yields the consistency condition $$ \Sigma_{k+1} (I - \epsilon \Sigma_{k+1}^{-1})^2 = \Sigma_k (I - \epsilon \Sigma^{-1})^2, $$ where the right-hand side is exactly the covariance $\Sigma_{k+\frac{1}{2}}$ produced by the forward step. Setting $\Sigma_{k+1} = \Sigma_k = \Sigma_\star$ gives the fixed-point condition $$ \Sigma_\star (I - \epsilon \Sigma_\star^{-1})^2 = \Sigma_\star (I - \epsilon \Sigma^{-1})^2, $$ and since the matrices commute and both $I - \epsilon \Sigma_\star^{-1}$ and $I - \epsilon \Sigma^{-1}$ are positive semidefinite (i.e., $\epsilon$ is at most the smallest eigenvalue of each covariance), this forces $\Sigma_\star = \Sigma$. Hence $\Sigma$ is the only fixed point of the covariance update, so as $k \to \infty$, $\Sigma_k \to \Sigma$, guaranteeing convergence of $\rho_k$ to $\nu$.
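Since $\Sigma_0 = I$ commutes with $\Sigma$, every iterate shares the eigenbasis of $\Sigma$, and the consistency condition decouples into one scalar equation per eigenvalue: with $c = s_k(1-\epsilon/\sigma)^2$, the eigenvalue $s_{k+1}$ of $\Sigma_{k+1}$ solves $s^2 - (c + 2\epsilon)s + \epsilon^2 = 0$, and the admissible root is the one with $s \geq \epsilon$. A minimal sketch of iterating this update (assuming the commuting, diagonal setting; names are illustrative):

```python
import numpy as np

def covariance_update_eig(s_k, sigma, eps):
    """One backward-step covariance update for a single eigenvalue.

    Solves s_{k+1} (1 - eps/s_{k+1})^2 = s_k (1 - eps/sigma)^2, i.e. the
    quadratic s^2 - (c + 2*eps) s + eps^2 = 0 with c = s_k (1 - eps/sigma)^2,
    taking the root with s >= eps."""
    c = s_k * (1.0 - eps / sigma) ** 2
    b = c + 2.0 * eps
    return 0.5 * (b + np.sqrt(b * b - 4.0 * eps ** 2))

# Example: eigenvalues of Sigma, starting from Sigma_0 = I (s_0 = 1).
sigmas = np.array([0.5, 1.0, 2.0])
eps = 0.25                     # satisfies eps <= lambda_min(Sigma)
s = np.ones_like(sigmas)       # eigenvalues of Sigma_0 = I
for k in range(200):
    s = covariance_update_eig(s, sigmas, eps)
print(s)                       # -> approximately [0.5, 1.0, 2.0]
```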
Taking expectations in the backward relation shows that the backward step leaves the mean unchanged, $\mu_{k+1} = \mu_{k+\frac{1}{2}}$, so the mean update across steps is $$ \mu_{k+1} = \mu + (I - \epsilon \Sigma^{-1})(\mu_k - \mu), \qquad \text{hence} \qquad \mu_k = \mu + (I - \epsilon \Sigma^{-1})^k (\mu_0 - \mu), $$ which converges exponentially to $\mu$ for $0 < \epsilon \leq \lambda_{\min}(\Sigma)$. For the covariance, the fixed-point argument above shows that $\Sigma_k \rightarrow \Sigma$ as $k \rightarrow \infty$ under the same condition $\epsilon \leq \lambda_{\min }(\Sigma)$.
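Putting the mean and covariance recursions together gives a short end-to-end check that $\mu_k \to \mu$ and $\Sigma_k \to \Sigma$. The sketch below assumes a diagonal $\Sigma$ (so everything commutes with $\Sigma_0 = I$) and uses illustrative variable names; it is not the paper's implementation.

```python
import numpy as np

# Target N(mu, Sigma) with diagonal Sigma so that Sigma_0 = I commutes with it.
mu = np.array([1.0, -2.0, 0.5])
sigmas = np.array([0.5, 1.0, 2.0])   # eigenvalues of Sigma
eps = 0.25                            # step size, eps <= lambda_min(Sigma)

mu_k = np.zeros(3)                    # mu_0
s_k = np.ones(3)                      # eigenvalues of Sigma_0 = I

for k in range(200):
    # Forward step: mean pulled toward mu; covariance scaled eigenvalue-wise.
    mu_k = mu + (1.0 - eps / sigmas) * (mu_k - mu)
    c = s_k * (1.0 - eps / sigmas) ** 2
    # Backward step: solve s (1 - eps/s)^2 = c, keeping the root with s >= eps.
    b = c + 2.0 * eps
    s_k = 0.5 * (b + np.sqrt(b * b - 4.0 * eps ** 2))

print(np.round(mu_k, 6))   # -> approximately mu
print(np.round(s_k, 6))    # -> approximately sigmas
```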