\cite{dalalyan2017} and \cite{durmus_non-asymptotic_2016} established non-asymptotic error bounds for the unadjusted Langevin algorithm (ULA) under explicit step-size conditions.
\cite{eberle_couplings_2017} used reflection and synchronous couplings to quantify contraction rates in the Wasserstein distance: for two copies $X_t$ and $Y_t$ of the diffusion started from different initial conditions, $$ \mathcal{W}_2^2(X_t, Y_t) \leq e^{-\lambda t}\, \mathcal{W}_2^2(X_0, Y_0). $$
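For intuition (a standard argument rather than the precise construction of \cite{eberle_couplings_2017}), a synchronous coupling runs two copies of the overdamped Langevin diffusion with the same Brownian motion, $$ dX_t = -\nabla U(X_t)\, dt + \sqrt{2}\, dB_t, \qquad dY_t = -\nabla U(Y_t)\, dt + \sqrt{2}\, dB_t, $$ so the noise cancels in the difference process; when $U$ is $m$-strongly convex, $$ \frac{d}{dt}\, \|X_t - Y_t\|^2 = -2\, \langle \nabla U(X_t) - \nabla U(Y_t),\, X_t - Y_t \rangle \leq -2m\, \|X_t - Y_t\|^2, $$ giving contraction at rate $\lambda = 2m$. Reflection couplings instead mirror the noise across the hyperplane separating the two copies, which yields contraction beyond the strongly convex case.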
Leimkuhler et al. (2023) extended these results to discretized kinetic Langevin dynamics, deriving precise step-size constraints for accelerated convergence.
\cite{ma_is_2019} extended these results to underdamped dynamics using hypocoercivity frameworks, proving that for $m$-strongly convex and $L$-smooth potentials, underdamped Langevin dynamics converges in Wasserstein distance as $$ \mathcal{W}_2\left(\nu_k, \pi\right) \leq \mathcal{O}\left(\exp\left(-\sqrt{\frac{m}{L}}\, k\right)\right), $$ matching the accelerated convergence rate of Nesterov's gradient method.
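For reference, the underdamped (kinetic) Langevin dynamics augment the position with a velocity variable: $$ dX_t = V_t\, dt, \qquad dV_t = -\gamma V_t\, dt - \nabla U(X_t)\, dt + \sqrt{2\gamma}\, dB_t, $$ where $\gamma > 0$ is the friction parameter; the $X$-marginal of the stationary distribution is the target $\pi \propto e^{-U}$.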
In the context of composite optimization, \cite{freund2022convergence} and related works established dimension-free convergence bounds.
Closely related directions include mirror Langevin dynamics and proximal methods, which are discussed below.
Lyapunov functions and intermediate functional inequalities provide additional tools for analyzing convergence. Bakry, Cattiaux, and Guillin (2008) combined Lyapunov conditions with Poincaré inequalities to extend convergence guarantees under weaker convexity assumptions. A Lyapunov function $V(x)$ satisfies a drift condition of the form $$ \mathcal{L} V(x) \leq -\lambda V(x) + C, $$ where $\mathcal{L}$ is the generator of the Markov process. This ensures exponential convergence up to an additive constant: $$ \mathbb{E}\left[V(X_t)\right] \leq e^{-\lambda t}\, V(X_0) + \frac{C}{\lambda}\left(1 - e^{-\lambda t}\right). $$
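The bound follows directly from the drift condition: by Dynkin's formula, $$ \frac{d}{dt}\, \mathbb{E}\left[V(X_t)\right] = \mathbb{E}\left[\mathcal{L} V(X_t)\right] \leq -\lambda\, \mathbb{E}\left[V(X_t)\right] + C, $$ and Grönwall's lemma yields the stated estimate.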
In \cite{bobkov2003spectral}, Bobkov proved that log-concave, spherically symmetric measures satisfy a Poincaré inequality with explicit bounds on the spectral gap: for a probability measure $\mu$ on $\mathbb{R}^n$ that is rotationally symmetric and log-concave, there exists a Poincaré constant $\lambda > 0$ such that \begin{equation} \operatorname{Var}_\mu(f) \leq \frac{1}{\lambda} \int |\nabla f|^2 \, d\mu, \end{equation} where $\operatorname{Var}_\mu(f)$ denotes the variance of $f$ under $\mu$.
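As a benchmark example (standard, and not specific to \cite{bobkov2003spectral}), the standard Gaussian measure $\gamma_n$ on $\mathbb{R}^n$ satisfies this inequality with $\lambda = 1$, that is, $\operatorname{Var}_{\gamma_n}(f) \leq \int |\nabla f|^2 \, d\gamma_n$, independently of the dimension $n$.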
Section 2.2 of Chewi et al.~\cite{chewi2022poincare} relies on Bobkov's results: when the target satisfies a Poincaré inequality, they use it to upper bound the mixing time of Langevin Monte Carlo (LMC). For targets with a spectral gap, they establish the following bound on the Wasserstein-2 distance between the iterates $\mu_t$ of LMC and the target distribution $\pi$: \begin{equation} W_2(\mu_t, \pi) \leq C e^{-\lambda t}. \end{equation}
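For concreteness, LMC is the Euler--Maruyama discretization of the overdamped Langevin diffusion with step size $h > 0$: $$ x_{k+1} = x_k - h\, \nabla U(x_k) + \sqrt{2h}\, \xi_k, \qquad \xi_k \sim \mathcal{N}(0, I_d), $$ whose iterates approximate the law of the diffusion at time $t = kh$.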
\cite{nongeodesicallyconvex_luu_2024} study a class of optimization problems in the Wasserstein space where the objective function is nonconvex along generalized geodesics but the potential function has a difference-of-convex (DC) structure. The paper introduces a semi Forward-Backward Euler (SFBE) scheme.
In the Wasserstein space $(\mathcal{P}_2, W_2)$, classical notions of convexity are generalized via geodesic convexity along interpolations $$ \mu_t = \big((1-t) T_0 + t T_1\big)_{\#}\, \mu_0, \quad t \in [0,1], $$ where $T_0$ and $T_1$ are transport maps interpolating between two measures $\mu_0$ and $\mu_1$. The relevant setting is when the objective function $F \colon \mathcal{P}_2 \to \mathbb{R}$ fails to be geodesically convex but exhibits a difference-of-convex (DC) structure, $$ F(\mu) = F_1(\mu) - F_2(\mu), $$ where both $F_1$ and $F_2$ are geodesically convex along generalized geodesics.
Let $F(\mu)$ be the objective function with gradient flow given by $$ \partial_t \mu_t = -\nabla_{\mathcal{W}_2} F(\mu_t), $$ where $\nabla_{\mathcal{W}_2}$ denotes the Wasserstein gradient. The SFBE scheme is defined as a modification of the classical Forward-Backward Euler scheme: $$ \mu_{k+1} = \mathrm{prox}_{\tau F_2} \left( \mu_k - \tau \nabla_{\mathcal{W}_2} F_1(\mu_k) \right), $$ where $\tau > 0$ is the step size, and $\mathrm{prox}_{\tau F_2}$ denotes the proximal operator of $F_2$ in the Wasserstein space. The DC structure is split into two components: one handled by explicit gradient descent (the forward step) and the other by implicit proximal updates (the backward step).
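For intuition (a finite-dimensional analogue mirroring the splitting above, not taken from \cite{nongeodesicallyconvex_luu_2024}), the corresponding update for a DC objective $f = f_1 - f_2$ on $\mathbb{R}^d$ reads $$ x_{k+1} = \mathrm{prox}_{\tau f_2} \left( x_k - \tau \nabla f_1(x_k) \right), \qquad \mathrm{prox}_{\tau f_2}(y) = \arg\min_{x} \left\{ f_2(x) + \frac{1}{2\tau} \|x - y\|^2 \right\}, $$ with $f_1$ treated explicitly through its gradient and $f_2$ treated implicitly through its proximal map.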
They prove convergence of $\mu_k$ to a stationary measure $\mu^*$ when $F_2$ satisfies a proximal regularity condition, and the results hold even when $F$ lacks geodesic convexity.
Building on this, when the target distribution $\pi(x) \propto \exp(-U(x))$ has a nonconvex potential, $U(x)$ can often be decomposed as a difference of convex functions, $$ U(x) = U_1(x) - U_2(x), $$ where $U_1$ and $U_2$ are both convex.
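A simple illustrative example (added here for concreteness) is the double-well potential $U(x) = (x^2 - 1)^2$, which is nonconvex but admits the DC decomposition $U_1(x) = x^4 + 1$ and $U_2(x) = 2x^2$, with both components convex.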
The Mirror Langevin dynamics generalize the overdamped Langevin diffusion to a mirror descent framework by introducing a divergence function $D_\phi$, where $\phi$ is a Legendre function inducing the mirror map $$ \nabla \phi \colon \mathcal{X} \to \mathcal{X}^*, $$ with convex conjugate $\phi^*$ and inverse map $\nabla \phi^* \colon \mathcal{X}^* \to \mathcal{X}$. The corresponding dynamics evolve as $$ dX_t = -\nabla \phi^*\big(\nabla U(X_t)\big)\, dt + \sqrt{2}\, dB_t, $$ where $U \colon \mathcal{X} \to \mathbb{R}$ is the potential function, $B_t$ is a Brownian motion, and $\nabla \phi^*$ maps dual-space gradients back to the primal space.
The underlying metric is defined by the Hessian of the Legendre function $\phi$: $$ \langle u, v \rangle_\phi = \langle \nabla^2 \phi(x)\, u, v \rangle. $$
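A standard example (included for illustration) is the negative-entropy mirror map $\phi(x) = \sum_{i} x_i \log x_i$ on the positive orthant, for which $\nabla^2 \phi(x) = \operatorname{diag}(1/x_1, \ldots, 1/x_d)$; the induced metric blows up near the boundary $\{x_i = 0\}$, which is what makes mirror Langevin methods natural for constrained targets.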
Under weak smoothness assumptions, the resulting rate depends on the step size $h$ and the functional-inequality constant $C_{\mathrm{LSI}}$: $$ W_2(\mu_k, \pi) \leq \mathcal{O}\left( \exp(-c k h) + h^{p} \right), $$ where the exponent $p$ reflects the smoothness level. The approach is modeled on the mirror descent algorithm from optimization (Nemirovski and Yudin, 1983).
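A commonly used discretization (stated schematically here; details vary across the mirror Langevin literature) takes the noisy step in the dual variables: $$ y_{k+1} = \nabla \phi(x_k) - h\, \nabla U(x_k) + \sqrt{2h}\, \big[\nabla^2 \phi(x_k)\big]^{1/2} \xi_k, \qquad x_{k+1} = \nabla \phi^*(y_{k+1}), $$ with $\xi_k \sim \mathcal{N}(0, I_d)$, so that each iteration is a mirror descent step perturbed by Gaussian noise scaled by the local Hessian metric.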
Durmus and Moulines (2017) derive non-asymptotic convergence rates for ULA under constant and decreasing step-sizes.
It provides the first comprehensive non-asymptotic analysis of ULA in both total variation and Wasserstein distances, bridging the gap between asymptotic ergodicity results and practical convergence rates, and it is among the first analyses to use coupling arguments in this setting.
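To make the object of these analyses concrete, the following is a minimal sketch of the ULA iteration (illustrative only; the function names and the Gaussian example are our own choices, not taken from the cited works):
\begin{verbatim}
import numpy as np

def ula(grad_U, x0, step, n_iters, rng=None):
    # Unadjusted Langevin Algorithm:
    #   x_{k+1} = x_k - h * grad_U(x_k) + sqrt(2h) * xi_k,  xi_k ~ N(0, I).
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        samples[k] = x
    return samples

# Example: standard Gaussian target, U(x) = ||x||^2 / 2, so grad_U(x) = x.
samples = ula(grad_U=lambda x: x, x0=np.zeros(2), step=0.01, n_iters=5000)
\end{verbatim}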
Subsequent work sharpened the discretization analysis of Langevin-based samplers. For a potential $U$ satisfying $L$-smoothness ($\|\nabla U(x) - \nabla U(y)\| \leq L \|x - y\|$), the following guarantees are established:
KL divergence guarantee under a log-Sobolev inequality (LSI): reaching $\mathrm{KL}(\nu_k \,\|\, \pi) \leq \varepsilon$ requires $$ \mathcal{O}\left(\frac{\kappa^{3/2} \sqrt{d}}{\varepsilon}\right) $$ iterations, where $\kappa = L/m$ is the condition number and $\varepsilon$ is the error tolerance. Notably, no Lipschitz Hessian assumption is required.
TV distance guarantee under a Poincaré inequality (PI): reaching $\|\nu_k - \pi\|_{\mathrm{TV}} \leq \varepsilon$ requires $$ \mathcal{O}\left(\frac{C_{\mathrm{PI}}^{3/2} L^{3/2} \sqrt{d}}{\varepsilon}\right) $$ iterations, with state-of-the-art dimension dependence $\sqrt{d}$.
Rényi divergence guarantees for $q \in [1, 2)$: reaching $R_q(\nu_k \,\|\, \pi) \leq \varepsilon$ requires $$ \mathcal{O}\left(\frac{C_{\mathrm{PI}} L^2 d}{\varepsilon}\right) $$ iterations, improving upon prior results for weakly smooth potentials ($s$-Hölder continuous gradient, $s \in (0, 1]$).
Building on Cao, Lu, and Wang (2020), underdamped Langevin Monte Carlo (ULMC) achieves linear dependence on the condition number $\kappa$ in log-concave settings: $$ \mathcal{W}_2(\nu_k, \pi) \leq \mathcal{O}\left(e^{-c k \sqrt{m/L}}\right), $$ where $c > 0$ is a constant depending on $\gamma$, $L$, and $m$. Although full acceleration (an overall $\mathcal{O}(\kappa^{1/2})$ dependence) remains open, these results represent progress in the discrete-time setting.
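For concreteness, a simple Euler-type discretization of the kinetic Langevin dynamics (shown schematically; the cited analyses typically use integrators that treat the Ornstein--Uhlenbeck part of the dynamics exactly) reads $$ v_{k+1} = v_k - h\big(\gamma v_k + \nabla U(x_k)\big) + \sqrt{2\gamma h}\, \xi_k, \qquad x_{k+1} = x_k + h\, v_{k+1}, \qquad \xi_k \sim \mathcal{N}(0, I_d). $$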
The analysis relies on Girsanov's theorem to derive Rényi divergence bounds for ULMC, avoiding Wasserstein coupling arguments; it extends to weakly smooth potentials and handles the non-reversibility of the ULMC dynamics.
For $s$-Hölder continuous $\nabla U$, satisfying $\|\nabla U(x) - \nabla U(y)\| \leq L \|x - y\|^s$, the iteration complexity in KL divergence becomes $$ \mathcal{O}\left(\frac{L^{1/s} d^{1/s}}{\varepsilon}\right). $$
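As a simple one-dimensional example of such weak smoothness (added for illustration), the potential $U(x) = \frac{1}{1+s}\, |x|^{1+s}$ with $s \in (0, 1)$ has gradient $\nabla U(x) = |x|^{s} \operatorname{sign}(x)$, which is $s$-Hölder continuous but not Lipschitz at the origin.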
Under an LSI ($\mathrm{KL}(\rho \,\|\, \pi) \leq C_{\mathrm{LSI}}\, \mathbb{E}_\rho\big[\|\nabla \log (\rho / \pi)\|^2\big]$), ULMC achieves $$ \mathrm{KL}(\nu_k \,\|\, \pi) \leq \mathcal{O}\left(\frac{1}{k^{2/3}}\right), $$ while under a PI ($\operatorname{Var}_\pi(g) \leq C_{\mathrm{PI}}\, \mathbb{E}_\pi\big[\|\nabla g\|^2\big]$), convergence in Wasserstein-2 distance satisfies $$ \mathcal{W}_2(\nu_k, \pi) \leq \mathcal{O}\left(e^{-k / C_{\mathrm{PI}}}\right). $$