The Langevin diffusion and the Unadjusted Langevin Algorithm (ULA) can be interpreted as gradient flow dynamics minimizing the relative entropy (KL divergence) over the space of probability measures \cite{jordan_variational_1998}. Specifically, the marginal distribution of the Langevin diffusion process $(X_t)_{t \geq 0}$ evolves along the gradient flow in the Wasserstein space that minimizes $D_{\mathrm{KL}}(\cdot \| \pi)$ \cite{jordan_variational_1998, villani_optimal_2009}.
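For concreteness, ULA is the Euler--Maruyama discretization of the Langevin diffusion $dX_t = \nabla \log \pi(X_t) \, dt + \sqrt{2} \, dB_t$, i.e.\ $x_{k+1} = x_k + \eta \, \nabla \log \pi(x_k) + \sqrt{2\eta} \, \xi_k$ with $\xi_k \sim \mathcal{N}(0, I_d)$. The NumPy sketch below illustrates this update; the function name \texttt{ula\_sample} and all parameter values are our own choices for illustration, not a reference implementation.

\begin{verbatim}
import numpy as np

def ula_sample(grad_log_pi, x0, step=1e-2, n_steps=10_000, rng=None):
    """ULA: x_{k+1} = x_k + step * grad_log_pi(x_k) + sqrt(2*step) * N(0, I)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x = x + step * grad_log_pi(x) + np.sqrt(2 * step) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

# Standard Gaussian target: grad log pi(x) = -x.
chain = ula_sample(lambda x: -x, x0=np.zeros(2), rng=0)
print(chain.mean(axis=0), chain.var(axis=0))  # roughly (0, 0) and (1, 1)
\end{verbatim}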

The space of probability measures on $\mathbb{R}^d$ with finite second moments is denoted
$$ \mathcal{P}_2\left(\mathbb{R}^d\right) = \left\{ \mu : \mathbb{R}^d \to \mathbb{R}_{\geq 0} \;\middle|\; \int_{\mathbb{R}^d} \mu(x) \, dx = 1, \ \int_{\mathbb{R}^d} \|x\|^2 \mu(x) \, dx < \infty \right\}. $$
The distance between two measures $\mu, \nu \in \mathcal{P}_2\left(\mathbb{R}^d\right)$ is the Wasserstein-2 distance \cite{villani_optimal_2009}:
$$ W_2(\mu, \nu) = \left( \inf_{\Pi} \int d^2(x, y) \, d\Pi(x, y) \right)^{\frac{1}{2}}, $$
where $d(x, y)$ is the Euclidean distance and the infimum is over joint distributions $\Pi$ satisfying the marginal constraints
$$ \int \Pi(x, y) \, dy = \mu(x), \quad \int \Pi(x, y) \, dx = \nu(y). $$
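As a concrete instance, $W_2$ admits a closed form between Gaussians: $W_2^2\bigl(\mathcal{N}(m_1, \Sigma_1), \mathcal{N}(m_2, \Sigma_2)\bigr) = \|m_1 - m_2\|^2 + \operatorname{tr}\bigl(\Sigma_1 + \Sigma_2 - 2 (\Sigma_2^{1/2} \Sigma_1 \Sigma_2^{1/2})^{1/2}\bigr)$. The sketch below evaluates this formula numerically; the helper name \texttt{w2\_gaussian} and the example values are ours, for illustration only.

\begin{verbatim}
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian(m1, S1, m2, S2):
    """W_2 between N(m1, S1) and N(m2, S2) via the Gaussian closed form."""
    root = sqrtm(S2)
    cross = np.real(sqrtm(root @ S1 @ root))  # sqrtm may return complex dtype
    w2_sq = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross)
    return float(np.sqrt(max(w2_sq, 0.0)))

# Unit-variance Gaussians with means 0 and 1: W_2 = |0 - 1| = 1.
print(w2_gaussian(np.zeros(1), np.eye(1), np.ones(1), np.eye(1)))
\end{verbatim}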

For a target distribution $\pi$ and a current distribution $\rho \in \mathcal{P}_2\left(\mathbb{R}^d\right)$, the Wasserstein gradient of the KL divergence with respect to $\rho$ is given by
$$ \nabla_\rho D_{\mathrm{KL}}(\rho \| \pi) = \nabla \cdot \left[ \rho(x) \nabla \log \pi(x) \right] - \Delta \rho(x). $$
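This expression follows from the standard computation in Otto calculus \cite{jordan_variational_1998}: the Wasserstein gradient of a functional $F$ is $-\nabla \cdot \bigl(\rho \nabla \frac{\delta F}{\delta \rho}\bigr)$, and the first variation of $\rho \mapsto \int \rho \log(\rho/\pi)$ is $\log(\rho/\pi) + 1$. Splitting the logarithm and using $\rho \nabla \log \rho = \nabla \rho$ gives
$$
\nabla_\rho D_{\mathrm{KL}}(\rho \| \pi)
= -\nabla \cdot \left[ \rho \nabla \log \frac{\rho}{\pi} \right]
= -\nabla \cdot \left[ \nabla \rho \right] + \nabla \cdot \left[ \rho \nabla \log \pi \right]
= \nabla \cdot \left[ \rho(x) \nabla \log \pi(x) \right] - \Delta \rho(x).
$$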

A flow $(\rho_t)_{t \geq 0}$ in $\mathcal{P}_2\left(\mathbb{R}^d\right)$ satisfying
$$ \frac{\partial}{\partial t} \rho_t(x) = -\nabla \cdot \left[ \rho_t(x) \nabla \log \pi(x) \right] + \Delta \rho_t(x) $$
is called the Wasserstein Gradient Flow (WGF) of the KL divergence: it is the steepest-descent dynamics for $D_{\mathrm{KL}}(\cdot \| \pi)$, and it coincides with the Fokker--Planck equation of the Langevin diffusion, so $\pi$ is its stationary distribution.
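To make the descent property tangible, the sketch below (our illustration; the grid, the time step, and the helper name \texttt{kl\_to\_pi} are arbitrary choices) integrates this PDE by explicit finite differences for the standard Gaussian target, where $\nabla \log \pi(x) = -x$, and prints $D_{\mathrm{KL}}(\rho_t \| \pi)$, which decreases along the flow.

\begin{verbatim}
import numpy as np

# Grid and standard Gaussian target pi(x) = exp(-x^2/2) / sqrt(2 pi).
x = np.linspace(-8.0, 8.0, 321)
dx = x[1] - x[0]
pi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Initial distribution rho_0 = N(2, 0.25), far from the target.
rho = np.exp(-(x - 2.0)**2 / 0.5) / np.sqrt(0.5 * np.pi)

def kl_to_pi(rho):
    """Discrete approximation of D_KL(rho || pi) on the grid."""
    mask = rho > 1e-300
    return float(np.sum(rho[mask] * np.log(rho[mask] / pi[mask])) * dx)

dt = 2e-4  # explicit Euler; needs dt < dx^2/2 for stability
for step in range(5001):
    if step % 1000 == 0:
        print(f"t = {step * dt:.2f}   KL = {kl_to_pi(rho):.4f}")
    # d/dt rho = -d/dx (rho * dlogpi/dx) + d^2/dx^2 rho, with dlogpi/dx = -x
    drift = np.gradient(rho * (-x), dx)
    diffusion = np.gradient(np.gradient(rho, dx), dx)
    rho = rho + dt * (-drift + diffusion)
\end{verbatim}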

For an in-depth discussion of gradient flows, we refer to the survey by Santambrogio: F. Santambrogio, ``Euclidean, metric, and Wasserstein gradient flows: an overview,'' \textit{Bulletin of Mathematical Sciences}, 7(1):87--154, 2017.