gradient flows

Langevin dynamics connects probability theory and optimization through its formulation as a gradient flow in Wasserstein-2 space. The Fokker-Planck equation\cite{jordan_variational_1998}, which governs the time evolution of the density $p_t$, can be characterized as the gradient flow of the relative entropy $\mathrm{KL}(p_t \,\|\, \pi)$ in the Wasserstein space $\mathcal{P}(\mathbb{R}^{n})$, that is, the space of probability measures equipped with the $W_2$ metric. The equation reads: $$ \frac{\partial p_t}{\partial t} = - \nabla \cdot (p_t F) + \Delta p_t $$

where $p_t$ is the probability density at time $t$, $F = - \nabla f$ is the drift, and $\Delta p_t$ is the diffusion term; the stationary density is the target $\pi \propto e^{-f}$. Along this flow the relative entropy decreases over time, and its dissipation rate is the Fisher information of $p_t$ relative to $\pi$: $$ \frac{d}{d t} \mathrm{KL}\left(p_t \,\|\, \pi\right)=-\int p_t(x) \left|\nabla \log \frac{p_t(x)}{\pi(x)}\right|^2 d x $$
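The dissipation identity can be checked by hand in a case where everything is available in closed form. The sketch below assumes a one-dimensional quadratic potential $f(x) = x^2/2$ (so $\pi = N(0,1)$) and a Gaussian initial density $N(m_0, v_0)$; in that case $p_t$ stays Gaussian, and both the relative entropy and the relative Fisher information $\int p_t \left|\nabla \log (p_t/\pi)\right|^2 dx$ have explicit formulas. The function names and the initial values are illustrative choices, not part of the text above.

```python
import math

# Illustrative initial law N(M0, V0); these values are assumptions of the sketch.
M0, V0 = 2.0, 0.25


def gaussian_state(t, m0=M0, v0=V0):
    """Mean and variance of p_t for dX = -X dt + sqrt(2) dW started at N(m0, v0)."""
    m = m0 * math.exp(-t)
    v = 1.0 + (v0 - 1.0) * math.exp(-2.0 * t)
    return m, v


def kl_to_standard_normal(m, v):
    """KL( N(m, v) || N(0, 1) ) in one dimension."""
    return 0.5 * (v + m * m - 1.0 - math.log(v))


def relative_fisher(m, v):
    """Integral of p |d/dx log(p / pi)|^2 for p = N(m, v) and pi = N(0, 1)."""
    return (v - 1.0) ** 2 / v + m * m


def kl_at(t):
    """Relative entropy KL(p_t || pi) along the flow."""
    return kl_to_standard_normal(*gaussian_state(t))


def dissipation_check(t, eps=1e-5):
    """Return (finite-difference d/dt KL, minus relative Fisher information) at time t."""
    dkl_dt = (kl_at(t + eps) - kl_at(t - eps)) / (2.0 * eps)
    return dkl_dt, -relative_fisher(*gaussian_state(t))


for t in (0.0, 0.5, 1.0, 2.0):
    lhs, rhs = dissipation_check(t)
    print(f"t={t:.1f}  dKL/dt={lhs:.6f}  -I={rhs:.6f}")
```

The two printed columns should agree up to finite-difference error, which is what the identity predicts for this Gaussian example.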

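The rate at which this gradient flow in the Wasserstein metric $W_2$ converges is controlled by a functional inequality, the log-Sobolev inequality: $$ H_\pi(p) \leq \frac{1}{2 \rho} \int p(x) \left|\nabla \log \frac{p(x)}{\pi(x)}\right|^2 d x, $$ where $H_\pi(p) = \mathrm{KL}(p \,\|\, \pi)$ is the relative entropy, $\rho > 0$ is the log-Sobolev constant, and the integral on the right is the Fisher information of $p$ relative to the target measure $\pi$. When $\pi$ satisfies this inequality, and since the relative entropy dissipates at the rate of the relative Fisher information, it follows that $\mathrm{KL}(p_t \,\|\, \pi) \leq e^{-2 \rho t}\, \mathrm{KL}(p_0 \,\|\, \pi)$. This structure forms the basis for interpreting Langevin dynamics as the steepest descent flow in $\mathcal{P}\left(\mathbb{R}^n\right)$ towards $\pi$.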
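The steepest-descent picture can also be illustrated at the particle level. The following is a minimal sketch of an Euler-Maruyama discretization of the Langevin SDE $dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dW_t$ (often called the unadjusted Langevin algorithm). It assumes the one-dimensional potential $f(x) = x^2/2$, so that $\pi = N(0,1)$; the function names, step size, particle count, and starting point are hypothetical choices made for illustration only.

```python
import math
import random
import statistics


def grad_f(x: float) -> float:
    """Gradient of the assumed potential f(x) = x^2 / 2, whose target is N(0, 1)."""
    return x


def langevin_particles(n_particles=2000, n_steps=400, step=0.02, x0=3.0, seed=0):
    """Evolve independent particles with the Euler-Maruyama Langevin update."""
    rng = random.Random(seed)
    xs = [x0] * n_particles  # all particles start far from the target's mode
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        # Drift step along -grad f, plus Gaussian noise of variance 2 * step.
        xs = [x - step * grad_f(x) + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    return xs


particles = langevin_particles()
sample_mean = statistics.fmean(particles)
sample_var = statistics.pvariance(particles)
print(f"mean = {sample_mean:.3f}, variance = {sample_var:.3f}")
```

With these settings the empirical mean and variance should land close to the target values 0 and 1. The time discretization introduces a small bias (for this linear drift the stationary variance of the discrete chain is $1/(1 - h/2)$ for step size $h$), so exact agreement with $\pi$ is not expected.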