Embedded Manifold

is a smooth manifold that sits inside some ambient euclidean space $\mathbb{R}^n$, with the subspace topology and differential structure inherited from $\mathbb{R}^n$. It is the smooth injective immersion which is also a topological embedding.

$\mathcal{M} \subset \mathbb{R}^n$ has a riemannian metric $g$ naturally induced by the euclidean inner product $\langle . ,,. \rangle$ on $\mathbb{R}^{n}$. Additionally, $$ g_x (u, v):=\langle u, v \rangle \mathbb{R}^n $$ for $u, v \in T_xM$

submanifold

is any smooth manifold that is locally diffeomorphic to $\mathbb{R}^n$ and may or may not be embedded. $\mathbb{S}^{d-1}$= ${x \in \mathbb{R}^d : \left \lVert x \right \rVert = 1}$ is an embedded $d-1$ dimensional submanifold of $\mathbb{R}^{d}$

Riemannian Product Manifold

$S^{(d-1) m}=\underbrace{S^{d-1} \times S^{d-1}}{m \text { times }}$. Here we have a matrix $W \in \mathbb{R}^{m\times d}$ where each row lies on $S^{d-1}$. The tangent space at $(x_1, \ldots, x_m) \in (\mathbb{S}^{d-1})^m$ is given by: $$ T{(x_1, \ldots, x_m)}(\mathbb{S}^{d-1})^m = T_{x_1}\mathbb{S}^{d-1} \oplus \cdots \oplus T_{x_m}\mathbb{S}^{d-1}. $$

Riemannian Gradient

Let $f : \mathcal{M} \to \mathbb{R}$ be a smooth function, for $\mathcal{M} \subset \mathbb{R}^n$. Now, the Euclidean gradient is $\nabla_{\mathbb{R}^n} f(x) \in \mathbb{R}^n$. Generally, $\nabla_{\mathbb{R}^n} f(x) \notin T_x \mathcal{M}$. The Riemannian gradient $\nabla_{\mathcal{M}} f(x)$ is defined as the orthogonal projection of the Euclidean gradient onto the tangent space, $\nabla_{\mathcal{M}} f(x) = \text{Proj}{T_x \mathcal{M}} \nabla{\mathbb{R}^n} f(x).$

Riemannian Gradient

on the sphere $\mathbb{S}^{d-1}$ at $x$} is: $$ \nabla_{\mathbb{S}^{d-1}} f(x) = \nabla_{\mathbb{R}^d} f(x) - \langle \nabla_{\mathbb{R}^d} f(x), x \rangle x. $$

Tangent Space on the sphere $\mathbb{S}^{d-1}$ at $x$

$$ T_x \mathbb{S}^{d-1} = { v \in \mathbb{R}^d : \langle v, x \rangle = 0 }, $$

Alternate representation of Layernorm

Let $W \in \mathbb{R}^{m \times d}$. Each row of $W$ is invariant to scaling due to LayerNorm, which means the true parameter space has a scale invariance. Thus, we may model $W$ as lying on: $W \in \mathcal{W} \simeq (\mathbb{S}^{d-1})^m$ where the ambient representation space is $\mathcal{W} := \left( \mathbb{R}^d \setminus {0} \right)^m / \mathbb{R}_+^m.$

$\mathcal{W}$ is the quotient manifold induced by row wise scale invariance, it is also the “abstract parameter manifold” in some sense for sampling.

Geodesic convexity

assuming $f : \mathcal{M} \to \mathbb{R}$ is smooth, $f$ is \emph{geodesically convex} if for all $x, y \in \mathcal{M}$ and all minimizing geodesics $\gamma : [0,1] \to \mathcal{M}$ with $\gamma(0) = x$, $\gamma(1) = y$, we have $$ f(\gamma(t)) \leq (1-t)f(x) + t f(y), \quad \forall t \in [0,1]. $$

strongly geodesically convex

$f$ is strongly geodesically convex with parameter $\lambda > 0$ if $$ f(\gamma(t)) \leq (1-t)f(x) + t f(y) - \frac{\lambda}{2}(1-t)t, d(x, y)^2. $$

Riemannian Hessian

at a point $x \in \mathcal{M}$ is the bilinear form: $$ \text{Hess}{\mathcal{M}} f(x)(v, v) = \left.\frac{d^2}{dt^2} f(\gamma(t)) \right|{t=0}, $$ where $\gamma$ is a geodesic with $\gamma(0) = x$ and $\dot{\gamma}(0) = v \in T_x \mathcal{M}$.

$\lambda$-strongly geodesically convex

If $\text{Hess}_{\mathcal{M}} f(x) \succeq \lambda , \mathrm{Id}$ for all $x$, then $f$ is $\lambda$-strongly geodesically convex.

Bakry-Émery criterion

Let $\mu(dx) = e^{-V(x)} , dx$ be a smooth measure on $\mathcal{M}$, and assume $$ \text{Hess}_{\mathcal{M}} V(x) \succeq \rho , \mathrm{Id}. $$ Then $\mu$ satisfies the log-Sobolev inequality (LSI) with parameter $\rho$.

(In our case: $\text{Hess}_{\mathcal{M}} V(W) \succeq \rho , \mathrm{Id}$ on $\mathcal{M} = (\mathbb{S}^{d-1})^m$.)