
Gradient Flow

Chang Liu

Tsinghua University

April 24, 2017



Contents
1 Introduction
2 Gradient flow in the Euclidean space
Variants of Gradient Flow in the Euclidean Space
Approximating Curves
Characterizing Properties
3 Gradient Flow in Metric Spaces
Generalization of Basic Concepts
Generalization of Gradient Flow to Metric Spaces
4 Gradient Flows on Wasserstein Spaces
Recap. of Optimal Transport Problems
The Wasserstein Space
Gradient Flows on W2(Ω), Ω ⊂ R^n
Numerical methods from the JKO scheme
5 Application
6 My Remarks
7 Appendix
Introduction

Definition (Gradient Flow in a Linear Space)

X is a linear space, and F : X → R is smooth. A gradient flow (or steepest descent curve) is a smooth curve x : R → X such that

x′(t) = −∇F(x(t)).

What shall we consider next, and where can it be applied?

1 Existence and uniqueness of the solution
Since many PDEs are in the form of a gradient flow, the analysis can be applied to them.

Example
For X = L²(R^n), a Hilbert space, and the Dirichlet energy F(u) = ½∫|∇u(x)|² dx, the Heat Equation ∂t u = ∇²u is a gradient flow problem. (Indeed, (d/dε)F(u + εφ)|_{ε=0} = ∫∇u·∇φ dx = −∫(∇²u)φ dx for compactly supported φ, so ∇F(u) = −∇²u in L², and x′ = −∇F(x) is exactly ∂t u = ∇²u.)
2 Numerical methods and their convergence
Since a gradient flow gradually minimizes F(x), many optimization methods are related to it, e.g. gradient descent, proximal methods, mirror descent.


3 Generalization to gradient flows on general metric spaces
Viewing PDEs as gradient flows on general metric spaces gives wider applicability.

Example
PDEs in the continuity equation form ∂t ρ − ∇·(ρv) = 0, where v = ∇[δF/δρ], can be cast as gradient flows on the space of probability measures with the Wasserstein distance. The Heat Equation can also be viewed as a gradient flow in the Wasserstein space.

There is also the need to minimize functionals on metric spaces.

Example
Optimization w.r.t. probability distributions, e.g. min_q KL(q‖p). Optimization without parameterization is possible! (e.g. Stein Variational Gradient Descent)


Gradient Flow in the Euclidean Space


Variants of Gradient Flow in the Euclidean Space


Existence, Uniqueness and Variants

Variant 0: F : R^n → R is differentiable (Cauchy Problem):

x′(t) = −∇F(x(t)), for t > 0,
x(0) = x0.

Theorem
∃! solution if ∇F is Lipschitz (Picard–Lindelöf).


Existence, Uniqueness and Variants

Variant 1: F is convex and not necessarily differentiable:

x′(t) ∈ −∂F(x(t)), for a.e. t > 0,
x(0) = x0,

where x is an absolutely continuous curve, and

∂F(x) = {p ∈ R^n : ∀y ∈ R^n, F(y) ≥ F(x) + p · (y − x)}.

Theorem
Any two solutions x1, x2 of the above problem with different initial conditions satisfy |x1(t) − x2(t)| ≤ |x1(0) − x2(0)|.

Corollary
For a given initial condition, the above problem has a unique solution.


Existence, Uniqueness and Variants

Variant 2: F is semi-convex (λ-convex).

Definition (λ-convex function)
F is λ-convex (λ ∈ R) if F(x) − (λ/2)|x|² is convex.

x′(t) ∈ −∂F(x(t)), for a.e. t > 0,
x(0) = x0,

where x is an absolutely continuous curve, and
∂F(x) = {p ∈ R^n : ∀y ∈ R^n, F(y) ≥ F(x) + p · (y − x) + (λ/2)|y − x|²}.

Theorem
Any two solutions x1, x2 of the above problem with different initial conditions satisfy |x1(t) − x2(t)| ≤ e^{−λt}|x1(0) − x2(0)|.


Corollary
For a given initial condition, the above problem has a unique solution.
If λ > 0 (strongly convex), F has a unique minimizer x∗. x(t) ≡ x∗ is a solution, so for any solution x(t), |x(t) − x∗| ≤ e^{−λt}|x(0) − x∗|.


Approximating Curves


Definition (MMS)
Minimizing Movement Scheme (MMS): for a fixed small time step τ, define a sequence {x_k^τ}_k by

x_{k+1}^τ ∈ arg min_x F(x) + |x − x_k^τ|²/(2τ).

Importance:
Practical numerical method for approximating the curve.
Easier to generalize to metric spaces than x′ = −∇F(x) itself.
Properties:
Existence of a solution for mild F (e.g. Lipschitz and lower bounded by C1 − C2|x|²).
(x_{k+1}^τ − x_k^τ)/τ ∈ −∂F(x_{k+1}^τ): implicit Euler scheme (more stable, but harder, than the explicit one: gradient descent).
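Below is a minimal numerical sketch of the scheme (my addition, not from the slides), using scipy and a toy quadratic energy. Each MMS step solves the proximal problem numerically; for comparison, the explicit Euler scheme is plain gradient descent, which diverges here because τ exceeds the stability threshold 2/L of the stiff coordinate:

    import numpy as np
    from scipy.optimize import minimize

    A = np.diag([1.0, 10.0])                      # toy quadratic F(x) = x·Ax/2
    F = lambda x: 0.5 * x @ A @ x
    gradF = lambda x: A @ x

    def mms_step(xk, tau):
        # implicit Euler / proximal step: argmin_x F(x) + |x - xk|^2 / (2 tau)
        return minimize(lambda x: F(x) + np.sum((x - xk) ** 2) / (2 * tau), xk).x

    tau = 0.3                                     # > 2/10: explicit Euler unstable
    x_imp = x_exp = np.array([1.0, 1.0])
    for k in range(20):
        x_imp = mms_step(x_imp, tau)              # stable for any tau > 0
        x_exp = x_exp - tau * gradF(x_exp)        # explicit Euler: blows up
    print(x_imp, x_exp)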


Convergence:
Define v_{k+1}^τ ≜ (x_{k+1}^τ − x_k^τ)/τ, and v^τ(t) = v_{k+1}^τ for t ∈ (kτ, (k+1)τ].

Define two kinds of interpolations:

1) x^τ(t) = x_{k+1}^τ, t ∈ (kτ, (k+1)τ];
2) x̃^τ(t) = x_k^τ + (t − kτ)v_{k+1}^τ, t ∈ (kτ, (k+1)τ].

x̃^τ is continuous and (x̃^τ)′ = v^τ;
x^τ is not continuous, but v^τ(t) ∈ −∂F(x^τ(t)).

Theorem
If F(x0) < +∞ and inf F > −∞, then up to a subsequence τ_j → 0, both x̃^{τ_j} and x^{τ_j} converge uniformly to a same curve x ∈ H¹(R^n), and v^{τ_j} converges weakly in L²(R; R^n) to a vector function v s.t. x′ = v and
1) v(t) ∈ −∂F(x(t)) a.e., if F is λ-convex;
2) v(t) = −∇F(x(t)), ∀t, if F is C¹.


Details:
1 L^p space
For a measure space (S, Σ, µ), first define L(S; R^n) ≜ {f : S → R^n : ∫_S |f|^p dµ < ∞}. It is a linear space.
Define L^p(S; R^n) ≜ L(S; R^n)/{f : f = 0 µ-a.e.} to be a quotient space (i.e. treat all functions that are equal µ-a.e. as the same element of L^p).
Define ‖f‖_p ≜ (∫_S |f|^p dµ)^{1/p}; then for 1 ≤ p ≤ ∞ it is a Banach space.
Only L²(S; R^n) can be a Hilbert space, with inner product ⟨f, g⟩_{L²(S;R^n)} ≜ ∫_S f·g dµ.
L^p(S) ≜ L^p(S; R).


2 Weak convergence in a Hilbert space H
For x_n ∈ H, n ≥ 1, and x ∈ H, x_n ⇀ x is defined as: ∀f ∈ H′, f(x_n) → f(x), or equivalently ∀y ∈ H, ⟨x_n, y⟩_H → ⟨x, y⟩_H.
x_n → x ⟹ x_n ⇀ x.
x_n ⇀ x and ‖x_n‖ → ‖x‖ ⟹ x_n → x.
If dim(H) < ∞, x_n ⇀ x ⟺ x_n → x.


3 H^k(Ω) space (Ω ⊂ R^n)
Weak derivative. For u ∈ C^k(Ω) and φ ∈ C_c^∞(Ω) (the subscript c is for compact support),

∫_Ω u D^α φ dx = (−1)^{|α|} ∫_Ω φ D^α u dx,   (integration by parts)

where D^α = ∂_{x1}^{α1} ··· ∂_{xn}^{αn} and |α| = Σ_{i=1}^n α_i. So define the weak α-th partial derivative of u as the function v satisfying

∫_Ω u D^α φ dx = (−1)^{|α|} ∫_Ω φ v dx, ∀φ ∈ C_c^∞(Ω).

If it exists, it is uniquely defined a.e.
Sobolev space W^{k,p}(Ω) for k ∈ N and p ∈ [1, ∞]:
W^{k,p}(Ω) = {u ∈ L^p(Ω) : D^α u ∈ L^p(Ω), ∀|α| ≤ k},
with norm
‖u‖_{W^{k,p}(Ω)} = (Σ_{|α|≤k} ‖D^α u‖_{L^p(Ω)}^p)^{1/p} for 1 ≤ p < +∞, and max_{|α|≤k} ‖D^α u‖_{L^∞(Ω)} for p = +∞.
W^{k,p}(Ω) is a Banach space.
H^k(Ω) ≜ W^{k,2}(Ω). These are Hilbert spaces.
4 Up to a subsequence
There exists a sequence τ_j → 0 s.t. x̃^{τ_j} and x^{τ_j} converge uniformly and v^{τ_j} converges weakly.


Proof sketch:
Comparing the minimizer x_{k+1}^τ with x_k^τ in the MMS objective gives
|x_{k+1}^τ − x_k^τ|²/(2τ) ≤ F(x_k^τ) − F(x_{k+1}^τ)
⟹ Σ_{k=0}^{ℓ} |x_{k+1}^τ − x_k^τ|²/(2τ) ≤ F(x_0^τ) − F(x_{ℓ+1}^τ) ≤ C, for F(x0) < +∞ and inf F > −∞
⟹ ∫_0^T ½|(x̃^τ)′(t)|² dt ≤ C
⟹ x̃^τ is bounded in H¹ and v^τ in L², and the injection H¹ ⊂ C^{0,1/2} gives an equicontinuity bound on x̃^τ of the form |x̃^τ(t) − x̃^τ(s)| ≤ C|t − s|^{1/2}
⟹ by the AA theorem, x̃^τ has a uniformly converging subsequence.
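As a sanity check on the convergence claim (my addition), take F(x) = x²/2 on R: the MMS step has the closed form x_{k+1}^τ = x_k^τ/(1 + τ), and the interpolations should converge uniformly to the exact gradient-flow curve x(t) = x0·e^{−t} as τ → 0:

    import numpy as np

    def mms_nodes(x0, tau, T):
        ts = np.arange(0.0, T + tau, tau)
        xs = x0 / (1.0 + tau) ** np.arange(len(ts))   # exact MMS iterates
        return ts, xs

    for tau in [0.5, 0.1, 0.02]:
        ts, xs = mms_nodes(1.0, tau, T=2.0)
        err = np.max(np.abs(xs - np.exp(-ts)))        # sup error at the nodes
        print(f"tau={tau:5.2f}  sup-error={err:.4f}") # shrinks as tau -> 0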


Characterizing Properties


Motivation
x′ = −∇F(x) (or x′ ∈ −∂F(x)) is hard to generalize to metric spaces!
There is nothing but a distance in a metric space, so ∇F(x) or ∂F(x) cannot be defined! (different from a manifold)
Instead, use two properties of the gradient flow that characterize it and that can be generalized to metric spaces.


Two characterizing properties of gradient flow in R^d:

Energy Dissipation Equality (EDE) for F ∈ C¹(Ω), Ω ⊂ R^n:

F(x(s)) − F(x(t)) = ∫_s^t (½|x′(r)|² + ½|∇F(x(r))|²) dr, ∀0 ≤ s < t ≤ 1

is equivalent to x′ = −∇F(x). Note it is equivalent even with “≥” (i.e. “≥” ⟺ “=”).

Evolution Variational Inequality (EVI) for a λ-convex function F:

(d/dt) ½|x(t) − y|² ≤ F(y) − F(x(t)) − (λ/2)|x(t) − y|², ∀y ∈ X

is equivalent to x′(t) ∈ −∂F(x(t)).
Sometimes also denoted EVI_λ.
It is important for establishing the uniqueness and stability of gradient flows.
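Why EDE pins down the flow (a standard argument, added here): along any curve, F(x(s)) − F(x(t)) = −∫_s^t ∇F(x(r))·x′(r) dr ≤ ∫_s^t (½|x′|² + ½|∇F(x)|²) dr by Young's inequality, with equality iff x′(r) = −∇F(x(r)) a.e.; this also explains why “≥” already forces “=”. A quick numeric check for F(x) = x²/2, whose flow from x0 = 1 is x(t) = e^{−t}:

    import numpy as np

    x = lambda t: np.exp(-t)                # gradient-flow curve of F(x) = x^2/2
    F = lambda y: 0.5 * y ** 2
    s, t = 0.2, 0.9
    r = np.linspace(s, t, 10001)
    integrand = 0.5 * x(r) ** 2 + 0.5 * x(r) ** 2   # |x'|^2 = |∇F(x)|^2 = x^2 here
    dr = r[1] - r[0]
    rhs = (integrand[:-1] + integrand[1:]).sum() / 2 * dr   # trapezoid rule
    print(F(x(s)) - F(x(t)), rhs)           # the two sides agree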


Gradient Flow in Metric Spaces


Generalization of Basic Concepts


For a metric space (X, d):

Definition
Metric derivative of a curve ω : [0, 1] → X:

|ω′|(t) = lim_{h→0} d(ω(t + h), ω(t))/|h|,

if the limit exists.

If ω is Lipschitz, |ω′|(t) exists for a.e. t ∈ [0, 1].
d(ω(t0), ω(t1)) ≤ ∫_{t0}^{t1} |ω′|(s) ds.


In (X, d), ω′ cannot be defined, but |ω′| can.

Definition
ω : [0, 1] → X is absolutely continuous if ∃g ∈ L¹([0, 1]) s.t.

d(ω(t0), ω(t1)) ≤ ∫_{t0}^{t1} g(s) ds, ∀t0 < t1.

Let AC(X) be the set of such curves.

Lipschitz ⇒ AC.
AC ⇒ the metric derivative exists a.e.


Definition
Length of a curve ω : [0, 1] → X:

Length(ω) ≜ sup{Σ_{k=0}^{n−1} d(ω(t_k), ω(t_{k+1})) : n ≥ 1, 0 = t0 < ··· < t_n = 1}.

If ω ∈ AC(X), Length(ω) = ∫_0^1 |ω′|(t) dt.


Definition
Geodesic between x0 and x1 in X: a curve ω s.t. ω(0) = x0, ω(1) = x1, and Length(ω) = min_ω̃ {Length(ω̃) : ω̃(0) = x0, ω̃(1) = x1}.

This is the generalization of straight lines in R^n, and is used to extend convexity.

Definition
Length space: a metric space (X, d) s.t. ∀x, y ∈ X, d(x, y) = inf_{ω∈AC(X)} {Length(ω) : ω(0) = x, ω(1) = y}.
Geodesic space: a length space in which a geodesic exists for every pair of points.

Riemannian manifolds are geodesic spaces.


For a geodesic space (X, d):

Definition
Geodesic convexity: in a geodesic metric space, a function F : X → R that is convex along geodesics:

F(x(t)) ≤ (1 − t)F(x(0)) + tF(x(1)),

where x(t) is a geodesic joining x(0) and x(1).

λ-geodesic convexity: in a geodesic metric space, a function F : X → R that is λ-convex along geodesics:

F(x(t)) ≤ (1 − t)F(x(0)) + tF(x(1)) − λ (t(1 − t)/2) d²(x(0), x(1)).


For a metric space (X, d):

Definition
g : X → R is an upper gradient of F : X → R if, for every Lipschitz curve x,

|F(x(0)) − F(x(1))| ≤ ∫_0^1 g(x(t)) |x′|(t) dt.

Local Lipschitz constant of F:

|∇F|(x) = lim sup_{y→x} |F(x) − F(y)|/d(x, y).

Descending slope (or just slope) of F:

|∇⁻F|(x) = lim sup_{y→x} [F(x) − F(y)]₊/d(x, y).

If F is Lipschitz, |∇F| is an upper gradient.
Generalization of Gradient Flow to Metric Spaces


Three ways to generalize gradient flow to metric spaces: EDE-GF, EVI-GF, MMS-GF.

Definition (EDE-GF)
Let (X, d) be a metric space, F : X → R, and g : X → R an upper gradient of F. An EDE-GF is a curve x : [0, 1] → X with metric derivative a.e. such that:

F(x(s)) − F(x(t)) = ∫_s^t (½|x′(r)|² + ½g(x(r))²) dr, ∀0 ≤ s < t ≤ 1.

Existence is easy to guarantee.
Not enough to guarantee uniqueness.


Definition (EVI-GF)
Let (X, d) be a geodesic space and F : X → R λ-geodesically convex. An EVI-GF is a curve x : [0, 1] → X such that:

(d/dt) ½ d(x(t), y)² ≤ F(y) − F(x(t)) − (λ/2) d(x(t), y)², ∀y ∈ X.

EVI-GF ⇒ EDE-GF.
Uniqueness and contractivity: for two EVI-GFs x(t) and y(s),

(d/dt) ½ d(x(t), y(s))² ≤ F(y(s)) − F(x(t)) − (λ/2) d(x(t), y(s))²,
(d/ds) ½ d(x(t), y(s))² ≤ −F(y(s)) + F(x(t)) − (λ/2) d(x(t), y(s))².

Define E(t) = ½ d(x(t), y(t))²; then (d/dt)E(t) ≤ −2λE(t)
⇒ d(x(t), y(t)) ≤ e^{−λt} d(x(0), y(0)), which gives uniqueness for a given initial condition and exponential convergence for λ > 0.
A strong condition; existence is hard to guarantee.
A sufficient condition for existence: Compatible Convexity along Generalized Geodesics (C²G²):
∀x0, x1 ∈ X, ∀y ∈ X, ∃x : [0, 1] → X s.t. x(0) = x0, x(1) = x1 and

F(x(t)) ≤ (1 − t)F(x0) + tF(x1) − λ (t(1 − t)/2) d²(x0, x1),
d²(x(t), y) ≤ (1 − t)d²(x0, y) + t d²(x1, y) − t(1 − t) d²(x0, x1),

i.e. λ-convexity of F and 2-convexity of x ↦ d²(x, y) along a same curve (not necessarily a geodesic).
Definition (Generalized MMS)
Generalization of the Minimizing Movement Scheme to a metric space (X, d): for F : X → R ∪ {+∞}, define

x_{k+1}^τ ∈ arg min_x F(x) + d(x, x_k^τ)²/(2τ).

Define two kinds of interpolations in a similar way:
1) Define x^τ(t) = x_{k+1}^τ, t ∈ (kτ, (k+1)τ];
2) Define x̃^τ(t), t ∈ (kτ, (k+1)τ], to be the constant-speed geodesic between x_k^τ and x_{k+1}^τ. (So we require X to be a length space?)


Definition
Constant-speed geodesic: in a length space, a curve ω : [t0, t1] → X s.t.

d(ω(t), ω(s)) = (|t − s|/(t1 − t0)) d(ω(t0), ω(t1)), ∀t, s ∈ [t0, t1].

Constant-speed geodesics are geodesics:
Length(ω) = ∫_{t0}^{t1} (d(ω(t0), ω(t1))/(t1 − t0)) dt = d(ω(t0), ω(t1)).
The following are equivalent:
1 ω : [t0, t1] → X is a constant-speed geodesic joining x0 and x1;
2 ω ∈ AC(X) and |ω′|(t) = d(ω(t0), ω(t1))/(t1 − t0) a.e.;
3 ω ∈ arg min{∫_{t0}^{t1} |ω′|(t)^p dt : ω(t0) = x0, ω(t1) = x1}, ∀p > 1.
Define v^τ. On metric (length) spaces, only its norm can be defined: set |v^τ| to be the piecewise-constant speed of x̃^τ,

|v^τ|(t) = d(x_{k+1}^τ, x_k^τ)/τ, t ∈ (kτ, (k+1)τ].

Definition (MMS-GF)
Let (X, d) be a metric space (not necessarily a length space). A curve x : [0, T] → X is called a Generalized Minimizing Movement (GMM) (I would call it MMS-GF) if there exists a sequence τ_j → 0 s.t. x^{τ_j} converges uniformly to x in (X, d).


Existence analysis:
Condition for the existence of x_k^τ:
The sub-level set {x : F(x) ≤ c} is compact in X, and F is Lipschitz.
(The corresponding topology is either the one induced by d, or a weaker topology s.t. d is lower semi-continuous w.r.t. it.)
Condition for the existence of limit curves (i.e. MMS-GF):
Existence of x_k^τ is enough!
Due to d(x_{k+1}^τ, x_k^τ)²/(2τ) ≤ F(x_k^τ) − F(x_{k+1}^τ), we have d(x^τ(t), x^τ(s)) ≤ C(|t − s|^{1/2} + √τ), i.e. {x^τ}_τ are equi-Hölder continuous with exponent 1/2 (up to a negligible error of order √τ). So by the AA theorem, the set {x^τ}_τ has uniformly converging subsequences, i.e. MMS-GFs exist. But they are not unique, and no relation with F (EDE or EVI) is obtained.
To relate MMS-GF to F and to the other generalizations:

If, in addition to “{x : F(x) ≤ c} is compact in X, F is Lipschitz”, F and |∇⁻F| are lower-semicontinuous, we have
½∫_0^t |x′|(r)² dr + ½∫_0^t |∇⁻F(x(r))|² dr ≤ F(x(0)) − F(x(t)), ∀0 ≤ t ≤ T. (not EDE)
If additionally the slope |∇⁻F| is an upper gradient of F, we have EDE:
½∫_s^t |x′|(r)² dr + ½∫_s^t |∇⁻F(x(r))|² dr ≤ F(x(s)) − F(x(t)), ∀0 ≤ s < t ≤ T.
If F is λ-geodesically convex, all the conditions are met.


Conclusion for now

Table: Summary of extensions of gradient flow to metric spaces

Extension | Requirement | Existence | Uniqueness and contractivity
EVI-GF | X geodesic space, F λ-geod. convex | Hard; C²G² is a sufficient condition | Guaranteed
EDE-GF | X metric space | Easy | Not guaranteed
MMS-GF | X metric space | Relatively easy; “{x : F(x) ≤ c} compact and F Lipschitz” or “F λ-geod. convex” suffices | Not guaranteed

EVI-GF ⊂ EDE-GF
MMS-GF ⊂ EDE-GF if “{x : F(x) ≤ c} compact, F Lipschitz, F and |∇⁻F| lower-semicontinuous, |∇⁻F| an upper gradient of F” or “F λ-geod. convex”.
Gradient Flows on Wasserstein Spaces


Recap. of Optimal Transport Problems


Settings
Let X, Y be two measurable spaces, µ ∈ P(X) and ν ∈ P(Y) fixed measures, and c : X × Y → R a cost function.

Definition (push-forward of a measure)
For a measurable map T : X → Y and a measure µ ∈ P(X), define the push-forward of µ under T, T#µ, to be the measure on Y s.t.

T#µ(A) = µ(T⁻¹(A)), ∀A in the σ-algebra of Y.

Example
For X = Y = R^n and T invertible, in terms of p.d.f.s, T#µ = (µ ∘ T⁻¹)|det(∇T⁻¹)|, i.e. the rule of change of variables.
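A one-line numerical check of this change-of-variables rule (my addition): take µ = N(0, 1) on R and the affine map T(x) = 2x + 1, so T#µ = N(1, 4):

    import numpy as np
    from scipy.stats import norm

    Tinv = lambda y: (y - 1) / 2                 # inverse of T(x) = 2x + 1
    ys = np.linspace(-5.0, 7.0, 5)
    pushforward_pdf = norm.pdf(Tinv(ys)) * 0.5   # (mu o T^{-1}) |det grad T^{-1}|
    print(np.allclose(pushforward_pdf, norm.pdf(ys, loc=1, scale=2)))  # True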


Monge's Problem:

(MP) inf_{T : T#µ=ν} ∫_X c(x, T(x)) dµ(x).

An (optimal) T is called an (optimal) transport map.
The problem may not be feasible.

Kantorovich's Problem:

(KP) inf_{γ∈Π(µ,ν)} ∫_{X×Y} c(x, y) dγ(x, y),

where Π(µ, ν) ≜ {γ : (π_X)#γ = µ, (π_Y)#γ = ν}.

An (optimal) γ is called an (optimal) transport plan.
The problem is always feasible.
MP is a special case of KP, where γ is restricted to the form γ = (id × T)#µ. If T∗ exists, γ∗ = (id × T∗)#µ is also optimal.
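For intuition, here is a tiny discrete instance of (KP) (my addition): µ, ν supported on 3 points with quadratic cost, solved as a linear program over plans γ ≥ 0 with fixed marginals:

    import numpy as np
    from scipy.optimize import linprog

    x = np.array([0.0, 1.0, 2.0]); mu = np.array([0.5, 0.3, 0.2])
    y = np.array([0.5, 1.5, 2.5]); nu = np.array([0.2, 0.5, 0.3])
    C = 0.5 * (x[:, None] - y[None, :]) ** 2      # c(x, y) = |x - y|^2 / 2

    A_eq = np.zeros((6, 9)); b_eq = np.concatenate([mu, nu])
    for i in range(3):
        A_eq[i, 3 * i: 3 * i + 3] = 1             # row sums:    (pi_X)# gamma = mu
        A_eq[3 + i, i::3] = 1                     # column sums: (pi_Y)# gamma = nu
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    print(res.fun)                                # optimal cost
    print(res.x.reshape(3, 3))                    # optimal plan gamma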
Dual Kantorovich Problem:

Direct form:

(DKP) sup_{φ∈L¹(X), ψ∈L¹(Y), φ(x)+ψ(y)≤c(x,y)} ∫_X φ dµ + ∫_Y ψ dν.

Reformulation:

Definition
The c-transform (c-conjugate) of χ : X → R̄ is χ^c : Y → R̄, defined as χ^c(y) ≜ inf_{x∈X} c(x, y) − χ(x).
Ψ_c(X) ≜ {χ^c : χ : X → R̄}. ψ : Y → R̄ is c-concave if ψ ∈ Ψ_c(X).

(DKP′) sup_{φ∈Ψ_c(X)} ∫_X φ dµ + ∫_Y φ^c dν.


Definition (Kantorovich potential)
The optimal φ of (DKP′) is called the Kantorovich potential, denoted by ϕ.

When c is uniformly continuous (e.g. when c is continuous and X is compact), the existence of the Kantorovich potential ϕ can be proven (by the AA theorem).

Remark
Strong duality holds: KP(µ, ν) = DKP(µ, ν).


Special case 1: X = Y, c(x, y) = d(x, y) is a distance:

(DKP1) sup_{φ∈Lip₁(X)} ∫_X φ dµ − ∫_X φ dν.

Special case 2: X = Y = Ω ⊂ R^n and c(x, y) = ½|x − y|²:

Theorem
For the quadratic cost and Ω ⊂ R^n closed, bounded and connected, ∃! optimal transport plan γ∗ for (KP).
Additionally, if µ is absolutely continuous, an optimal transport map T∗ exists and γ∗ = (id, T∗)#µ. Moreover, there exists a Kantorovich potential ϕ s.t. ∇ϕ is unique µ-a.e., and T = ∇u a.e., where u(x) ≜ |x|²/2 − ϕ(x) is convex.


Corollary
Under the same conditions, any gradient of a convex function is an optimal map between µ and its image measure.
An optimal transport map uniquely exists for c(x, y) = h(x − y) with h strictly convex (e.g. |x − y|^p, p > 1).


The Wasserstein Space


Definition
On a metric space (X, d), for p ≥ 1 and a fixed point x0 ∈ X, define m_p(µ) ≜ ∫_X d(x, x0)^p dµ(x), and P_p(X) ≜ {µ ∈ P(X) : m_p(µ) < +∞}, which is independent of the choice of x0.

Theorem
W_p(µ, ν) ≜ inf_{γ∈Π(µ,ν)} (∫ d(x, y)^p dγ(x, y))^{1/p} is a distance on P_p(X).

Definition (Wasserstein space)
W_p(X) ≜ (P_p(X), W_p).
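On X = R (my addition; a standard fact), W_p reduces to a comparison of quantile functions, W_p(µ, ν)^p = ∫_0^1 |F_µ^{-1}(u) − F_ν^{-1}(u)|^p du, so W_p between two equal-size samples is computed by sorting:

    import numpy as np

    def wasserstein_p(xs, ys, p=2):
        xs, ys = np.sort(xs), np.sort(ys)           # empirical quantiles
        return np.mean(np.abs(xs - ys) ** p) ** (1 / p)

    rng = np.random.default_rng(0)
    a, b = rng.normal(0, 1, 10000), rng.normal(3, 1, 10000)
    print(wasserstein_p(a, b))   # ≈ 3, since W2(N(0,1), N(3,1)) = 3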


Theorem
In W_p(X) with p ≥ 1, given µ, µ_n ∈ P_p(X), n ∈ N, the following are equivalent:
µ_n → µ w.r.t. W_p;
µ_n ⇀ µ and m_p(µ_n) → m_p(µ);
∫_X φ dµ_n → ∫_X φ dµ, ∀φ ∈ {φ ∈ C(X) : ∃A, B ∈ R s.t. |φ(x)| ≤ A + B d(x, x0)^p, ∀x ∈ X}.


Special cases:
Case 1: (X, d) is compact.
P(X) = P_p(X), ∀p ≥ 1.
µ_n → µ w.r.t. W_p ⟺ µ_n ⇀ µ.
Case 2: X = Ω ⊂ R^d and p ∈ [1, +∞), c(x, y) = |x − y|^p.
The L^p distance between the p.d.f.s of two measures is a “vertical” distance; the W_p distance between the two measures is a “horizontal” distance.
p1 ≤ p2 ⟹ W_{p1} ≤ W_{p2}. If Ω is bounded, W_{p2} ≤ diam(Ω)^{1−p1/p2} W_{p1}^{p1/p2}, so all W_p induce the same topology.
Geodesics on W_p(Ω):

Theorem (McCann's displacement interpolation)
If Ω ⊂ R^d is convex, then W_p(Ω) is a length space, and for µ, ν ∈ W_p(Ω) and γ an optimal transport plan from µ to ν,

µ_γ(t) ≜ (π_t)#γ, where π_t(x, y) ≜ (1 − t)x + ty,

is a constant-speed geodesic.
If p > 1, then all the constant-speed geodesics are of this form.
If additionally µ is absolutely continuous, then there is only one geodesic, whose form is

µ_t = (T_t)#µ, where T_t ≜ (1 − t)id + tT,

where T is the optimal transport map from µ to ν.
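A quick sketch of displacement interpolation in 1-D (my addition): between absolutely continuous measures on R the optimal map is T = F_ν^{-1} ∘ F_µ, which for two Gaussians is affine, so T_t = (1 − t)id + tT can be applied to samples directly:

    import numpy as np

    m0, s0, m1, s1 = 0.0, 1.0, 4.0, 0.5
    T = lambda x: m1 + (s1 / s0) * (x - m0)   # optimal map N(m0,s0²) -> N(m1,s1²)

    xs = np.random.default_rng(1).normal(m0, s0, 100000)
    for t in [0.0, 0.5, 1.0]:
        xt = (1 - t) * xs + t * T(xs)         # samples from mu_t = (T_t)# mu
        print(t, xt.mean(), xt.std())         # mean (1-t)m0 + t*m1, std (1-t)s0 + t*s1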
Geodesic convexity in W2(Ω) (displacement convexity):
The definition is given by the general gradient flow theory above.
Important examples:

Definition (Important functionals on W2(Ω))
For f : R → R convex, V : Ω → R, and W : R^d → R symmetric (W(x) = W(−x)), define

F(ρ) = ∫ f(ρ(x)) dx,  V(ρ) = ∫ V(x) dρ,  W(ρ) = ½∬ W(x − y) dρ(x) dρ(y).

Theorem
λ-convexity on Ω of V (or W) ⟹ λ-geodesic convexity on W2(Ω) of V (or W).
If f(0) = 0, s ↦ s^d f(s^{−d}) is convex and decreasing, and Ω is convex, then F is geodesically convex in W2(Ω).
Gradient Flows on W2(Ω), Ω ⊂ R^n


Curves/flows on W_p(Ω), Ω ⊂ R^n

Continuity equation:
What is special about W_p(Ω) is that its points are probability distributions. A curve/flow/dynamics µ_t in W_p(Ω) represents the evolution of a distribution. This evolution can be associated with (viewed as a result of) an evolution/dynamics in R^n, represented by a vector field v_t. The typical relation between them is the continuity equation:

∂_t µ_t + ∇·(v_t µ_t) = 0.


Theorem
Let p > 1 and Ω ⊂ R^d be open, bounded and connected.
Let {µ_t}_{t∈[0,1]} be an AC curve in W_p(Ω). Then for a.e. t ∈ [0, 1] there exists a vector field v_t ∈ L^p(µ_t; R^d) s.t. 1) ∂_t µ_t + ∇·(v_t µ_t) = 0 is satisfied in the sense of distributions; 2) for a.e. t ∈ [0, 1], ‖v_t‖_{L^p(µ_t)} ≤ |µ′|(t).
Conversely, if {µ_t}_{t∈[0,1]} ⊂ P_p(Ω) and ∀t we have a vector field v_t ∈ L^p(µ_t; R^d) with ∫_0^1 ‖v_t‖_{L^p(µ_t)} dt < +∞ solving ∂_t µ_t + ∇·(v_t µ_t) = 0, then {µ_t}_{t∈[0,1]} is AC in W_p(Ω) and for a.e. t ∈ [0, 1], |µ′|(t) ≤ ‖v_t‖_{L^p(µ_t)}.
Thus in both cases the conclusion can be strengthened to |µ′|(t) = ‖v_t‖_{L^p(µ_t)}.
(I guess v_t^i : Ω → R, 1 ≤ i ≤ d, satisfies that |v_t^i|^p is µ_t-integrable, and ‖v_t‖_{L^p(µ_t)} = (Σ_{i=1}^d ∫_Ω |v_t^i(x)|^p dµ_t(x))^{1/p}.)
We only consider absolutely continuous measures, denoted by ρ, so that the distribution density can be accessed.
Let F : W2(Ω) → R̄ be a functional on W2(Ω). Use MMS-GF to define the gradient flow w.r.t. F:

ρ_{k+1}^τ ∈ arg min_ρ F(ρ) + W2²(ρ, ρ_k^τ)/(2τ).

General existence conditions apply, e.g. {ρ : F(ρ) ≤ c} compact and F Lipschitz, or F λ-geodesically convex.
Special result:

Theorem
Let F : W2(Ω) → R̄ be λ-geodesically convex; then the MMS-GF w.r.t. F exists. Let ρ_t^0, ρ_t^1 be two solutions, and define E(t) ≜ ½W2²(ρ_t^0, ρ_t^1). Then E(t) ≤ e^{−λt}E(0), which implies uniqueness for a given initial condition, and stability and exponential convergence for λ > 0.
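A minimal sketch of one such MMS/JKO step in 1-D (my addition): for the potential energy F(ρ) = ∫V dρ, represent ρ by n quantile values q_i, so that W2²(ρ, ρ′) is the mean of (q_i − q_i′)² under the monotone coupling; the step then decouples into n scalar proximal problems q_i ← argmin_s V(s) + (s − q_i)²/(2τ), which for V(s) = s²/2 is just q_i/(1 + τ):

    import numpy as np

    n, tau = 1000, 0.1
    rng = np.random.default_rng(0)
    q = np.sort(rng.normal(3.0, 0.2, n))   # quantile values of the initial rho

    for k in range(50):
        q = q / (1 + tau)                  # pointwise prox of V(s) = s^2/2
    print(q.mean(), q.std())               # both shrink: rho flows toward delta_0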
To relate F and the vector field v_t, we need the notion of first variation.

Definition (First Variation)
The first variation of a functional G : P(Ω) → R is δG/δρ(ρ) : Ω → R s.t.

(d/dε) G(ρ + εχ)|_{ε=0} = ∫ (δG/δρ)(ρ)(x) dχ(x), ∀χ ∈ {χ : ∃ε0 s.t. ∀ε ∈ [0, ε0], ρ + εχ ∈ P(Ω)}.

(Recall that on R^d, ∇F ∈ R^d is s.t. (d/dε)F(x + εv)|_{ε=0} = ⟨∇F, v⟩, ∀v ∈ R^d.)


For the functionals F, V, W defined above:

Theorem
δF/δρ = f′(ρ), δV/δρ = V, δW/δρ = W ∗ ρ (convolution).


Theorem
The first variation of the Wasserstein distance with cost function c: δW_c(ρ, ν)/δρ = ϕ, if ρ, ν are defined on Ω ⊂ R^d, c : Ω × Ω → R is continuous, and the Kantorovich potential ϕ (from ρ to ν) is unique and c-concave.


Relate F and the vector field v_t.

Theorem
For the Minimizing Movement Scheme ρ_{k+1}^τ ∈ arg min_ρ F(ρ) + W2²(ρ, ρ_k^τ)/(2τ), the optimality condition is:

(δF/δρ)(ρ_{k+1}^τ) + ϕ/τ = const,

where ϕ is the Kantorovich potential from ρ_{k+1}^τ to ρ_k^τ.
Relation between T∗ and ϕ: T∗(x) = x − ∇ϕ(x);
relation between v_t and T: v_t(x) = (x − T(x))/τ;
so in the limit τ → 0, the gradient flow w.r.t. F induces a flow in R^n:

v_t(x) = −∇((δF/δρ)(ρ_t))(x),

and the flow ρ_t in W2(Ω) is:

∂_t ρ_t − ∇·(ρ_t ∇((δF/δρ)(ρ_t))) = 0.
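As a concrete instance (my addition, a hedged numerical sketch): for F(ρ) = ∫ρ log ρ dx + ∫V dρ one gets δF/δρ = log ρ + 1 + V, and the PDE above becomes the Fokker–Planck equation ∂_t ρ = ∇·(ρ∇(log ρ + V)) = ∇²ρ + ∇·(ρ∇V), whose stationary density is proportional to e^{−V}. A naive 1-D finite-difference discretization:

    import numpy as np

    x = np.linspace(-5, 5, 201); dx = x[1] - x[0]; dt = 1e-4
    V = 0.5 * x ** 2
    rho = np.exp(-(x - 2.0) ** 2)                # arbitrary initial density
    rho /= rho.sum() * dx

    for _ in range(50000):
        mu = np.log(rho + 1e-300) + V            # delta F / delta rho (up to +1)
        flux = rho * np.gradient(mu, dx)         # rho * grad(delta F / delta rho)
        rho = np.clip(rho + dt * np.gradient(flux, dx), 1e-300, None)

    gibbs = np.exp(-V); gibbs /= gibbs.sum() * dx
    print(np.max(np.abs(rho - gibbs)))           # near zero: rho has relaxed to e^{-V}/Z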
Numerical methods from the JKO scheme


JKO: Jordan–Kinderlehrer–Otto.
Solve problems of the form min{F(ρ) + ½W2²(ρ, ν) : ρ ∈ P(Ω)} (τ is absorbed into F).
Two recent methods:
1) based on the Benamou–Brenier formula, for convex F(ρ);
2) based on methods from semi-discrete optimal transport, for geodesically convex F (involving techniques from computational geometry; not covered in these slides).


Benamou–Brenier formula

Recall McCann's displacement interpolation theorem from the section on the Wasserstein space: (π_t)#γ is a constant-speed geodesic for an optimal plan γ, and for absolutely continuous µ the unique geodesic is µ_t = ((1 − t)id + tT)#µ.


From this theorem, we can see:
For the cost c(x, y) = |x − y|^p, finding an optimal transport ⟺ finding a constant-speed geodesic in W_p, since they are closely related and (when p > 1 and µ is absolutely continuous) one-to-one.
Find a constant-speed geodesic: min_{µ_t} ∫_0^1 |µ′|(t)^p dt.
In W_p we have |µ′|(t) = ‖v_t‖_{L^p(µ_t)} = (∫_Ω |v_t|^p dµ_t)^{1/p}, where v_t is the velocity field solving the continuity equation.
So we get the Benamou–Brenier formula (Time-dependent Kantorovich Problem):

(TKP1) min_{(ρ_t, v_t) : ρ_0 = µ, ρ_1 = ν, ∂_t ρ_t + ∇·(v_t ρ_t) = 0} ∫_0^1 ∫_Ω |v_t|^p dρ_t dt.

It is a kinetic energy minimization problem.
It selects constant-speed geodesics connecting µ to ν.
It is non-convex in (ρ_t, v_t).
Transform it into a convex problem: let E_t = v_t ρ_t, and use (ρ_t, E_t) as arguments:

(TKP2) min_{(ρ_t, E_t) : ρ_0 = µ, ρ_1 = ν, ∂_t ρ_t + ∇·E_t = 0} ∫_0^1 ∫_Ω (|E_t|^p / ρ_t^{p−1}) dx dt.


Further transformation:
K_q ≜ {(a, b) ∈ R × R^d : a + (1/q)|b|^q ≤ 0}, for q = p/(p − 1) the conjugate exponent of p. It is convex in R × R^d.
For t ∈ R and x ∈ R^d, define

f_p(t, x) ≜ sup_{(a,b)∈K_q} (at + b · x) = (1/p)(|x|^p / t^{p−1}) if t > 0; 0 if t = 0, x = 0; +∞ if t = 0, x ≠ 0, or t < 0.

So the optimization problem can be reformulated as

(TKP3) min_{(ρ_t, E_t) : ρ_0 = µ, ρ_1 = ν, ∂_t ρ_t + ∇·E_t = 0}  sup_{(a,b) ∈ C(Ω×[0,1]; K_q)} ∬ a dρ + ∬ b · dE,

where ∬ indicates integration w.r.t. both space and time.
Utilizing

sup_{φ∈C¹([0,1]×Ω)} −∬ ∂_t φ dρ − ∬ ∇φ · dE + ∫ φ_1 dν − ∫ φ_0 dµ = 0 if ρ_0 = µ, ρ_1 = ν, ∂_t ρ_t + ∇·E_t = 0, and +∞ otherwise,

we get

(TKP4) min_{(ρ_t, E_t)} sup_{(a,b)∈C(Ω×[0,1];K_q), φ∈C¹([0,1]×Ω)} ∬ (a − ∂_t φ) dρ + ∬ (b − ∇φ) · dE + ∫ φ_1 dν − ∫ φ_0 dµ.


To simplify notation, let m = (ρ, E), A = (a, b), m · A = ∬ a dρ + ∬ b · dE, ∇_{t,x}φ = (∂_t φ, ∇φ), G(φ) = ∫ φ_1 dν − ∫ φ_0 dµ, and let I_{K_q}(·) be the indicator function; then

(TKP4′) min_m sup_{A,φ} L(m, (A, φ)) ≜ m · (A − ∇_{t,x}φ) − I_{K_q}(A) + G(φ).

This is a minimax problem.


L(m, (A, φ)) is a Lagrangian of the form L(X, Y) = X · ΛY − H(Y), where Λ is a linear operator. Its optimality conditions

ΛY = 0,
Λ∗X − ∇H(Y) = 0,

are the same as those of the augmented Lagrangian L̃(X, Y) = X · ΛY − H(Y) − (r/2)|ΛY|²:

ΛY = 0,
Λ∗X − ∇H(Y) − rΛ∗ΛY = 0,

for any r > 0, where Λ∗ is the adjoint of Λ w.r.t. the inner product. So finally,

(TKP5) min_m sup_{A,φ} m · (A − ∇_{t,x}φ) − I_{K_q}(A) + G(φ) − (r/2)‖A − ∇_{t,x}φ‖².
Gradient Flows on Wasserstein Spaces Numerical methods from the JKO scheme

Benamou-Brenier formula

r
(TKP5) min sup m · (A − ∇t,x φ) − IKp (A) + G (φ) − kA − ∇t,x φk2 .
m A,φ 2

To solve this,
Optimize φ: minimize a quadratic functional in calculus of variations,
e.g. solving a Poisson equation
Optimize A: a pointwise minimization problem, specifically a
projection on the convex set Kq
Optimize m: gradient descent. m ← m − r (A − ∇t,x φ)

Chang Liu (THU) Gradient Flow April 24, 2017 74 / 91
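Of the three steps, the projection is the only pointwise-nonlinear one; here is a runnable sketch for p = q = 2 (my addition), where K_2 = {(a, b) : a + |b|²/2 ≤ 0} and projecting (a0, b0) from outside reduces, via the stationarity conditions b = b0/(1 + λ), a = a0 − λ = −|b|²/2, to a scalar cubic in the multiplier λ ≥ 0:

    import numpy as np

    def project_K2(a0, b0):
        if a0 + 0.5 * np.dot(b0, b0) <= 0:
            return a0, b0                      # already inside K_2
        # (lam - a0)(1 + lam)^2 = |b0|^2 / 2; take the largest real root
        coeffs = [1.0, 2.0 - a0, 1.0 - 2.0 * a0, -a0 - 0.5 * np.dot(b0, b0)]
        lam = max(r.real for r in np.roots(coeffs) if abs(r.imag) < 1e-9)
        b = b0 / (1.0 + lam)
        return -0.5 * np.dot(b, b), b          # lands on the boundary of K_2

    a, b = project_K2(1.0, np.array([2.0, 0.0]))
    print(a, b, a + 0.5 * b @ b)               # last value ≈ 0 (on the boundary)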


Application


To be continued... :(


My Remarks


Given a functional F(ρ) on W2(Ω) with Ω ⊂ R^n, if we want to minimize it, we can find a gradient flow on W2(Ω) defined by F, which gradually minimizes F, by:
1 the MMS discretization with step size τ: we get {ρ_k^τ}_k, where

ρ_{k+1}^τ ∈ arg min_ρ F(ρ) + W2²(ρ, ρ_k^τ)/(2τ).

In this case we directly get a sequence of distributions, e.g. in terms of p.d.f.s.
2 simulating a dynamics/flow on Ω, which is associated with the gradient flow on W2(Ω) (or which is the cause of the evolution of the distribution described by the gradient flow on W2(Ω)). The dynamics/flow on Ω is governed by

(d/dt)ξ_t(x) = v_t(x),  v_t(x) = −∇((δF/δρ)(ρ_t))(x).

In this case the distribution is embodied as samples from it.
My Remarks on SVGD

From now on, we only consider the second approach to obtain the gradient flow.
Take F(ρ) = KL(ρ‖p), for a fixed distribution p.
Compare the results of gradient flow and the calculus of variations. (Omit the subscript t temporarily.)

By Gradient Flow
F(ρ) = ∫_Ω ρ log(ρ/p) dx, δF/δρ = log ρ − log p + 1, so:

v(x) = ∇log p(x) − ∇log ρ(x).
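A particle simulation of this flow (my addition, a rough sketch): estimate ∇log ρ with a Gaussian kernel density estimate over the particles, with p = N(0, 1) as an assumed toy target:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(5.0, 0.5, 300)              # particles, badly initialized
    h, dt = 0.3, 0.05
    grad_log_p = lambda x: -x                  # target p = N(0, 1)

    for _ in range(200):
        d = x[:, None] - x[None, :]            # d[i, j] = x_i - x_j
        K = np.exp(-d ** 2 / (2 * h ** 2))     # Gaussian KDE kernel matrix
        grad_log_rho = (K * (-d / h ** 2)).sum(1) / K.sum(1)   # KDE score
        x = x + dt * (grad_log_p(x) - grad_log_rho)            # v = ∇log p − ∇log ρ
    print(x.mean(), x.std())                   # roughly 0 and 1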


By Calculus of Variations
Find the “directional derivative” G(v, ρ) of F(ρ) w.r.t. the dynamics (d/dt)ξ_t(x) = v_t(x):

G(v, ρ) = (d/dε) F(ρ_{[ξ^{(ε)}]})|_{ε=0},  ξ^{(ε)}(x) = x + εv(x),
ρ_{[ξ^{(ε)}]}(x) = ρ((ξ^{(ε)})^{−1}(x)) |Jac (ξ^{(ε)})^{−1}| ≈ ρ(x − εv(x)) |Jac(x − εv(x))|.

For F(ρ) = KL(ρ‖p), by my written notes on SVGD or the electronic notes on R-SVGD, G(v, ρ) = ∫_Ω ρ[∇log p · v + ∇ · v] dx.
Find v(x) that maximizes G(v, ρ): max_v G(v, ρ) s.t. ‖v‖ = 1. If we take the norm ‖v‖ = ½Σ_{i=1}^n ∫_Ω v_i²(x)ρ(x) dx and introduce a Lagrange multiplier λ:

min_λ max_v G(v, ρ) + (λ/2)Σ_{i=1}^n ∫_Ω v_i²(x)ρ(x) dx − λ.

For F(ρ) = KL(ρ‖p), take the first variation w.r.t. v_i, i.e. set ∂L/∂v_i − Σ_{j=1}^n ∂_j (∂L/∂(∂_j v_i)) = 0:

ρ∂_i log p − ∂_i ρ + λρv_i = 0,  v_i ∝ ∂_i log p − ∂_i log ρ,

the same as the result by gradient flow.

However, SVGD adopts neither: it takes v in a vector-valued RKHS and turns the objective into an inner product there.
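For comparison, a minimal SVGD sketch (my addition, following the Liu & Wang 2016 update rule): the RKHS-optimal direction is φ(x) = E_{x′∼ρ}[k(x′, x)∇log p(x′) + ∇_{x′}k(x′, x)], with ρ the empirical particle measure; same toy target and RBF kernel as above:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(5.0, 0.5, 300)
    h, eps = 0.3, 0.1
    grad_log_p = lambda x: -x                     # target p = N(0, 1)

    for _ in range(500):
        d = x[:, None] - x[None, :]               # d[i, j] = x_i - x_j
        K = np.exp(-d ** 2 / (2 * h ** 2))        # k(x_i, x_j)
        # grad wrt the first argument: ∇_{x_i} k(x_i, x_j) = -(d[i,j]/h^2) K[i,j]
        phi = (K * grad_log_p(x)[:, None] + (-d / h ** 2) * K).mean(0)
        x = x + eps * phi
    print(x.mean(), x.std())                      # roughly 0 and 1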


My Remarks on General Results

The general equivalence of Gradient Flow and the Calculus of Variations?

G(v, ρ) = (d/dε) F(ρ(x − εv(x)) |Jac(x − εv(x))|)|_{ε=0}
 = lim_{ε→0} ∫_Ω (δF/δρ)(ρ(x − εv(x)) |Jac(x − εv(x))|) · [−v · ∇ρ(x − εv) |Jac(x − εv)| + ρ(x − εv) Tr(Jac(x + εv) Jac(v))] dx
 = ∫_Ω (δF/δρ)(ρ(x)) (−v · ∇ρ(x) + ρ(x)∇ · v) dx.

But this result cannot even recover the case of F(ρ) = KL(ρ‖p)! Nor can it deduce the result of Gradient Flow v = −∇(δF/δρ) by min_λ max_v G(v, ρ) + λ‖v‖ − λ using the calculus of variations. Why? I would prefer that there is something wrong in the above deduction of G(v, ρ).
Appendix


Compactness

A topological space X is compact if each of its open covers has a finite subcover.
If X is additionally a metric space, then “X is compact” is equivalent to each of:
X is sequentially compact: every sequence in X has a convergent subsequence (with limit in X, of course).
X is complete and totally bounded (∀ε > 0, X is covered by finitely many open balls of radius ε).
X is limit point compact: every infinite subset of X has at least one limit point in X.


Weak convergence of measures

Let X be a metric space. µ_n ⇀ µ: for every bounded continuous function f : X → R, ∫f dµ_n → ∫f dµ.


Lower semicontinuity

On a topological space X, f : X → R ∪ {−∞, +∞} is lower semicontinuous at x0 ∈ X if ∀ε > 0, ∃U a neighbourhood of x0 s.t. ∀x ∈ U, f(x) ≥ f(x0) − ε when f(x0) < +∞, and lim_{x→x0} f(x) = +∞ when f(x0) = +∞.
In a metric space, this is equivalent to lim inf_{x→x0} f(x) ≥ f(x0).


Original notion of absolute continuity

Let I = [a, b] be a compact interval of R (when I is not compact, AC can also be defined, in a more general way). A function f : I → R is absolutely continuous on I if there exists a Lebesgue-integrable function g on I s.t. f(x) = f(a) + ∫_a^x g(t) dt, ∀x ∈ I.


Hölder space

Hölder condition on R^d: |f(x) − f(y)| ≤ C‖x − y‖^α, with exponent α.
Hölder space C^{k,α}(Ω): functions on Ω with continuous derivatives up to order k whose k-th partial derivatives are Hölder continuous with exponent 0 < α ≤ 1.
The larger α > 0, the stronger the condition; so it is weaker than Lipschitz (α = 1).
Compact inclusion C^{0,β}(Ω) → C^{0,α}(Ω), for 0 < α < β ≤ 1.


Equicontinuity

Let X and Y be two metric spaces, and F a family of functions from X to Y. The family F is equicontinuous at a point x0 ∈ X if ∀ε > 0, ∃δ > 0 s.t. d(f(x0), f(x)) < ε, ∀f ∈ F, ∀x : d(x0, x) < δ.

Concept | δ depends on
Continuity | ε, x0, f
Uniform continuity | ε, f
Pointwise equicontinuity | ε, x0
Uniform equicontinuity | ε


Ascoli–Arzelà's theorem (AA theorem)

X: a compact Hausdorff space. C(X): the space of continuous functions on X.
Typical statement: for a sequence of real-valued continuous functions {f_n}_n on a closed and bounded interval [a, b]: 1) {f_n}_n uniformly bounded and equicontinuous ⇒ ∃ a uniformly converging subsequence {f_{n_k}}_k; 2) every subsequence {f_{n_k}}_k has a uniformly convergent (sub)subsequence ⇒ {f_n}_n is uniformly bounded and equicontinuous.
General statement: a subset of C(X) is compact ⇔ it is closed, pointwise bounded and (uniformly) equicontinuous.
Very general statement: a subset F of C(X) is relatively compact in the topology induced by the uniform norm ⇔ it is equicontinuous and pointwise bounded.
Corollary: a sequence in C(X) is uniformly convergent ⇔ it is (uniformly) equicontinuous and converges pointwise to a function (not necessarily continuous a priori).
Thanks!

