
Gradient Flow

Chang Liu

Tsinghua University

April 24, 2017



Contents
1 Introduction
2 Gradient flow in the Euclidean space
Variants of Gradient Flow in the Euclidean Space
Approximating Curves
Characterizing Properties
3 Gradient Flow in Metric Spaces
Generalization of Basic Concepts
Generalization of Gradient Flow to Metric Spaces
4 Gradient Flows on Wasserstein Spaces
Recap. of Optimal Transport Problems
The Wasserstein Space
Gradient Flows on W2(Ω), Ω ⊂ R^n
Numerical methods from the JKO scheme
5 Application
6 My Remarks
7 Appendix
Introduction

Definition (Gradient Flow in a Linear Space)

X is a linear space, and F : X → R is smooth. A gradient flow (or steepest descent curve) is a smooth curve x : R → X such that

x′(t) = −∇F(x(t)).

What shall we consider next, and where can it be applied?

1 Existence and uniqueness of the solution
Since many PDEs are in the form of a gradient flow, the analysis can be applied to them.

Example
For X = L²(R^n), a Hilbert space, and the Dirichlet energy F(u) = ½∫|∇u(x)|² dx, the Heat Equation ∂t u = ∇²u is a gradient flow problem. (Indeed, (d/dε)F(u + εφ)|_{ε=0} = ∫∇u·∇φ dx = −∫(∇²u)φ dx for compactly supported φ, so ∇F(u) = −∇²u in L², and x′ = −∇F(x) is exactly ∂t u = ∇²u.)
2 Numerical methods and their convergence
Since a gradient flow gradually minimizes F(x), many optimization methods are related to it, e.g. gradient descent, proximal methods, mirror descent.


3 Generalization to gradient flows on general metric spaces
Viewing PDEs as gradient flows on general metric spaces gives wider applicability.

Example
PDEs in the continuity equation form ∂t ρ − ∇·(ρv) = 0, where v = ∇[δF/δρ], can be cast as gradient flows on the space of probability measures with the Wasserstein distance. The Heat Equation can also be viewed as a gradient flow in the Wasserstein space.

There is also the need to minimize functionals on metric spaces.

Example
Optimization w.r.t. probability distributions, e.g. min_q KL(q‖p). Optimization without parameterization is possible! (e.g. Stein Variational Gradient Descent)


Gradient Flow in the Euclidean Space


Variants of Gradient Flow in the Euclidean Space


Existence, Uniqueness and Variants

Variant 0: F : R^n → R is differentiable (Cauchy Problem):

x′(t) = −∇F(x(t)), for t > 0,
x(0) = x0.

Theorem
∃! solution if ∇F is Lipschitz (Picard–Lindelöf).


Existence, Uniqueness and Variants

Variant 1: F is convex and not necessarily differentiable:

x′(t) ∈ −∂F(x(t)), for a.e. t > 0,
x(0) = x0,

where x is an absolutely continuous curve, and

∂F(x) = {p ∈ R^n : ∀y ∈ R^n, F(y) ≥ F(x) + p · (y − x)}.

Theorem
Any two solutions x1, x2 of the above problem with different initial conditions satisfy |x1(t) − x2(t)| ≤ |x1(0) − x2(0)|.

Corollary
For a given initial condition, the above problem has a unique solution.


Existence, Uniqueness and Variants

Variant 2: F is semi-convex (λ-convex).

Definition (λ-convex function)
F is λ-convex (λ ∈ R) if F(x) − (λ/2)|x|² is convex.

x′(t) ∈ −∂F(x(t)), for a.e. t > 0,
x(0) = x0,

where x is an absolutely continuous curve, and
∂F(x) = {p ∈ R^n : ∀y ∈ R^n, F(y) ≥ F(x) + p · (y − x) + (λ/2)|y − x|²}.

Theorem
Any two solutions x1, x2 of the above problem with different initial conditions satisfy |x1(t) − x2(t)| ≤ e^{−λt}|x1(0) − x2(0)|.


Corollary
For a given initial condition, the above problem has a unique solution.
If λ > 0 (strongly convex), F has a unique minimizer x∗. x(t) ≡ x∗ is a solution, so for any solution x(t), |x(t) − x∗| ≤ e^{−λt}|x(0) − x∗|.


Approximating Curves


Definition (MMS)
Minimizing Movement Scheme (MMS): for a fixed small time step τ, define a sequence {x_k^τ}_k by

x_{k+1}^τ ∈ arg min_x F(x) + |x − x_k^τ|²/(2τ).

Importance:
Practical numerical method for approximating the curve.
Easier to generalize to metric spaces than x′ = −∇F(x) itself.
Properties:
Existence of a solution for mild F (e.g. Lipschitz and lower bounded by C1 − C2|x|²).
(x_{k+1}^τ − x_k^τ)/τ ∈ −∂F(x_{k+1}^τ): implicit Euler scheme (more stable, but harder, than the explicit one: gradient descent).
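Below is a minimal numerical sketch of the scheme (my addition, not from the slides), using scipy and a toy quadratic energy. Each MMS step solves the proximal problem numerically; for comparison, the explicit Euler scheme is plain gradient descent, which diverges here because τ exceeds the stability threshold 2/L of the stiff coordinate:

    import numpy as np
    from scipy.optimize import minimize

    A = np.diag([1.0, 10.0])                      # toy quadratic F(x) = x·Ax/2
    F = lambda x: 0.5 * x @ A @ x
    gradF = lambda x: A @ x

    def mms_step(xk, tau):
        # implicit Euler / proximal step: argmin_x F(x) + |x - xk|^2 / (2 tau)
        return minimize(lambda x: F(x) + np.sum((x - xk) ** 2) / (2 * tau), xk).x

    tau = 0.3                                     # > 2/10: explicit Euler unstable
    x_imp = x_exp = np.array([1.0, 1.0])
    for k in range(20):
        x_imp = mms_step(x_imp, tau)              # stable for any tau > 0
        x_exp = x_exp - tau * gradF(x_exp)        # explicit Euler: blows up
    print(x_imp, x_exp)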


Convergence:
Define v_{k+1}^τ ≜ (x_{k+1}^τ − x_k^τ)/τ, and v^τ(t) = v_{k+1}^τ for t ∈ (kτ, (k+1)τ].

Define two kinds of interpolations:

1) x^τ(t) = x_{k+1}^τ, t ∈ (kτ, (k+1)τ];
2) x̃^τ(t) = x_k^τ + (t − kτ)v_{k+1}^τ, t ∈ (kτ, (k+1)τ].

x̃^τ is continuous and (x̃^τ)′ = v^τ;
x^τ is not continuous, but v^τ(t) ∈ −∂F(x^τ(t)).

Theorem
If F(x0) < +∞ and inf F > −∞, then up to a subsequence τ_j → 0, both x̃^{τ_j} and x^{τ_j} converge uniformly to a same curve x ∈ H¹(R^n), and v^{τ_j} converges weakly in L²(R; R^n) to a vector function v s.t. x′ = v and
1) v(t) ∈ −∂F(x(t)) a.e., if F is λ-convex;
2) v(t) = −∇F(x(t)), ∀t, if F is C¹.


Details:
1 L^p space
For a measure space (S, Σ, µ), first define L(S; R^n) ≜ {f : S → R^n : ∫_S |f|^p dµ < ∞}. It is a linear space.
Define L^p(S; R^n) ≜ L(S; R^n)/{f : f = 0 µ-a.e.} to be a quotient space (i.e. treat all functions that are equal µ-a.e. as the same element of L^p).
Define ‖f‖_p ≜ (∫_S |f|^p dµ)^{1/p}; then for 1 ≤ p ≤ ∞ it is a Banach space.
Only L²(S; R^n) can be a Hilbert space, with inner product ⟨f, g⟩_{L²(S;R^n)} ≜ ∫_S f·g dµ.
L^p(S) ≜ L^p(S; R).


2 Weak convergence in a Hilbert space H
For x_n ∈ H, n ≥ 1, and x ∈ H, x_n ⇀ x is defined as: ∀f ∈ H′, f(x_n) → f(x), or equivalently ∀y ∈ H, ⟨x_n, y⟩_H → ⟨x, y⟩_H.
x_n → x ⟹ x_n ⇀ x.
x_n ⇀ x and ‖x_n‖ → ‖x‖ ⟹ x_n → x.
If dim(H) < ∞, x_n ⇀ x ⟺ x_n → x.


3 H^k(Ω) space (Ω ⊂ R^n)
Weak derivative. For u ∈ C^k(Ω) and φ ∈ C_c^∞(Ω) (the subscript c is for compact support),

∫_Ω u D^α φ dx = (−1)^{|α|} ∫_Ω φ D^α u dx,   (integration by parts)

where D^α = ∂_{x1}^{α1} ··· ∂_{xn}^{αn} and |α| = Σ_{i=1}^n α_i. So define the weak α-th partial derivative of u as the function v satisfying

∫_Ω u D^α φ dx = (−1)^{|α|} ∫_Ω φ v dx, ∀φ ∈ C_c^∞(Ω).

If it exists, it is uniquely defined a.e.
Sobolev space W^{k,p}(Ω) for k ∈ N and p ∈ [1, ∞]:
W^{k,p}(Ω) = {u ∈ L^p(Ω) : D^α u ∈ L^p(Ω), ∀|α| ≤ k},
with norm
‖u‖_{W^{k,p}(Ω)} = (Σ_{|α|≤k} ‖D^α u‖_{L^p(Ω)}^p)^{1/p} for 1 ≤ p < +∞, and max_{|α|≤k} ‖D^α u‖_{L^∞(Ω)} for p = +∞.
W^{k,p}(Ω) is a Banach space.
H^k(Ω) ≜ W^{k,2}(Ω). These are Hilbert spaces.
4 Up to a subsequence
There exists a sequence τ_j → 0 s.t. x̃^{τ_j} and x^{τ_j} converge uniformly and v^{τ_j} converges weakly.


Proof sketch:
Comparing the minimizer x_{k+1}^τ with x_k^τ in the MMS objective gives
|x_{k+1}^τ − x_k^τ|²/(2τ) ≤ F(x_k^τ) − F(x_{k+1}^τ)
⟹ Σ_{k=0}^{ℓ} |x_{k+1}^τ − x_k^τ|²/(2τ) ≤ F(x_0^τ) − F(x_{ℓ+1}^τ) ≤ C, for F(x0) < +∞ and inf F > −∞
⟹ ∫_0^T ½|(x̃^τ)′(t)|² dt ≤ C
⟹ x̃^τ is bounded in H¹ and v^τ in L², and the injection H¹ ⊂ C^{0,1/2} gives an equicontinuity bound on x̃^τ of the form |x̃^τ(t) − x̃^τ(s)| ≤ C|t − s|^{1/2}
⟹ by the AA theorem, x̃^τ has a uniformly converging subsequence.
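As a sanity check on the convergence claim (my addition), take F(x) = x²/2 on R: the MMS step has the closed form x_{k+1}^τ = x_k^τ/(1 + τ), and the interpolations should converge uniformly to the exact gradient-flow curve x(t) = x0·e^{−t} as τ → 0:

    import numpy as np

    def mms_nodes(x0, tau, T):
        ts = np.arange(0.0, T + tau, tau)
        xs = x0 / (1.0 + tau) ** np.arange(len(ts))   # exact MMS iterates
        return ts, xs

    for tau in [0.5, 0.1, 0.02]:
        ts, xs = mms_nodes(1.0, tau, T=2.0)
        err = np.max(np.abs(xs - np.exp(-ts)))        # sup error at the nodes
        print(f"tau={tau:5.2f}  sup-error={err:.4f}") # shrinks as tau -> 0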


Characterizing Properties


Motivation
x′ = −∇F(x) (or x′ ∈ −∂F(x)) is hard to generalize to metric spaces!
There is nothing but a distance in a metric space, so ∇F(x) or ∂F(x) cannot be defined! (different from a manifold)
Instead, use two properties of the gradient flow that characterize it and that can be generalized to metric spaces.


Two characterizing properties of gradient flow in R^d:

Energy Dissipation Equality (EDE) for F ∈ C¹(Ω), Ω ⊂ R^n:

F(x(s)) − F(x(t)) = ∫_s^t (½|x′(r)|² + ½|∇F(x(r))|²) dr, ∀0 ≤ s < t ≤ 1

is equivalent to x′ = −∇F(x). Note it is equivalent even with “≥” (i.e. “≥” ⟺ “=”).

Evolution Variational Inequality (EVI) for a λ-convex function F:

(d/dt) ½|x(t) − y|² ≤ F(y) − F(x(t)) − (λ/2)|x(t) − y|², ∀y ∈ X

is equivalent to x′(t) ∈ −∂F(x(t)).
Sometimes also denoted EVI_λ.
It is important for establishing the uniqueness and stability of gradient flows.
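Why EDE pins down the flow (a standard argument, added here): along any curve, F(x(s)) − F(x(t)) = −∫_s^t ∇F(x(r))·x′(r) dr ≤ ∫_s^t (½|x′|² + ½|∇F(x)|²) dr by Young's inequality, with equality iff x′(r) = −∇F(x(r)) a.e.; this also explains why “≥” already forces “=”. A quick numeric check for F(x) = x²/2, whose flow from x0 = 1 is x(t) = e^{−t}:

    import numpy as np

    x = lambda t: np.exp(-t)                # gradient-flow curve of F(x) = x^2/2
    F = lambda y: 0.5 * y ** 2
    s, t = 0.2, 0.9
    r = np.linspace(s, t, 10001)
    integrand = 0.5 * x(r) ** 2 + 0.5 * x(r) ** 2   # |x'|^2 = |∇F(x)|^2 = x^2 here
    dr = r[1] - r[0]
    rhs = (integrand[:-1] + integrand[1:]).sum() / 2 * dr   # trapezoid rule
    print(F(x(s)) - F(x(t)), rhs)           # the two sides agree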


Gradient Flow in Metric Spaces


Generalization of Basic Concepts


For a metric space (X, d):

Definition
Metric derivative of a curve ω : [0, 1] → X:

|ω′|(t) = lim_{h→0} d(ω(t + h), ω(t))/|h|,

if the limit exists.

If ω is Lipschitz, |ω′|(t) exists for a.e. t ∈ [0, 1].
d(ω(t0), ω(t1)) ≤ ∫_{t0}^{t1} |ω′|(s) ds.


In (X, d), ω′ cannot be defined, but |ω′| can.

Definition
ω : [0, 1] → X is absolutely continuous if ∃g ∈ L¹([0, 1]) s.t.

d(ω(t0), ω(t1)) ≤ ∫_{t0}^{t1} g(s) ds, ∀t0 < t1.

Let AC(X) be the set of such curves.

Lipschitz ⇒ AC.
AC ⇒ the metric derivative exists a.e.


Definition
Length of a curve ω : [0, 1] → X:

Length(ω) ≜ sup{Σ_{k=0}^{n−1} d(ω(t_k), ω(t_{k+1})) : n ≥ 1, 0 = t0 < ··· < t_n = 1}.

If ω ∈ AC(X), Length(ω) = ∫_0^1 |ω′|(t) dt.


Definition
Geodesic between x0 and x1 in X: a curve ω s.t. ω(0) = x0, ω(1) = x1, and Length(ω) = min_ω̃ {Length(ω̃) : ω̃(0) = x0, ω̃(1) = x1}.

This is the generalization of straight lines in R^n, and is used to extend convexity.

Definition
Length space: a metric space (X, d) s.t. ∀x, y ∈ X, d(x, y) = inf_{ω∈AC(X)} {Length(ω) : ω(0) = x, ω(1) = y}.
Geodesic space: a length space in which a geodesic exists for every pair of points.

Riemannian manifolds are geodesic spaces.


For a geodesic space (X, d):

Definition
Geodesic convexity: in a geodesic metric space, a function F : X → R that is convex along geodesics:

F(x(t)) ≤ (1 − t)F(x(0)) + tF(x(1)),

where x(t) is a geodesic joining x(0) and x(1).

λ-geodesic convexity: in a geodesic metric space, a function F : X → R that is λ-convex along geodesics:

F(x(t)) ≤ (1 − t)F(x(0)) + tF(x(1)) − λ (t(1 − t)/2) d²(x(0), x(1)).


For a metric space (X, d):

Definition
g : X → R is an upper gradient of F : X → R if, for every Lipschitz curve x,

|F(x(0)) − F(x(1))| ≤ ∫_0^1 g(x(t)) |x′|(t) dt.

Local Lipschitz constant of F:

|∇F|(x) = lim sup_{y→x} |F(x) − F(y)|/d(x, y).

Descending slope (or just slope) of F:

|∇⁻F|(x) = lim sup_{y→x} [F(x) − F(y)]₊/d(x, y).

If F is Lipschitz, |∇F| is an upper gradient.
Generalization of Gradient Flow to Metric Spaces


Three ways to generalize gradient flow to metric spaces: EDE-GF, EVI-GF, MMS-GF.

Definition (EDE-GF)
Let (X, d) be a metric space, F : X → R, and g : X → R an upper gradient of F. An EDE-GF is a curve x : [0, 1] → X with metric derivative a.e. such that:

F(x(s)) − F(x(t)) = ∫_s^t (½|x′(r)|² + ½g(x(r))²) dr, ∀0 ≤ s < t ≤ 1.

Existence is easy to guarantee.
Not enough to guarantee uniqueness.


Definition (EVI-GF)
Let (X, d) be a geodesic space and F : X → R λ-geodesically convex. An EVI-GF is a curve x : [0, 1] → X such that:

(d/dt) ½ d(x(t), y)² ≤ F(y) − F(x(t)) − (λ/2) d(x(t), y)², ∀y ∈ X.

EVI-GF ⇒ EDE-GF.
Uniqueness and contractivity: for two EVI-GFs x(t) and y(s),

(d/dt) ½ d(x(t), y(s))² ≤ F(y(s)) − F(x(t)) − (λ/2) d(x(t), y(s))²,
(d/ds) ½ d(x(t), y(s))² ≤ −F(y(s)) + F(x(t)) − (λ/2) d(x(t), y(s))².

Define E(t) = ½ d(x(t), y(t))²; then (d/dt)E(t) ≤ −2λE(t)
⇒ d(x(t), y(t)) ≤ e^{−λt} d(x(0), y(0)), which gives uniqueness for a given initial condition and exponential convergence for λ > 0.
A strong condition; existence is hard to guarantee.
A sufficient condition for existence: Compatible Convexity along Generalized Geodesics (C²G²):
∀x0, x1 ∈ X, ∀y ∈ X, ∃x : [0, 1] → X s.t. x(0) = x0, x(1) = x1 and

F(x(t)) ≤ (1 − t)F(x0) + tF(x1) − λ (t(1 − t)/2) d²(x0, x1),
d²(x(t), y) ≤ (1 − t)d²(x0, y) + t d²(x1, y) − t(1 − t) d²(x0, x1),

i.e. λ-convexity of F and 2-convexity of x ↦ d²(x, y) along a same curve (not necessarily a geodesic).
Definition (Generalized MMS)
Generalization of the Minimizing Movement Scheme to a metric space (X, d): for F : X → R ∪ {+∞}, define

x_{k+1}^τ ∈ arg min_x F(x) + d(x, x_k^τ)²/(2τ).

Define two kinds of interpolations in a similar way:
1) Define x^τ(t) = x_{k+1}^τ, t ∈ (kτ, (k+1)τ];
2) Define x̃^τ(t), t ∈ (kτ, (k+1)τ], to be the constant-speed geodesic between x_k^τ and x_{k+1}^τ. (So we require X to be a length space?)


Definition
Constant-speed geodesic: in a length space, a curve ω : [t0, t1] → X s.t.

d(ω(t), ω(s)) = (|t − s|/(t1 − t0)) d(ω(t0), ω(t1)), ∀t, s ∈ [t0, t1].

Constant-speed geodesics are geodesics:
Length(ω) = ∫_{t0}^{t1} (d(ω(t0), ω(t1))/(t1 − t0)) dt = d(ω(t0), ω(t1)).
The following are equivalent:
1 ω : [t0, t1] → X is a constant-speed geodesic joining x0 and x1;
2 ω ∈ AC(X) and |ω′|(t) = d(ω(t0), ω(t1))/(t1 − t0) a.e.;
3 ω ∈ arg min{∫_{t0}^{t1} |ω′|(t)^p dt : ω(t0) = x0, ω(t1) = x1}, ∀p > 1.
Define v^τ. On metric (length) spaces, only its norm can be defined: set |v^τ| to be the piecewise-constant speed of x̃^τ,

|v^τ|(t) = d(x_{k+1}^τ, x_k^τ)/τ, t ∈ (kτ, (k+1)τ].

Definition (MMS-GF)
Let (X, d) be a metric space (not necessarily a length space). A curve x : [0, T] → X is called a Generalized Minimizing Movement (GMM) (I would call it MMS-GF) if there exists a sequence τ_j → 0 s.t. x^{τ_j} converges uniformly to x in (X, d).


Existence analysis:
Condition for the existence of x_k^τ:
The sub-level set {x : F(x) ≤ c} is compact in X, and F is Lipschitz.
(The corresponding topology is either the one induced by d, or a weaker topology s.t. d is lower semi-continuous w.r.t. it.)
Condition for the existence of limit curves (i.e. MMS-GF):
Existence of x_k^τ is enough!
Due to d(x_{k+1}^τ, x_k^τ)²/(2τ) ≤ F(x_k^τ) − F(x_{k+1}^τ), we have d(x^τ(t), x^τ(s)) ≤ C(|t − s|^{1/2} + √τ), i.e. {x^τ}_τ are equi-Hölder continuous with exponent 1/2 (up to a negligible error of order √τ). So by the AA theorem, the set {x^τ}_τ has uniformly converging subsequences, i.e. MMS-GFs exist. But they are not unique, and no relation with F (EDE or EVI) is obtained.
To relate MMS-GF to F and to the other generalizations:

If, in addition to “{x : F(x) ≤ c} is compact in X, F is Lipschitz”, F and |∇⁻F| are lower-semicontinuous, we have
½∫_0^t |x′|(r)² dr + ½∫_0^t |∇⁻F(x(r))|² dr ≤ F(x(0)) − F(x(t)), ∀0 ≤ t ≤ T. (not EDE)
If additionally the slope |∇⁻F| is an upper gradient of F, we have EDE:
½∫_s^t |x′|(r)² dr + ½∫_s^t |∇⁻F(x(r))|² dr ≤ F(x(s)) − F(x(t)), ∀0 ≤ s < t ≤ T.
If F is λ-geodesically convex, all the conditions are met.


Conclusion for now

Table: Summary of extensions of gradient flow to metric spaces

Extension | Requirement | Existence | Uniqueness and contractivity
EVI-GF | X geodesic space, F λ-geod. convex | Hard; C²G² is a sufficient condition | Guaranteed
EDE-GF | X metric space | Easy | Not guaranteed
MMS-GF | X metric space | Relatively easy; “{x : F(x) ≤ c} compact and F Lipschitz” or “F λ-geod. convex” suffices | Not guaranteed

EVI-GF ⊂ EDE-GF
MMS-GF ⊂ EDE-GF if “{x : F(x) ≤ c} compact, F Lipschitz, F and |∇⁻F| lower-semicontinuous, |∇⁻F| an upper gradient of F” or “F λ-geod. convex”.
Gradient Flows on Wasserstein Spaces


Recap. of Optimal Transport Problems


Settings
Let X, Y be two measurable spaces, µ ∈ P(X) and ν ∈ P(Y) fixed measures, and c : X × Y → R a cost function.

Definition (push-forward of a measure)
For a measurable map T : X → Y and a measure µ ∈ P(X), define the push-forward of µ under T, T#µ, to be the measure on Y s.t.

T#µ(A) = µ(T⁻¹(A)), ∀A in the σ-algebra of Y.

Example
For X = Y = R^n and T invertible, in terms of p.d.f.s, T#µ = (µ ∘ T⁻¹)|det(∇T⁻¹)|, i.e. the rule of change of variables.
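A one-line numerical check of this change-of-variables rule (my addition): take µ = N(0, 1) on R and the affine map T(x) = 2x + 1, so T#µ = N(1, 4):

    import numpy as np
    from scipy.stats import norm

    Tinv = lambda y: (y - 1) / 2                 # inverse of T(x) = 2x + 1
    ys = np.linspace(-5.0, 7.0, 5)
    pushforward_pdf = norm.pdf(Tinv(ys)) * 0.5   # (mu o T^{-1}) |det grad T^{-1}|
    print(np.allclose(pushforward_pdf, norm.pdf(ys, loc=1, scale=2)))  # True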


Monge's Problem:

(MP) inf_{T : T#µ=ν} ∫_X c(x, T(x)) dµ(x).

An (optimal) T is called an (optimal) transport map.
The problem may not be feasible.

Kantorovich's Problem:

(KP) inf_{γ∈Π(µ,ν)} ∫_{X×Y} c(x, y) dγ(x, y),

where Π(µ, ν) ≜ {γ : (π_X)#γ = µ, (π_Y)#γ = ν}.

An (optimal) γ is called an (optimal) transport plan.
The problem is always feasible.
MP is a special case of KP, where γ is restricted to the form γ = (id × T)#µ. If T∗ exists, γ∗ = (id × T∗)#µ is also optimal.
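For intuition, here is a tiny discrete instance of (KP) (my addition): µ, ν supported on 3 points with quadratic cost, solved as a linear program over plans γ ≥ 0 with fixed marginals:

    import numpy as np
    from scipy.optimize import linprog

    x = np.array([0.0, 1.0, 2.0]); mu = np.array([0.5, 0.3, 0.2])
    y = np.array([0.5, 1.5, 2.5]); nu = np.array([0.2, 0.5, 0.3])
    C = 0.5 * (x[:, None] - y[None, :]) ** 2      # c(x, y) = |x - y|^2 / 2

    A_eq = np.zeros((6, 9)); b_eq = np.concatenate([mu, nu])
    for i in range(3):
        A_eq[i, 3 * i: 3 * i + 3] = 1             # row sums:    (pi_X)# gamma = mu
        A_eq[3 + i, i::3] = 1                     # column sums: (pi_Y)# gamma = nu
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    print(res.fun)                                # optimal cost
    print(res.x.reshape(3, 3))                    # optimal plan gamma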
Dual Kantorovich Problem:

Direct form:

(DKP) sup_{φ∈L¹(X), ψ∈L¹(Y), φ(x)+ψ(y)≤c(x,y)} ∫_X φ dµ + ∫_Y ψ dν.

Reformulation:

Definition
The c-transform (c-conjugate) of χ : X → R̄ is χ^c : Y → R̄, defined as χ^c(y) ≜ inf_{x∈X} c(x, y) − χ(x).
Ψ_c(X) ≜ {χ^c : χ : X → R̄}. ψ : Y → R̄ is c-concave if ψ ∈ Ψ_c(X).

(DKP′) sup_{φ∈Ψ_c(X)} ∫_X φ dµ + ∫_Y φ^c dν.


Definition (Kantorovich potential)
The optimal φ of (DKP′) is called the Kantorovich potential, denoted by ϕ.

When c is uniformly continuous (e.g. when c is continuous and X is compact), the existence of the Kantorovich potential ϕ can be proven (by the AA theorem).

Remark
Strong duality holds: KP(µ, ν) = DKP(µ, ν).


Special case 1: X = Y, c(x, y) = d(x, y) is a distance:

(DKP1) sup_{φ∈Lip₁(X)} ∫_X φ dµ − ∫_X φ dν.

Special case 2: X = Y = Ω ⊂ R^n and c(x, y) = ½|x − y|²:

Theorem
For the quadratic cost and Ω ⊂ R^n closed, bounded and connected, ∃! optimal transport plan γ∗ for (KP).
Additionally, if µ is absolutely continuous, an optimal transport map T∗ exists and γ∗ = (id, T∗)#µ. Moreover, there exists a Kantorovich potential ϕ s.t. ∇ϕ is unique µ-a.e., and T = ∇u a.e., where u(x) ≜ |x|²/2 − ϕ(x) is convex.


Corollary
Under the same conditions, any gradient of a convex function is an optimal map between µ and its image measure.
An optimal transport map uniquely exists for c(x, y) = h(x − y) with h strictly convex (e.g. |x − y|^p, p > 1).


The Wasserstein Space


Definition
On a metric space (X, d), for p ≥ 1 and a fixed point x0 ∈ X, define m_p(µ) ≜ ∫_X d(x, x0)^p dµ(x), and P_p(X) ≜ {µ ∈ P(X) : m_p(µ) < +∞}, which is independent of the choice of x0.

Theorem
W_p(µ, ν) ≜ inf_{γ∈Π(µ,ν)} (∫ d(x, y)^p dγ(x, y))^{1/p} is a distance on P_p(X).

Definition (Wasserstein space)
W_p(X) ≜ (P_p(X), W_p).
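On X = R (my addition; a standard fact), W_p reduces to a comparison of quantile functions, W_p(µ, ν)^p = ∫_0^1 |F_µ^{-1}(u) − F_ν^{-1}(u)|^p du, so W_p between two equal-size samples is computed by sorting:

    import numpy as np

    def wasserstein_p(xs, ys, p=2):
        xs, ys = np.sort(xs), np.sort(ys)           # empirical quantiles
        return np.mean(np.abs(xs - ys) ** p) ** (1 / p)

    rng = np.random.default_rng(0)
    a, b = rng.normal(0, 1, 10000), rng.normal(3, 1, 10000)
    print(wasserstein_p(a, b))   # ≈ 3, since W2(N(0,1), N(3,1)) = 3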


Theorem
In W_p(X) with p ≥ 1, given µ, µ_n ∈ P_p(X), n ∈ N, the following are equivalent:
µ_n → µ w.r.t. W_p;
µ_n ⇀ µ and m_p(µ_n) → m_p(µ);
∫_X φ dµ_n → ∫_X φ dµ, ∀φ ∈ {φ ∈ C(X) : ∃A, B ∈ R s.t. |φ(x)| ≤ A + B d(x, x0)^p, ∀x ∈ X}.


Special cases:
Case 1: (X, d) is compact.
P(X) = P_p(X), ∀p ≥ 1.
µ_n → µ w.r.t. W_p ⟺ µ_n ⇀ µ.
Case 2: X = Ω ⊂ R^d and p ∈ [1, +∞), c(x, y) = |x − y|^p.
The L^p distance between the p.d.f.s of two measures is a “vertical” distance; the W_p distance between the two measures is a “horizontal” distance.
p1 ≤ p2 ⟹ W_{p1} ≤ W_{p2}. If Ω is bounded, W_{p2} ≤ diam(Ω)^{1−p1/p2} W_{p1}^{p1/p2}, so all W_p induce the same topology.
Geodesics on W_p(Ω):

Theorem (McCann's displacement interpolation)
If Ω ⊂ R^d is convex, then W_p(Ω) is a length space, and for µ, ν ∈ W_p(Ω) and γ an optimal transport plan from µ to ν,

µ_γ(t) ≜ (π_t)#γ, where π_t(x, y) ≜ (1 − t)x + ty,

is a constant-speed geodesic.
If p > 1, then all the constant-speed geodesics are of this form.
If additionally µ is absolutely continuous, then there is only one geodesic, whose form is

µ_t = (T_t)#µ, where T_t ≜ (1 − t)id + tT,

where T is the optimal transport map from µ to ν.
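A quick sketch of displacement interpolation in 1-D (my addition): between absolutely continuous measures on R the optimal map is T = F_ν^{-1} ∘ F_µ, which for two Gaussians is affine, so T_t = (1 − t)id + tT can be applied to samples directly:

    import numpy as np

    m0, s0, m1, s1 = 0.0, 1.0, 4.0, 0.5
    T = lambda x: m1 + (s1 / s0) * (x - m0)   # optimal map N(m0,s0²) -> N(m1,s1²)

    xs = np.random.default_rng(1).normal(m0, s0, 100000)
    for t in [0.0, 0.5, 1.0]:
        xt = (1 - t) * xs + t * T(xs)         # samples from mu_t = (T_t)# mu
        print(t, xt.mean(), xt.std())         # mean (1-t)m0 + t*m1, std (1-t)s0 + t*s1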
Geodesic convexity in W2(Ω) (displacement convexity):
The definition is given by the general gradient flow theory above.
Important examples:

Definition (Important functionals on W2(Ω))
For f : R → R convex, V : Ω → R, and W : R^d → R symmetric (W(x) = W(−x)), define

F(ρ) = ∫ f(ρ(x)) dx,  V(ρ) = ∫ V(x) dρ,  W(ρ) = ½∬ W(x − y) dρ(x) dρ(y).

Theorem
λ-convexity on Ω of V (or W) ⟹ λ-geodesic convexity on W2(Ω) of V (or W).
If f(0) = 0, s ↦ s^d f(s^{−d}) is convex and decreasing, and Ω is convex, then F is geodesically convex in W2(Ω).
Gradient Flows on W2(Ω), Ω ⊂ R^n


Curves/flows on W_p(Ω), Ω ⊂ R^n

Continuity equation:
What is special about W_p(Ω) is that its points are probability distributions. A curve/flow/dynamics µ_t in W_p(Ω) represents the evolution of a distribution. This evolution can be associated with (viewed as a result of) an evolution/dynamics in R^n, represented by a vector field v_t. The typical relation between them is the continuity equation:

∂_t µ_t + ∇·(v_t µ_t) = 0.


Theorem
Let p > 1 and Ω ⊂ R^d be open, bounded and connected.
Let {µ_t}_{t∈[0,1]} be an AC curve in W_p(Ω). Then for a.e. t ∈ [0, 1] there exists a vector field v_t ∈ L^p(µ_t; R^d) s.t. 1) ∂_t µ_t + ∇·(v_t µ_t) = 0 is satisfied in the sense of distributions; 2) for a.e. t ∈ [0, 1], ‖v_t‖_{L^p(µ_t)} ≤ |µ′|(t).
Conversely, if {µ_t}_{t∈[0,1]} ⊂ P_p(Ω) and ∀t we have a vector field v_t ∈ L^p(µ_t; R^d) with ∫_0^1 ‖v_t‖_{L^p(µ_t)} dt < +∞ solving ∂_t µ_t + ∇·(v_t µ_t) = 0, then {µ_t}_{t∈[0,1]} is AC in W_p(Ω) and for a.e. t ∈ [0, 1], |µ′|(t) ≤ ‖v_t‖_{L^p(µ_t)}.
Thus in both cases the conclusion can be strengthened to |µ′|(t) = ‖v_t‖_{L^p(µ_t)}.
(I guess v_t^i : Ω → R, 1 ≤ i ≤ d, satisfies that |v_t^i|^p is µ_t-integrable, and ‖v_t‖_{L^p(µ_t)} = (Σ_{i=1}^d ∫_Ω |v_t^i(x)|^p dµ_t(x))^{1/p}.)
We only consider absolutely continuous measures, denoted by ρ, so that the distribution density can be accessed.
Let F : W2(Ω) → R̄ be a functional on W2(Ω). Use MMS-GF to define the gradient flow w.r.t. F:

ρ_{k+1}^τ ∈ arg min_ρ F(ρ) + W2²(ρ, ρ_k^τ)/(2τ).

General existence conditions apply, e.g. {ρ : F(ρ) ≤ c} compact and F Lipschitz, or F λ-geodesically convex.
Special result:

Theorem
Let F : W2(Ω) → R̄ be λ-geodesically convex; then the MMS-GF w.r.t. F exists. Let ρ_t^0, ρ_t^1 be two solutions, and define E(t) ≜ ½W2²(ρ_t^0, ρ_t^1). Then E(t) ≤ e^{−λt}E(0), which implies uniqueness for a given initial condition, and stability and exponential convergence for λ > 0.
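A minimal sketch of one such MMS/JKO step in 1-D (my addition): for the potential energy F(ρ) = ∫V dρ, represent ρ by n quantile values q_i, so that W2²(ρ, ρ′) is the mean of (q_i − q_i′)² under the monotone coupling; the step then decouples into n scalar proximal problems q_i ← argmin_s V(s) + (s − q_i)²/(2τ), which for V(s) = s²/2 is just q_i/(1 + τ):

    import numpy as np

    n, tau = 1000, 0.1
    rng = np.random.default_rng(0)
    q = np.sort(rng.normal(3.0, 0.2, n))   # quantile values of the initial rho

    for k in range(50):
        q = q / (1 + tau)                  # pointwise prox of V(s) = s^2/2
    print(q.mean(), q.std())               # both shrink: rho flows toward delta_0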
To relate F and the vector field v_t, we need the notion of first variation.

Definition (First Variation)
The first variation of a functional G : P(Ω) → R is δG/δρ(ρ) : Ω → R s.t.

(d/dε) G(ρ + εχ)|_{ε=0} = ∫ (δG/δρ)(ρ)(x) dχ(x), ∀χ ∈ {χ : ∃ε0 s.t. ∀ε ∈ [0, ε0], ρ + εχ ∈ P(Ω)}.

(Recall that on R^d, ∇F ∈ R^d is s.t. (d/dε)F(x + εv)|_{ε=0} = ⟨∇F, v⟩, ∀v ∈ R^d.)


For the functionals F, V, W defined above:

Theorem
δF/δρ = f′(ρ), δV/δρ = V, δW/δρ = W ∗ ρ (convolution).


Theorem
The first variation of the Wasserstein distance with cost function c: δW_c(ρ, ν)/δρ = ϕ, if ρ, ν are defined on Ω ⊂ R^d, c : Ω × Ω → R is continuous, and the Kantorovich potential ϕ (from ρ to ν) is unique and c-concave.


Relate F and the vector field v_t.

Theorem
For the Minimizing Movement Scheme ρ_{k+1}^τ ∈ arg min_ρ F(ρ) + W2²(ρ, ρ_k^τ)/(2τ), the optimality condition is:

(δF/δρ)(ρ_{k+1}^τ) + ϕ/τ = const,

where ϕ is the Kantorovich potential from ρ_{k+1}^τ to ρ_k^τ.
Relation between T∗ and ϕ: T∗(x) = x − ∇ϕ(x);
relation between v_t and T: v_t(x) = (x − T(x))/τ;
so in the limit τ → 0, the gradient flow w.r.t. F induces a flow in R^n:

v_t(x) = −∇((δF/δρ)(ρ_t))(x),

and the flow ρ_t in W2(Ω) is:

∂_t ρ_t − ∇·(ρ_t ∇((δF/δρ)(ρ_t))) = 0.
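As a concrete instance (my addition, a hedged numerical sketch): for F(ρ) = ∫ρ log ρ dx + ∫V dρ one gets δF/δρ = log ρ + 1 + V, and the PDE above becomes the Fokker–Planck equation ∂_t ρ = ∇·(ρ∇(log ρ + V)) = ∇²ρ + ∇·(ρ∇V), whose stationary density is proportional to e^{−V}. A naive 1-D finite-difference discretization:

    import numpy as np

    x = np.linspace(-5, 5, 201); dx = x[1] - x[0]; dt = 1e-4
    V = 0.5 * x ** 2
    rho = np.exp(-(x - 2.0) ** 2)                # arbitrary initial density
    rho /= rho.sum() * dx

    for _ in range(50000):
        mu = np.log(rho + 1e-300) + V            # delta F / delta rho (up to +1)
        flux = rho * np.gradient(mu, dx)         # rho * grad(delta F / delta rho)
        rho = np.clip(rho + dt * np.gradient(flux, dx), 1e-300, None)

    gibbs = np.exp(-V); gibbs /= gibbs.sum() * dx
    print(np.max(np.abs(rho - gibbs)))           # near zero: rho has relaxed to e^{-V}/Z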
Numerical methods from the JKO scheme


JKO: Jordan–Kinderlehrer–Otto.
Solve problems of the form min{F(ρ) + ½W2²(ρ, ν) : ρ ∈ P(Ω)} (τ is absorbed into F).
Two recent methods:
1) based on the Benamou–Brenier formula, for convex F(ρ);
2) based on methods from semi-discrete optimal transport, for geodesically convex F (involving techniques from computational geometry; not covered in these slides).


Benamou–Brenier formula

Recall McCann's displacement interpolation theorem from the section on the Wasserstein space: (π_t)#γ is a constant-speed geodesic for an optimal plan γ, and for absolutely continuous µ the unique geodesic is µ_t = ((1 − t)id + tT)#µ.


From this theorem, we can see:
For the cost c(x, y) = |x − y|^p, finding an optimal transport ⟺ finding a constant-speed geodesic in W_p, since they are closely related and (when p > 1 and µ is absolutely continuous) one-to-one.
Find a constant-speed geodesic: min_{µ_t} ∫_0^1 |µ′|(t)^p dt.
In W_p we have |µ′|(t) = ‖v_t‖_{L^p(µ_t)} = (∫_Ω |v_t|^p dµ_t)^{1/p}, where v_t is the velocity field solving the continuity equation.
So we get the Benamou–Brenier formula (Time-dependent Kantorovich Problem):

(TKP1) min_{(ρ_t, v_t) : ρ_0 = µ, ρ_1 = ν, ∂_t ρ_t + ∇·(v_t ρ_t) = 0} ∫_0^1 ∫_Ω |v_t|^p dρ_t dt.

It is a kinetic energy minimization problem.
It selects constant-speed geodesics connecting µ to ν.
It is non-convex in (ρ_t, v_t).
Transform it into a convex problem: let E_t = v_t ρ_t, and use (ρ_t, E_t) as arguments:

(TKP2) min_{(ρ_t, E_t) : ρ_0 = µ, ρ_1 = ν, ∂_t ρ_t + ∇·E_t = 0} ∫_0^1 ∫_Ω (|E_t|^p / ρ_t^{p−1}) dx dt.


Further transformation:
K_q ≜ {(a, b) ∈ R × R^d : a + (1/q)|b|^q ≤ 0}, for q = p/(p − 1) the conjugate exponent of p. It is convex in R × R^d.
For t ∈ R and x ∈ R^d, define

f_p(t, x) ≜ sup_{(a,b)∈K_q} (at + b · x) = (1/p)(|x|^p / t^{p−1}) if t > 0; 0 if t = 0, x = 0; +∞ if t = 0, x ≠ 0, or t < 0.

So the optimization problem can be reformulated as

(TKP3) min_{(ρ_t, E_t) : ρ_0 = µ, ρ_1 = ν, ∂_t ρ_t + ∇·E_t = 0}  sup_{(a,b) ∈ C(Ω×[0,1]; K_q)} ∬ a dρ + ∬ b · dE,

where ∬ indicates integration w.r.t. both space and time.
Utilizing

sup_{φ∈C¹([0,1]×Ω)} −∬ ∂_t φ dρ − ∬ ∇φ · dE + ∫ φ_1 dν − ∫ φ_0 dµ = 0 if ρ_0 = µ, ρ_1 = ν, ∂_t ρ_t + ∇·E_t = 0, and +∞ otherwise,

we get

(TKP4) min_{(ρ_t, E_t)} sup_{(a,b)∈C(Ω×[0,1];K_q), φ∈C¹([0,1]×Ω)} ∬ (a − ∂_t φ) dρ + ∬ (b − ∇φ) · dE + ∫ φ_1 dν − ∫ φ_0 dµ.


To simplify notation, let m = (ρ, E), A = (a, b), m · A = ∬ a dρ + ∬ b · dE, ∇_{t,x}φ = (∂_t φ, ∇φ), G(φ) = ∫ φ_1 dν − ∫ φ_0 dµ, and let I_{K_q}(·) be the indicator function; then

(TKP4′) min_m sup_{A,φ} L(m, (A, φ)) ≜ m · (A − ∇_{t,x}φ) − I_{K_q}(A) + G(φ).

This is a minimax problem.


L(m, (A, φ)) is a Lagrangian of the form L(X, Y) = X · ΛY − H(Y), where Λ is a linear operator. Its optimality conditions

ΛY = 0,
Λ∗X − ∇H(Y) = 0,

are the same as those of the augmented Lagrangian L̃(X, Y) = X · ΛY − H(Y) − (r/2)|ΛY|²:

ΛY = 0,
Λ∗X − ∇H(Y) − rΛ∗ΛY = 0,

for any r > 0, where Λ∗ is the adjoint of Λ w.r.t. the inner product. So finally,

(TKP5) min_m sup_{A,φ} m · (A − ∇_{t,x}φ) − I_{K_q}(A) + G(φ) − (r/2)‖A − ∇_{t,x}φ‖².
Gradient Flows on Wasserstein Spaces Numerical methods from the JKO scheme

Benamou-Brenier formula

r
(TKP5) min sup m · (A − ∇t,x φ) − IKp (A) + G (φ) − kA − ∇t,x φk2 .
m A,φ 2

To solve this,
Optimize φ: minimize a quadratic functional in calculus of variations,
e.g. solving a Poisson equation
Optimize A: a pointwise minimization problem, specifically a
projection on the convex set Kq
Optimize m: gradient descent. m ← m − r (A − ∇t,x φ)

Chang Liu (THU) Gradient Flow April 24, 2017 74 / 91
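Of the three steps, the projection is the only pointwise-nonlinear one; here is a runnable sketch for p = q = 2 (my addition), where K_2 = {(a, b) : a + |b|²/2 ≤ 0} and projecting (a0, b0) from outside reduces, via the stationarity conditions b = b0/(1 + λ), a = a0 − λ = −|b|²/2, to a scalar cubic in the multiplier λ ≥ 0:

    import numpy as np

    def project_K2(a0, b0):
        if a0 + 0.5 * np.dot(b0, b0) <= 0:
            return a0, b0                      # already inside K_2
        # (lam - a0)(1 + lam)^2 = |b0|^2 / 2; take the largest real root
        coeffs = [1.0, 2.0 - a0, 1.0 - 2.0 * a0, -a0 - 0.5 * np.dot(b0, b0)]
        lam = max(r.real for r in np.roots(coeffs) if abs(r.imag) < 1e-9)
        b = b0 / (1.0 + lam)
        return -0.5 * np.dot(b, b), b          # lands on the boundary of K_2

    a, b = project_K2(1.0, np.array([2.0, 0.0]))
    print(a, b, a + 0.5 * b @ b)               # last value ≈ 0 (on the boundary)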


Application


To be continued... :(


My Remarks


Given a functional F(ρ) on W2(Ω) with Ω ⊂ R^n, if we want to minimize it, we can find a gradient flow on W2(Ω) defined by F, which gradually minimizes F, by:
1 the MMS discretization with step size τ: we get {ρ_k^τ}_k, where

ρ_{k+1}^τ ∈ arg min_ρ F(ρ) + W2²(ρ, ρ_k^τ)/(2τ).

In this case we directly get a sequence of distributions, e.g. in terms of p.d.f.s.
2 simulating a dynamics/flow on Ω, which is associated with the gradient flow on W2(Ω) (or which is the cause of the evolution of the distribution described by the gradient flow on W2(Ω)). The dynamics/flow on Ω is governed by

(d/dt)ξ_t(x) = v_t(x),  v_t(x) = −∇((δF/δρ)(ρ_t))(x).

In this case the distribution is embodied as samples from it.
My Remarks on SVGD

From now on, we only consider the second approach to obtain the gradient flow.
Take F(ρ) = KL(ρ‖p), for a fixed distribution p.
Compare the results of gradient flow and the calculus of variations. (Omit the subscript t temporarily.)

By Gradient Flow
F(ρ) = ∫_Ω ρ log(ρ/p) dx, δF/δρ = log ρ − log p + 1, so:

v(x) = ∇log p(x) − ∇log ρ(x).
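A particle simulation of this flow (my addition, a rough sketch): estimate ∇log ρ with a Gaussian kernel density estimate over the particles, with p = N(0, 1) as an assumed toy target:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(5.0, 0.5, 300)              # particles, badly initialized
    h, dt = 0.3, 0.05
    grad_log_p = lambda x: -x                  # target p = N(0, 1)

    for _ in range(200):
        d = x[:, None] - x[None, :]            # d[i, j] = x_i - x_j
        K = np.exp(-d ** 2 / (2 * h ** 2))     # Gaussian KDE kernel matrix
        grad_log_rho = (K * (-d / h ** 2)).sum(1) / K.sum(1)   # KDE score
        x = x + dt * (grad_log_p(x) - grad_log_rho)            # v = ∇log p − ∇log ρ
    print(x.mean(), x.std())                   # roughly 0 and 1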


By Calculus of Variations
Find the “directional derivative” G(v, ρ) of F(ρ) w.r.t. the dynamics (d/dt)ξ_t(x) = v_t(x):

G(v, ρ) = (d/dε) F(ρ_{[ξ^{(ε)}]})|_{ε=0},  ξ^{(ε)}(x) = x + εv(x),
ρ_{[ξ^{(ε)}]}(x) = ρ((ξ^{(ε)})^{−1}(x)) |Jac (ξ^{(ε)})^{−1}| ≈ ρ(x − εv(x)) |Jac(x − εv(x))|.

For F(ρ) = KL(ρ‖p), by my written notes on SVGD or the electronic notes on R-SVGD, G(v, ρ) = ∫_Ω ρ[∇log p · v + ∇ · v] dx.
Find v(x) that maximizes G(v, ρ): max_v G(v, ρ) s.t. ‖v‖ = 1. If we take the norm ‖v‖ = ½Σ_{i=1}^n ∫_Ω v_i²(x)ρ(x) dx and introduce a Lagrange multiplier λ:

min_λ max_v G(v, ρ) + (λ/2)Σ_{i=1}^n ∫_Ω v_i²(x)ρ(x) dx − λ.

For F(ρ) = KL(ρ‖p), take the first variation w.r.t. v_i, i.e. set ∂L/∂v_i − Σ_{j=1}^n ∂_j (∂L/∂(∂_j v_i)) = 0:

ρ∂_i log p − ∂_i ρ + λρv_i = 0,  v_i ∝ ∂_i log p − ∂_i log ρ,

the same as the result by gradient flow.

However, SVGD adopts neither: it takes v in a vector-valued RKHS and turns the objective into an inner product there.
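For comparison, a minimal SVGD sketch (my addition, following the Liu & Wang 2016 update rule): the RKHS-optimal direction is φ(x) = E_{x′∼ρ}[k(x′, x)∇log p(x′) + ∇_{x′}k(x′, x)], with ρ the empirical particle measure; same toy target and RBF kernel as above:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(5.0, 0.5, 300)
    h, eps = 0.3, 0.1
    grad_log_p = lambda x: -x                     # target p = N(0, 1)

    for _ in range(500):
        d = x[:, None] - x[None, :]               # d[i, j] = x_i - x_j
        K = np.exp(-d ** 2 / (2 * h ** 2))        # k(x_i, x_j)
        # grad wrt the first argument: ∇_{x_i} k(x_i, x_j) = -(d[i,j]/h^2) K[i,j]
        phi = (K * grad_log_p(x)[:, None] + (-d / h ** 2) * K).mean(0)
        x = x + eps * phi
    print(x.mean(), x.std())                      # roughly 0 and 1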


My Remarks on General Results

The general equivalence of Gradient Flow and the Calculus of Variations?

G(v, ρ) = (d/dε) F(ρ(x − εv(x)) |Jac(x − εv(x))|)|_{ε=0}
 = lim_{ε→0} ∫_Ω (δF/δρ)(ρ(x − εv(x)) |Jac(x − εv(x))|) · [−v · ∇ρ(x − εv) |Jac(x − εv)| + ρ(x − εv) Tr(Jac(x + εv) Jac(v))] dx
 = ∫_Ω (δF/δρ)(ρ(x)) (−v · ∇ρ(x) + ρ(x)∇ · v) dx.

But this result cannot even recover the case of F(ρ) = KL(ρ‖p)! Nor can it deduce the result of Gradient Flow v = −∇(δF/δρ) by min_λ max_v G(v, ρ) + λ‖v‖ − λ using the calculus of variations. Why? I would prefer that there is something wrong in the above deduction of G(v, ρ).
Appendix


Compactness

A topological space X is compact if each of its open covers has a finite subcover.
If X is additionally a metric space, then “X is compact” is equivalent to each of:
X is sequentially compact: every sequence in X has a convergent subsequence (with limit in X, of course).
X is complete and totally bounded (∀ε > 0, X is covered by finitely many open balls of radius ε).
X is limit point compact: every infinite subset of X has at least one limit point in X.


Weak convergence of measures

Let X be a metric space. µ_n ⇀ µ: for every bounded continuous function f : X → R, ∫f dµ_n → ∫f dµ.


Lower semicontinuity

On a topological space X, f : X → R ∪ {−∞, +∞} is lower semicontinuous at x0 ∈ X if ∀ε > 0, ∃U a neighbourhood of x0 s.t. ∀x ∈ U, f(x) ≥ f(x0) − ε when f(x0) < +∞, and lim_{x→x0} f(x) = +∞ when f(x0) = +∞.
In a metric space, this is equivalent to lim inf_{x→x0} f(x) ≥ f(x0).


Original notion of absolute continuity

Let I = [a, b] be a compact interval of R (when I is not compact, AC can also be defined, in a more general way). A function f : I → R is absolutely continuous on I if there exists a Lebesgue-integrable function g on I s.t. f(x) = f(a) + ∫_a^x g(t) dt, ∀x ∈ I.


Hölder space

Hölder condition on R^d: |f(x) − f(y)| ≤ C‖x − y‖^α, with exponent α.
Hölder space C^{k,α}(Ω): functions on Ω with continuous derivatives up to order k whose k-th partial derivatives are Hölder continuous with exponent 0 < α ≤ 1.
The larger α > 0, the stronger the condition; so it is weaker than Lipschitz (α = 1).
Compact inclusion C^{0,β}(Ω) → C^{0,α}(Ω), for 0 < α < β ≤ 1.


Equicontinuity

Let X and Y be two metric spaces, and F a family of functions from X to Y. The family F is equicontinuous at a point x0 ∈ X if ∀ε > 0, ∃δ > 0 s.t. d(f(x0), f(x)) < ε, ∀f ∈ F, ∀x : d(x0, x) < δ.

Concept | δ depends on
Continuity | ε, x0, f
Uniform continuity | ε, f
Pointwise equicontinuity | ε, x0
Uniform equicontinuity | ε


Ascoli–Arzelà's theorem (AA theorem)

X: a compact Hausdorff space. C(X): the space of continuous functions on X.
Typical statement: for a sequence of real-valued continuous functions {f_n}_n on a closed and bounded interval [a, b]: 1) {f_n}_n uniformly bounded and equicontinuous ⇒ ∃ a uniformly converging subsequence {f_{n_k}}_k; 2) every subsequence {f_{n_k}}_k has a uniformly convergent (sub)subsequence ⇒ {f_n}_n is uniformly bounded and equicontinuous.
General statement: a subset of C(X) is compact ⇔ it is closed, pointwise bounded and (uniformly) equicontinuous.
Very general statement: a subset F of C(X) is relatively compact in the topology induced by the uniform norm ⇔ it is equicontinuous and pointwise bounded.
Corollary: a sequence in C(X) is uniformly convergent ⇔ it is (uniformly) equicontinuous and converges pointwise to a function (not necessarily continuous a priori).
Thanks!

