Gradient Flow
Chang Liu
Tsinghua University
Introduction
Theorem
∃! solution if ∇F is Lipschitz.

For λ-convex F, consider the problem
  x'(t) ∈ −∂F(x(t)),  x(0) = x0,
where x is an absolutely continuous curve, and
  ∂F(x) = {p ∈ R^n : F(y) ≥ F(x) + p · (y − x) + (λ/2)|y − x|^2, ∀y ∈ R^n}.
Theorem
Any two solutions x1, x2 of the above problem with different initial conditions satisfy |x1(t) − x2(t)| ≤ e^{−λt} |x1(0) − x2(0)|.
Corollary
For a given initial condition, the above problem has a unique solution.
If λ > 0 (strongly convex), F has a unique minimizer x*. Since x(t) ≡ x* is a solution, any solution x(t) satisfies |x(t) − x*| ≤ e^{−λt} |x(0) − x*|.
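As a quick numerical illustration (my own sketch, not from the slides): for the strongly convex quadratic F(x) = (1/2) x·Ax, the flow x' = −∇F(x) can be integrated with scipy and the contraction bound checked directly, with λ the smallest eigenvalue of A.

import numpy as np
from scipy.integrate import solve_ivp

# F(x) = 0.5 * x.(A x) is lambda-convex with lambda = smallest eigenvalue of A
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
lam = np.linalg.eigvalsh(A).min()

def ode(t, x):
    return -A @ x                      # x'(t) = -grad F(x(t))

x0 = np.array([1.0, -2.0])
y0 = np.array([-1.5, 0.5])
T = 3.0
sol_x = solve_ivp(ode, (0.0, T), x0, dense_output=True, rtol=1e-8)
sol_y = solve_ivp(ode, (0.0, T), y0, dense_output=True, rtol=1e-8)

for t in np.linspace(0.0, T, 7):
    gap = np.linalg.norm(sol_x.sol(t) - sol_y.sol(t))
    bound = np.exp(-lam * t) * np.linalg.norm(x0 - y0)
    print(f"t={t:.1f}  |x(t)-y(t)|={gap:.4f}  <=  {bound:.4f}")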
Approximating Curves
Definition (MMS)
Minimizing Movement Scheme (MMS): for a fixed small time step τ, define a sequence {x^τ_k}_k by
  x^τ_{k+1} ∈ argmin_x  F(x) + |x − x^τ_k|^2 / (2τ).
Importance:
  Practical numerical method for approximating the curve.
  Easier to generalize to metric spaces than x' = −∇F(x) itself.
Properties:
  Existence of a solution for mild F (e.g. Lipschitz and lower bounded by C1 − C2|x|^2).
  (x^τ_{k+1} − x^τ_k)/τ ∈ −∂F(x^τ_{k+1}): implicit Euler scheme (more stable, but harder to carry out than the explicit one: gradient descent).
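A minimal numerical sketch of one MMS step (my own illustration, assuming a smooth toy F and using scipy's generic minimizer for the inner proximal problem):

import numpy as np
from scipy.optimize import minimize

def F(x):
    # smooth toy objective with unique minimizer at 0
    return 0.25 * np.sum(x**4) + 0.5 * np.sum(x**2)

def mms_step(x_k, tau):
    # implicit Euler / proximal step: argmin_x F(x) + |x - x_k|^2 / (2 tau)
    obj = lambda x: F(x) + np.sum((x - x_k)**2) / (2.0 * tau)
    return minimize(obj, x_k).x

tau = 0.1
x = np.array([2.0, -1.5])
for k in range(30):
    x = mms_step(x, tau)
print("after 30 steps:", x)   # approaches the minimizer 0 of F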
Convergence:
Define v^τ_{k+1} := (x^τ_{k+1} − x^τ_k)/τ, and v^τ(t) = v^τ_{k+1} for t ∈ (kτ, (k + 1)τ].
(Here x^τ denotes the piecewise-constant interpolation x^τ(t) = x^τ_k for t ∈ (kτ, (k + 1)τ], and x̃^τ the piecewise-affine interpolation of {x^τ_k}, as in the metric-space case below.)
Theorem
If F(x0) < +∞ and inf F > −∞, then, up to a subsequence τ_j → 0, both x̃^{τ_j} and x^{τ_j} converge uniformly to the same curve x ∈ H^1(R^n), and v^{τ_j} converges weakly in L^2(R; R^n) to a vector function v s.t. x' = v and
1) v(t) ∈ −∂F(x(t)) a.e., if F is λ-convex;
2) v(t) = −∇F(x(t)), ∀t, if F is C^1.
Details:
1 L^p space
For a measure space (S, Σ, µ), first define L(S; R^n) := {f : S → R^n | ∫_S |f|^p dµ < ∞}. It is a linear space.
Define L^p(S; R^n) := L(S; R^n) / {f | f = 0 µ-a.e.} to be a quotient space (i.e. treat all functions that are equal µ-a.e. as one and the same element of L^p).
Define ||f||_p := (∫_S |f|^p dµ)^{1/p} (the essential supremum for p = ∞); then for 1 ≤ p ≤ ∞ it is a Banach space.
Only L^2(S; R^n) can be a Hilbert space, with inner product ⟨f, g⟩_{L^2(S;R^n)} := ∫_S f · g dµ.
L^p(S) := L^p(S; R).
2 Weak convergence in a Hilbert space H
For x_n ∈ H, n ≥ 1, and x ∈ H, x_n ⇀ x is defined as: ∀f ∈ H', f(x_n) → f(x), which is equivalent to: ∀y ∈ H, ⟨x_n, y⟩_H → ⟨x, y⟩_H.
  x_n → x ⟹ x_n ⇀ x.
  x_n ⇀ x and ||x_n|| → ||x|| ⟹ x_n → x.
  If dim(H) < ∞, x_n ⇀ x ⟺ x_n → x.
  (E.g. an orthonormal sequence e_n in ℓ^2 satisfies e_n ⇀ 0 but ||e_n|| = 1, so it does not converge strongly.)
3 H^k(Ω) space (Ω ⊂ R^n)
Weak derivative. For u ∈ C^k(Ω) and φ ∈ C_c^∞(Ω) (·_c for compact support),
  ∫_Ω u D^α φ dx = (−1)^{|α|} ∫_Ω φ D^α u dx   (integration by parts),
where D^α = ∂_{x_1}^{α_1} ··· ∂_{x_n}^{α_n}, and |α| = Σ_{i=1}^n α_i is fixed as k. So define the weak α-th partial derivative of u as v:
  ∫_Ω u D^α φ dx = (−1)^{|α|} ∫_Ω φ v dx,  ∀φ ∈ C_c^∞(Ω).
If it exists, it is uniquely defined a.e.
Sobolev space W^{k,p}(Ω) for k ∈ N and p ∈ [1, ∞]:
  W^{k,p}(Ω) = {u ∈ L^p(Ω) : D^α u ∈ L^p(Ω), ∀|α| ≤ k},
with norm
  ||u||_{W^{k,p}(Ω)} = (Σ_{|α|≤k} ||D^α u||^p_{L^p(Ω)})^{1/p},  1 ≤ p < +∞,
  ||u||_{W^{k,∞}(Ω)} = max_{|α|≤k} ||D^α u||_{L^∞(Ω)},           p = +∞.
W^{k,p}(Ω) is a Banach space. H^k(Ω) := W^{k,2}(Ω); these are Hilbert spaces.
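A small numerical sanity check of the weak-derivative identity above (my own sketch): on Ω = (−1, 1), the weak derivative of u(x) = |x| is v(x) = sign(x), so ∫ u φ' dx = −∫ v φ dx for any test function φ ∈ C_c^∞((−1, 1)).

import numpy as np

# grid on (-1, 1); endpoints excluded so the bump function below is well defined
x = np.linspace(-1.0, 1.0, 200001)[1:-1]
dx = x[1] - x[0]

u = np.abs(x)                                  # u(x) = |x|
v = np.sign(x)                                 # candidate weak derivative of u

bump = np.exp(-1.0 / (1.0 - x**2))             # smooth bump supported in (-1, 1)
dbump = bump * (-2.0 * x) / (1.0 - x**2)**2
phi = bump * (1.0 + 0.5 * np.sin(3.0 * x))     # a non-symmetric test function
dphi = dbump * (1.0 + 0.5 * np.sin(3.0 * x)) + bump * 1.5 * np.cos(3.0 * x)

lhs = np.sum(u * dphi) * dx                    # integral of u * phi'
rhs = -np.sum(v * phi) * dx                    # minus integral of v * phi
print(lhs, rhs)                                # agree up to quadrature error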
4 Up to a subsequence
There exists a sequence τ_j → 0 s.t. x̃^{τ_j} and x^{τ_j} converge uniformly and v^{τ_j} converges weakly.
Proof sketch:
  |x^τ_{k+1} − x^τ_k|^2 / (2τ) ≤ F(x^τ_k) − F(x^τ_{k+1})
⟹ Σ_{k=0}^{ℓ} |x^τ_{k+1} − x^τ_k|^2 / (2τ) ≤ F(x^τ_0) − F(x^τ_{ℓ+1}) ≤ C, since F(x0) < +∞ and inf F > −∞
⟹ ∫_0^T (1/2)|(x̃^τ)'(t)|^2 dt ≤ C
⟹ x̃^τ is bounded in H^1 and v^τ in L^2, and the injection H^1 ⊂ C^{0,1/2} gives an equicontinuity bound on x̃^τ of the form |x̃^τ(t) − x̃^τ(s)| ≤ C|t − s|^{1/2}
⟹ by the Ascoli–Arzelà (AA) theorem, x̃^τ (and hence x^τ) has a uniformly converging subsequence.
Characterizing Properties
Motivation
x' = −∇F(x) (or x' ∈ −∂F(x)) is hard to generalize to a metric space!
  There is nothing but the distance in a metric space, so ∇F(x) or ∂F(x) cannot be defined (unlike on a manifold).
  Instead, use two properties of the gradient flow that characterize it and can be generalized to metric spaces.
  (d/dt)(1/2)|x(t) − y|^2 ≤ F(y) − F(x(t)) − (λ/2)|x(t) − y|^2,  ∀y ∈ X
is equivalent to x'(t) ∈ −∂F(x(t)).
  Sometimes also denoted EVI_λ.
  It is important for establishing the uniqueness and stability of the gradient flow.
AC ⇒ Lipschitz (up to reparametrization in time).
AC ⇒ the metric derivative |ω'|(t) exists a.e.
If ω ∈ AC(X), Length(ω) = ∫_0^1 |ω'|(t) dt.
λ-geodesic convexity of F along a curve x(·):
  F(x(t)) ≤ (1 − t)F(x(0)) + tF(x(1)) − λ t(1 − t)/2 · d^2(x(0), x(1)).
  |∇F|(x) = limsup_{y→x} |F(x) − F(y)| / d(x, y).
  |∇^− F|(x) = limsup_{y→x} [F(x) − F(y)]_+ / d(x, y).
  (d/dt)(1/2) d(x(t), y)^2 ≤ F(y) − F(x(t)) − (λ/2) d(x(t), y)^2,  ∀y ∈ X.
EVI-GF ⇒ EDE-GF.
Uniqueness and contractivity: for two EVI-GFs x(t) and y(s),
  (d/dt)(1/2) d(x(t), y(s))^2 ≤ F(y(s)) − F(x(t)) − (λ/2) d(x(t), y(s))^2,
  (d/ds)(1/2) d(x(t), y(s))^2 ≤ −F(y(s)) + F(x(t)) − (λ/2) d(x(t), y(s))^2.
Define E(t) = (1/2) d(x(t), y(t))^2; then (d/dt)E(t) ≤ −2λE(t)
⟹ d(x(t), y(t)) ≤ e^{−λt} d(x(0), y(0)), which gives uniqueness for a given initial condition and exponential convergence for λ > 0.
  (d/dt)(1/2) d(x(t), y)^2 ≤ F(y) − F(x(t)) − (λ/2) d(x(t), y)^2,  ∀y ∈ X.
A strong condition; existence is hard to guarantee.
A sufficient condition for the existence: Compatible Convexity along Generalized Geodesics (C2G2):
∀x0, x1 ∈ X, ∀y ∈ X, ∃x : [0, 1] → X s.t. x(0) = x0, x(1) = x1 and
  F(x(t)) ≤ (1 − t)F(x0) + tF(x1) − λ t(1 − t)/2 · d^2(x0, x1),
  d^2(x(t), y) ≤ (1 − t)d^2(x0, y) + t d^2(x1, y) − t(1 − t) d^2(x0, x1),
i.e. λ-convexity of F and 2-convexity of x ↦ d^2(x, y) along the same curve (not necessarily a geodesic).
  x^τ_{k+1} ∈ argmin_x  F(x) + d(x, x^τ_k)^2 / (2τ).
Define two kinds of interpolation in a similar way:
1) Define x^τ(t) = x^τ_k, t ∈ (kτ, (k + 1)τ];
2) Define x̃^τ(t), t ∈ (kτ, (k + 1)τ], to be the constant-speed geodesic between x^τ_k and x^τ_{k+1}.
Definition
Constant-speed geodesic: in a length space, a curve ω : [t0, t1] → X s.t.
  d(ω(t), ω(s)) = (|t − s| / (t1 − t0)) · d(ω(t0), ω(t1)),  ∀t, s ∈ [t0, t1].
Constant-speed geodesics are geodesics:
  Length(ω) = ∫_{t0}^{t1} d(ω(t0), ω(t1)) / (t1 − t0) dt = d(ω(t0), ω(t1)).
The following are equivalent:
1. ω : [t0, t1] → X is a constant-speed geodesic joining x0 and x1;
2. ω ∈ AC(X) and |ω'|(t) = d(ω(t0), ω(t1)) / (t1 − t0) a.e.;
3. ω ∈ argmin{∫_{t0}^{t1} |ω'|(t)^p dt : ω(t0) = x0, ω(t1) = x1}, ∀p > 1.
  |v^τ|(t) = d(x^τ_{k+1}, x^τ_k) / τ,  t ∈ (kτ, (k + 1)τ].
Definition (MMS-GF)
Let (X, d) be a metric space (not necessarily a length space). A curve x : [0, T] → X is called a Generalized Minimizing Movement (GMM) (I would call it MMS-GF) if there exists a sequence τ_j → 0 s.t. x^{τ_j} converges uniformly to x in (X, d).
Existence analysis:
Condition for the existence of x^τ_k:
  The sub-level set {x : F(x) ≤ c} is compact in X, and F is Lipschitz.
  (The corresponding topology is either the one induced by d, or a weaker topology s.t. d is lower semi-continuous w.r.t. it.)
Condition for the existence of limit curves (i.e. MMS-GF):
  Existence of x^τ_k is enough!
  Due to d(x^τ_{k+1}, x^τ_k)^2 / (2τ) ≤ F(x^τ_k) − F(x^τ_{k+1}), we have d(x^τ(t), x^τ(s)) ≤ C(|t − s|^{1/2} + √τ), i.e. {x^τ}_τ is equi-Hölder continuous with exponent 1/2 (up to a negligible error of order √τ). So by the AA theorem, the set {x^τ}_τ has uniformly converging subsequences, i.e. MMS-GFs exist. But they are not unique, and no relation with F (EDE or EVI) is obtained.
Settings
Let X, Y be two measurable spaces, let µ ∈ P(X) and ν ∈ P(Y) be fixed measures, and let c : X × Y → R be a cost function.
Definition (push-forward of a measure)
For a measurable map T : X → Y and a measure µ ∈ P(X), define the push-forward of µ under T, T_#µ, to be the measure on Y s.t. T_#µ(A) = µ(T^{−1}(A)) for every measurable A ⊂ Y.
Example
For X = Y = R^n and T invertible (and smooth), in terms of p.d.f.s, T_#µ = (µ ∘ T^{−1}) |det(∇T^{−1})|, i.e. the change-of-variables rule.
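A quick sampling check of the push-forward and the change-of-variables rule (my own sketch; µ = N(0, 1) and T(x) = 2x + 1 are arbitrary choices): mapping samples of µ through T gives samples of T_#µ, whose p.d.f. is that of N(1, 4).

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

T = lambda x: 2.0 * x + 1.0                    # invertible smooth map on R
samples = T(rng.standard_normal(200_000))      # samples of T_# mu for mu = N(0, 1)

# change-of-variables density of T_# mu is the p.d.f. of N(1, 2^2)
xs = np.linspace(-4.0, 6.0, 6)
half = 0.1
empirical = np.array([np.mean(np.abs(samples - pt) < half) / (2 * half) for pt in xs])
analytic = norm.pdf(xs, loc=1.0, scale=2.0)
print(np.round(empirical, 3))
print(np.round(analytic, 3))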
Reformulation:
Definition
The c-transform (c-conjugate) of χ : X → R̄ is χ^c : Y → R̄, defined as χ^c(y) := inf_{x∈X} c(x, y) − χ(x).
Ψ_c(X) := {χ^c | χ : X → R̄}. ψ : Y → R̄ is c-concave if ψ ∈ Ψ_c(X).
  (DKP')  sup_{φ ∈ Ψ_c(X)}  ∫_X φ dµ + ∫_Y φ^c dν.
Remark
Strong duality holds: KP(µ, ν) = DKP(µ, ν).
Theorem
For the quadratic cost and Ω ⊂ R^n closed, bounded and connected, ∃! optimal transport plan γ* for (KP).
Additionally, if µ is absolutely continuous, the optimal transport map T* exists and γ* = (id, T*)_#µ. Moreover, there exists a Kantorovich potential φ s.t. ∇φ is unique µ-a.e., and T* = ∇u a.e., where u(x) := |x|^2/2 − φ(x) is convex.
Corollary
Under the same conditions, any gradient of a convex function is an optimal transport map between µ and its image measure.
An optimal transport map uniquely exists for c(x, y) = h(x − y) with h strictly convex (e.g. |x − y|^p, p > 1).
Definition
On a metric space (X, d), for p ≥ 1 and a fixed point x0 ∈ X, define m_p(µ) := ∫_X d(x, x0)^p dµ(x), and P_p(X) := {µ ∈ P(X) : m_p(µ) < +∞}, which is independent of the choice of x0.
Theorem
  W_p(µ, ν) := inf_{γ ∈ Π(µ,ν)} (∫_{X×X} d(x, y)^p dγ(x, y))^{1/p}
is a distance on P_p(X).
Theorem
In W_p(X) with p ≥ 1, given µ, µ_n ∈ P_p(X), n ∈ N, the following are equivalent:
  µ_n → µ w.r.t. W_p;
  µ_n ⇀ µ and m_p(µ_n) → m_p(µ);
  ∫_X φ dµ_n → ∫_X φ dµ, ∀φ ∈ {φ ∈ C(X) : ∃A, B ∈ R s.t. |φ(x)| ≤ A + B d(x, x0)^p, ∀x ∈ X}.
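A concrete illustration (my own sketch, not from the slides): on R, for two empirical measures with the same number of equally weighted atoms, the monotone (sorted) coupling is optimal, so W_p^p = (1/n) Σ_i |x_(i) − y_(i)|^p; for p = 1 this agrees with scipy.stats.wasserstein_distance.

import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=2000)
y = rng.normal(2.0, 1.5, size=2000)

def wp_empirical(x, y, p):
    # on R, the sorted coupling is optimal for equal-size, equal-weight samples
    return np.mean(np.abs(np.sort(x) - np.sort(y)) ** p) ** (1.0 / p)

print("W1 via sorting:", wp_empirical(x, y, 1.0))
print("W1 via scipy:  ", wasserstein_distance(x, y))
print("W2 via sorting:", wp_empirical(x, y, 2.0))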
Theorem
λ-convexity on Ω of V (or W) ⟹ λ-geodesic convexity on W_2(Ω) of V (or W).
If f(0) = 0 and s ↦ s^d f(s^{−d}) is convex and decreasing, Ω is convex, and 1 < p < ∞, then F is geodesically convex in W_2(Ω).
Curves/flows on W_p(Ω), Ω ⊂ R^n
Continuity equation:
What is special about W_p(Ω) is that it is a space of probability distributions. A curve/flow/dynamics µ_t in W_p(Ω) represents the evolution of a distribution. This evolution can be associated with (viewed as the result of) an evolution/dynamics in R^n, represented by a vector field v_t. The typical relation between them is the continuity equation:
  ∂_t µ_t + ∇ · (v_t µ_t) = 0.
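A particle-level sketch of the continuity equation (my own illustration; the linear field v_t(x) = Ax and µ_0 = N(0, I) are arbitrary choices): pushing samples along dx/dt = Ax realizes µ_t = (e^{tA})_# µ_0, which solves ∂_t µ_t + ∇·(v_t µ_t) = 0, so the empirical covariance should match e^{tA} e^{tAᵀ}.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
A = np.array([[0.3, -0.6],
              [0.6,  0.1]])
X = rng.standard_normal((50_000, 2))     # samples of mu_0 = N(0, I)

t_final, n_steps = 1.0, 2000
dt = t_final / n_steps
for _ in range(n_steps):                 # explicit Euler for dx/dt = A x
    X = X + dt * (X @ A.T)

emp_cov = np.cov(X.T)
ana_cov = expm(t_final * A) @ expm(t_final * A).T
print(np.round(emp_cov, 3))              # empirical covariance of the pushed particles
print(np.round(ana_cov, 3))              # covariance of (e^{tA})_# N(0, I)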
Theorem
Let p > 1 and Ω ⊂ R^d open, bounded and connected.
  If {µ_t}_{t∈[0,1]} is an AC curve in W_p(Ω), then for a.e. t ∈ [0, 1] there exists a vector field v_t ∈ L^p(µ_t; R^d) s.t. 1) ∂_t µ_t + ∇ · (v_t µ_t) = 0 is satisfied in the sense of distributions; 2) for a.e. t ∈ [0, 1], ||v_t||_{L^p(µ_t)} ≤ |µ'|(t).
  Conversely, if {µ_t}_{t∈[0,1]} ⊂ P_p(Ω) and for every t we have a vector field v_t ∈ L^p(µ_t; R^d) with ∫_0^1 ||v_t||_{L^p(µ_t)} dt < +∞ solving ∂_t µ_t + ∇ · (v_t µ_t) = 0, then {µ_t}_{t∈[0,1]} is AC in W_p(Ω) and for a.e. t ∈ [0, 1], |µ'|(t) ≤ ||v_t||_{L^p(µ_t)}.
  Thus in both cases the conclusion can be strengthened to |µ'|(t) = ||v_t||_{L^p(µ_t)}.
(I guess v_t^i : Ω → R, 1 ≤ i ≤ d, satisfies that |v_t^i|^p is µ_t-integrable, and ||v_t||_{L^p(µ_t)} = (Σ_{i=1}^d ∫_Ω |v_t^i(x)|^p dµ_t(x))^{1/p}.)
Theorem
δF/δρ = f'(ρ), δV/δρ = V, δW/δρ = W ∗ ρ (convolution).

JKO: Jordan–Kinderlehrer–Otto.
Solve problems of the form min{F(ρ) + (1/2)W_2^2(ρ, ν) : ρ ∈ P(Ω)} (τ is included in F).
Two recent methods:
1) based on the Benamou–Brenier formula, for convex F(ρ);
2) based on methods from semi-discrete optimal transport, for geodesically convex F (involving techniques from computational geometry; not covered in these slides).
Benamou-Brenier formula
From this theorem, we can see:
  For the cost c(x, y) = |x − y|^p, finding an optimal transport ⟺ finding a constant-speed geodesic in W_p, since they are closely related and (when p > 1 and µ is absolutely continuous) they are in one-to-one correspondence.
  Finding a constant-speed geodesic: min_{µ_t} ∫_0^1 |µ'|(t)^p dt.
  In W_p, we have |µ'|(t) = ||v_t||_{L^p(µ_t)} = (∫_Ω |v_t|^p dµ_t)^{1/p}, where v_t is the velocity field solving the continuity equation.
So we get the Benamou–Brenier formula (time-dependent Kantorovich problem):
  (TKP1)  min_{(ρ_t, v_t) : ρ_0 = µ, ρ_1 = ν, ∂_t ρ_t + ∇·(v_t ρ_t) = 0}  ∫_0^1 ∫_Ω |v_t|^p dρ_t dt.
Substituting the momentum variable E_t = ρ_t v_t:
  (TKP2)  min_{(ρ_t, E_t) : ρ_0 = µ, ρ_1 = ν, ∂_t ρ_t + ∇·E_t = 0}  ∫_0^1 ∫_Ω |E_t|^p / ρ_t^{p−1} dx dt.
Further transformation:
  K_q := {(a, b) ∈ R × R^d : a + (1/q)|b|^q ≤ 0}, for q = p/(p − 1) the conjugate exponent of p. It is convex in R × R^d.
  For t ∈ R and x ∈ R^d, define
    f_p(t, x) := sup_{(a,b) ∈ K_q} (at + b · x) =
      (1/p) |x|^p / t^{p−1},   if t > 0,
      0,                        if t = 0, x = 0,
      +∞,                       if t = 0, x ≠ 0, or t < 0.
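A quick numerical check of this sup formula for d = 1 (my own sketch): for t > 0 the supremum over a is attained on the boundary a = −|b|^q/q, so it can be approximated by a grid search over b and compared with the closed form.

import numpy as np

p = 3.0
q = p / (p - 1.0)          # conjugate exponent

def f_p(t, x):
    # closed form for t > 0
    return abs(x) ** p / (p * t ** (p - 1.0))

def f_p_numeric(t, x, n=400001):
    # for t > 0 the sup over a (at fixed b) sits on the boundary a = -|b|^q / q
    b = np.linspace(-50.0, 50.0, n)
    a = -np.abs(b) ** q / q
    return np.max(a * t + b * x)

for t, x in [(0.5, 1.0), (2.0, -3.0), (1.3, 0.7)]:
    print(f_p(t, x), f_p_numeric(t, x))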
So the optimization problem can be reformulated as
  (TKP3)  min_{(ρ_t, E_t) : ρ_0 = µ, ρ_1 = ν, ∂_t ρ_t + ∇·E_t = 0}  sup_{(a,b) ∈ C(Ω×[0,1]; K_q)}  ∫∫ a dρ + ∫∫ b · dE,
where ∫∫ indicates integration w.r.t. both space and time.
Utilizing
  sup_{φ ∈ C^1([0,1]×Ω)}  −∫∫ ∂_t φ dρ − ∫∫ ∇φ · dE + ∫ φ_1 dν − ∫ φ_0 dµ
    = 0,   if ρ_0 = µ, ρ_1 = ν, ∂_t ρ_t + ∇·E_t = 0,
    = +∞,  otherwise,
we get
  (TKP4)  min_{(ρ_t, E_t)}  sup_{(a,b) ∈ C(Ω×[0,1]; K_q), φ ∈ C^1([0,1]×Ω)}  ∫∫ (a − ∂_t φ) dρ + ∫∫ (b − ∇φ) · dE + ∫ φ_1 dν − ∫ φ_0 dµ.
To simplify notation, let m = (ρ, E), A = (a, b), m · A = ∫∫ a dρ + ∫∫ b · dE, ∇_{t,x} φ = (∂_t φ, ∇φ), G(φ) = ∫ φ_1 dν − ∫ φ_0 dµ, and let I_{K_q}(·) be the indicator function; then
  (TKP4')  min_m sup_{A,φ}  L(m, (A, φ)) := m · (A − ∇_{t,x} φ) − I_{K_q}(A) + G(φ).
Adding an augmented-Lagrangian penalty term gives
  (TKP5)  min_m sup_{A,φ}  m · (A − ∇_{t,x} φ) − I_{K_q}(A) + G(φ) − (r/2) ||A − ∇_{t,x} φ||^2.
To solve this, alternate the following updates (a sketch of the projection step is given after this list):
  Optimize φ: minimize a quadratic functional in the calculus of variations, e.g. by solving a Poisson equation.
  Optimize A: a pointwise minimization problem, specifically a projection onto the convex set K_q.
  Optimize m: gradient descent, m ← m − r(A − ∇_{t,x} φ).
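A sketch of the pointwise projection step for d = 1 (my own illustration, not the authors' implementation): since K_q depends on b only through |b|, the projection of a point (α, β) outside K_q lies on the boundary a = −s^q/q with b = s·sign(β), 0 ≤ s ≤ |β|, which reduces to a 1D problem.

import numpy as np
from scipy.optimize import minimize_scalar

def project_Kq(alpha, beta, q):
    # Euclidean projection of (alpha, beta) onto K_q = {(a, b): a + |b|^q / q <= 0}, d = 1
    if alpha + abs(beta) ** q / q <= 0.0:
        return alpha, beta                       # already feasible
    def dist2(s):                                # squared distance to boundary point
        return (-s ** q / q - alpha) ** 2 + (s - abs(beta)) ** 2
    s = minimize_scalar(dist2, bounds=(0.0, abs(beta) + 1.0), method="bounded").x
    return -s ** q / q, s * np.sign(beta)

a, b = project_Kq(1.0, 2.0, q=2.0)
print(a, b, a + b ** 2 / 2.0)          # projected point sits on the boundary (last value ~ 0)
print(project_Kq(-5.0, 1.0, q=2.0))    # feasible input is returned unchanged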
Application
To be continued... :(
My Remarks
Given a functional F(ρ) on W_2(Ω) with Ω ⊂ R^n, if we want to minimize it, we can find a gradient flow on W_2(Ω) defined by F, which gradually minimizes F, by:
1. the MMS discretization with step size τ: we get {ρ^τ_k}_k, where
     ρ^τ_{k+1} ∈ argmin_ρ  F(ρ) + W_2^2(ρ, ρ^τ_k) / (2τ).
   In this case we directly get a sequence of distributions, e.g. in terms of p.d.f.s.
2. simulating a dynamics/flow on Ω, which is associated with the gradient flow on W_2(Ω) (or which is the cause of the evolution of the distribution described by the gradient flow on W_2(Ω)). The dynamics/flow on Ω is governed by
     (d/dt) ξ_t(x) = v_t(x),   v_t(x) = −∇(δF/δρ (ρ_t))(x).
   In this case the distribution is embodied as samples from it.
My Remarks on SVGD
By Gradient Flow
F(ρ) = ∫_Ω ρ log(ρ/p) dx, δF/δρ = log ρ − log p + 1, so:
  v_t = −∇(δF/δρ (ρ_t)) = ∇ log p − ∇ log ρ_t.
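A particle-level sketch of this velocity field (my own illustration, not the SVGD algorithm itself): take p = N(0, 1) so ∇log p(x) = −x, and estimate ∇log ρ_t from the particles with a crude kernel-density finite difference.

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)

grad_log_p = lambda x: -x                 # target p = N(0, 1)

X = rng.normal(5.0, 0.5, size=500)        # particles, initialized far from p
dt, eps = 0.05, 1e-3
for step in range(200):
    kde = gaussian_kde(X)                 # crude estimate of rho_t
    grad_log_rho = (np.log(kde(X + eps)) - np.log(kde(X - eps))) / (2.0 * eps)
    v = grad_log_p(X) - grad_log_rho      # v_t = grad log p - grad log rho_t
    X = X + dt * v                        # move particles along the flow

print("mean ~ 0, std ~ 1:", X.mean(), X.std())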
By Variation Calculus
Find the "directional derivative" G(v, ρ) of F(ρ) w.r.t. the dynamics (d/dt) ξ_t(x) = v_t(x):
  G(v, ρ) = (d/dε) F(ρ_{[ξ^{(ε)}]}) |_{ε=0},   ξ^{(ε)}(x) = x + εv(x),
  ρ_{[ξ^{(ε)}]}(x) = ρ((ξ^{(ε)})^{−1}(x)) |Jac (ξ^{(ε)})^{−1}| ≈ ρ(x − εv(x)) |Jac(x − εv(x))|.
For F(ρ) = KL(ρ||p), by my written notes on SVGD or the electronic notes on R-SVGD, G(v, ρ) = ∫_Ω ρ [∇ log p · v + ∇ · v] dx.
But this result cannot even recover the case of F(ρ) = KL(ρ||p)! Nor can it deduce the Gradient Flow result v = −∇(δF/δρ) from min_λ max_v G(v, ρ) + λ||v|| − λ using the calculus of variations. Why? I suspect there is something wrong in the above deduction of G(v, ρ).
Appendix
Compactness
Lower semicontinuity
Hölder space
Equicontinuity
Concept                     δ depends on
Continuity                  ε, x0, f
Uniform continuity          ε, f
Pointwise equicontinuity    ε, x0
Uniform equicontinuity      ε