and from the properties of the inner product in a Hilbert space we have that

C = {x : ‖x‖ ≤ 1}

is a convex set, whereas

C = {x : ‖x‖ = 1}

is a nonconvex one.
Solution: From the definition of a Hilbert space (see Appendix), the norm is the one induced by the inner product, i.e.,

‖x‖ = ⟨x, x⟩^(1/2).
a) Let us now consider two points, x₁, x₂ ∈ H, such that

‖x₁‖ ≤ 1, ‖x₂‖ ≤ 1,

and let

x = λx₁ + (1 − λ)x₂, λ ∈ [0, 1].

Then, by the triangle inequality property of a norm,

‖x‖ = ‖λx₁ + (1 − λ)x₂‖ ≤ ‖λx₁‖ + ‖(1 − λ)x₂‖,

and since λ ∈ [0, 1],

‖x‖ ≤ λ‖x₁‖ + (1 − λ)‖x₂‖ ≤ (λ + 1 − λ) · 1 = 1.
b) Let two points be such that

‖x₁‖ = 1, ‖x₂‖ = 1,

and

x = λx₁ + (1 − λ)x₂, λ ∈ (0, 1).

Then we have that

‖x‖² = ⟨λx₁ + (1 − λ)x₂, λx₁ + (1 − λ)x₂⟩
     = λ²‖x₁‖² + (1 − λ)²‖x₂‖² + 2λ(1 − λ)⟨x₁, x₂⟩
     = λ² + (1 − λ)² + 2λ(1 − λ)⟨x₁, x₂⟩. (1)

From the Cauchy–Schwarz inequality (Problem 8.1), we have that

|⟨x₁, x₂⟩| ≤ ‖x₁‖‖x₂‖, (2)

or

−1 ≤ ⟨x₁, x₂⟩ ≤ 1. (3)

From (1) and (3) it is readily seen that

‖x‖² ≤ 1.

As a matter of fact, the only way to have ‖x‖² = 1 is that

⟨x₁, x₂⟩ = 1 = ‖x₁‖‖x₂‖.

However, this is not possible for two distinct points. Equality in (2) is attained only if

x₁ = ax₂,

and since ‖x₁‖ = ‖x₂‖ = 1 and ⟨x₁, x₂⟩ = 1, this can only happen in the trivial case of

x₁ = x₂.

Hence, for x₁ ≠ x₂, the convex combination x leaves the set, which is therefore nonconvex.
b) Assume that

f(y) ≥ f(x) + ∇ᵀf(x)(y − x)

is valid ∀x, y ∈ X, where X is the domain of definition of f. Then, we have

f(y₁) ≥ f(x) + ∇ᵀf(x)(y₁ − x), (5)

and

f(y₂) ≥ f(x) + ∇ᵀf(x)(y₂ − x). (6)

Multiplying (5) by λ, (6) by (1 − λ), and adding, we obtain

λf(y₁) + (1 − λ)f(y₂) ≥ f(x) + ∇ᵀf(x)(λy₁ + (1 − λ)y₂ − x), (7)

for λ ∈ (0, 1). Since this is true for any x, it will also be true for

x = λy₁ + (1 − λ)y₂,

which results in

f(λy₁ + (1 − λ)y₂) ≤ λf(y₁) + (1 − λ)f(y₂), (8)

which proves the claim.
8.4. Show that a function f is convex iff the one-dimensional function,

g(t) := f(x + ty),

is convex, ∀x, y in the domain of definition of f.

For any

r₁ ≥ f(x₁), r₂ ≥ f(x₂),

the convexity of f implies that

r = λr₁ + (1 − λ)r₂ ≥ λf(x₁) + (1 − λ)f(x₂) ≥ f(λx₁ + (1 − λ)x₂). (15)

Thus, (15) is also valid for r₁ = f(x₁), r₂ = f(x₂), and therefore the claim follows.
8.7. Show that if a function is convex, then its lower level set is convex for any ξ.

Solution: Let the function f be convex, and let x, y be two points which lie in lev_{≤ξ}(f). Then,

f(x) ≤ ξ, f(y) ≤ ξ.

Hence, by the definition of convexity,

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) ≤ λξ + (1 − λ)ξ = ξ, λ ∈ [0, 1],

holds true; that is, λx + (1 − λ)y ∈ lev_{≤ξ}(f), which proves the claim.
Assume that x∗ is a local minimizer; that is, there exists ε > 0 such that f(x∗) ≤ f(x), ∀x ∈ B[x∗, ε]. Assume, also, that there exists a point y∗ such that

f(y∗) < f(x∗).

Let

λ := ε / (2‖y∗ − x∗‖).

Then,

λ(y∗ − x∗) ∈ B[0, ε].

Hence, by the convexity of f,

f(x∗ + λ(y∗ − x∗)) ≤ (1 − λ)f(x∗) + λf(y∗) < f(x∗),

which contradicts the assumption that x∗ is a local minimizer, since x∗ + λ(y∗ − x∗) ∈ B[x∗, ε]. Hence, no such y∗ can exist, and x∗ is a global minimizer.
ρ := inf_{y∈C} ‖x − y‖ > 0.

By the definition of the infimum, for each n there exists a point xₙ ∈ C such that

‖x − xₙ‖ < ρₙ,

where {ρₙ} is any sequence of values with ρₙ > ρ and ρₙ → ρ; this defines a sequence, {xₙ}, of points for which we have that

ρ ≤ ‖x − xₙ‖ < ρₙ,

or

ρ ≤ lim_{n→∞} ‖x − xₙ‖ ≤ lim_{n→∞} ρₙ = ρ,

which necessarily leads to

lim_{n→∞} ‖x − xₙ‖ = ρ. (16)

Moreover, from the parallelogram law,

‖xₙ − xₘ‖² = 2(‖x − xₘ‖² + ‖x − xₙ‖²) − 4‖x − ½(xₙ + xₘ)‖².

Since C is convex, ½(xₙ + xₘ) ∈ C, so that ‖x − ½(xₙ + xₘ)‖ ≥ ρ; hence, taking limits and recalling (16),

lim_{n,m→∞} ‖xₘ − xₙ‖² ≤ 2(ρ² + ρ²) − 4ρ² = 0 ⇒ lim_{n,m→∞} ‖xₘ − xₙ‖ = 0.

That is, {xₙ} is a Cauchy sequence, and since H is complete and C is closed, xₙ converges to a point x∗ ∈ C. Then,

‖x − x∗‖ = ‖x − xₙ + xₙ − x∗‖ ≤ ‖x − xₙ‖ + ‖xₙ − x∗‖,

and taking the limit as n → ∞,

‖x − x∗‖ ≤ ρ.

However, since x∗ ∈ C,

‖x − x∗‖ ≥ ρ,

which means that

‖x − x∗‖ = ρ;

that is, the infimum is attained, which proves the claim. Uniqueness has been established in the text.
8.12. Show that the projection of a point x ∈ H, x ∉ C, onto a non-empty closed convex set, C ⊂ H, lies on the boundary of C.

Solution: Assume, on the contrary, that P_C(x) is an interior point of C. Then there exists δ > 0 such that

S_δ := {y : ‖y − P_C(x)‖ < δ} ⊂ C.

Let

z := P_C(x) + (δ/2) · (x − P_C(x)) / ‖x − P_C(x)‖,

where, by assumption, ‖x − P_C(x)‖ > 0, since x ∉ C. Obviously, z ∈ S_δ. Hence,

‖x − z‖ = ‖x − P_C(x)‖ (1 − δ / (2‖x − P_C(x)‖)).

However, δ can be chosen arbitrarily small; thus, choose

δ < ‖x − P_C(x)‖.

Then,

‖x − z‖ < ‖x − P_C(x)‖,

which violates the definition of the projection, since z ∈ C. Thus, P_C(x) lies on the boundary of C.
8.13. Derive the formula for the projection onto a hyperplane in a (real) Hilbert space, H.

Solution: The hyperplane C = {z : ⟨θ, z⟩ + θ₀ = 0} is a convex set; we first show that it is also closed. Let yₙ ∈ C with yₙ → y∗. Since ⟨θ, yₙ⟩ + θ₀ = 0, ∀n, we have

0 ≤ |⟨θ, y∗⟩ + θ₀|² = lim_{n→∞} |⟨θ, y∗ − yₙ⟩|² ≤ lim_{n→∞} ‖θ‖²‖y∗ − yₙ‖² = 0,

or

⟨θ, y∗⟩ + θ₀ = 0,

which proves the claim.

Let now z ∈ H be the projection of x ∈ H, i.e., z := P_C(x). Then, by the definition,

z := arg min_{⟨θ,z⟩+θ₀=0} ⟨x − z, x − z⟩.

For those not familiar with infinite-dimensional spaces, it suffices to say that similar rules of differentiation apply, although the respective definitions are different (more general). After differentiation of the Lagrangian, we obtain

2z − 2x − λθ = 0,

or

z = ½(2x + λθ) = x + (λ/2)θ.

Plugging into the constraint, we obtain

λ = −2(⟨θ, x⟩ + θ₀)/‖θ‖²,

which then results in the solution,

P_C(x) = x − ((⟨θ, x⟩ + θ₀)/‖θ‖²) θ.
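As a quick numerical illustration (not part of the original solution), the following Python sketch evaluates the derived formula in ℝ³ and verifies that the result satisfies the hyperplane constraint; the function name and the test values are our own choices.

import numpy as np

def project_hyperplane(x, theta, theta0):
    # P_C(x) = x - ((<theta, x> + theta0) / ||theta||^2) theta
    return x - ((theta @ x + theta0) / (theta @ theta)) * theta

theta = np.array([1.0, 2.0, -1.0])    # illustrative values
theta0 = 0.5
x = np.array([3.0, -1.0, 2.0])
z = project_hyperplane(x, theta, theta0)
print(theta @ z + theta0)             # ~0: z lies on the hyperplane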
8.14. Derive the formula for the projection onto a closed ball, B[0, ρ].

We have already seen (Problem 8.2) that it is convex. Let us show the closedness. Let

yₙ ∈ B[0, ρ] → y∗.

We have to show that y∗ ∈ B[0, ρ]. From the triangle inequality,

‖y∗‖ = ‖y∗ − yₙ + yₙ‖ ≤ ‖yₙ‖ + ‖y∗ − yₙ‖ ≤ ρ + ‖y∗ − yₙ‖,

or

‖y∗‖ ≤ ρ + lim_{n→∞} ‖y∗ − yₙ‖,

or

‖y∗‖ ≤ ρ,

which proves the claim.
To derive the projection, we follow similar steps as in Problem 8.13, replacing the constraint by

‖z‖² = ρ²,

since, for x ∉ B[0, ρ], the projection lies on the boundary. Taking the gradient of the Lagrangian, we get

2(z − x) + 2λz = 0,

or

z = x/(1 + λ).

Plugging into the constraint, we get

|1 + λ| = ‖x‖/ρ,

which leads to the two candidate solutions,

z = (ρ/‖x‖)x,

and

z = −(ρ/‖x‖)x.
From the two possible vectors, we have to keep the one with the smaller distance from x. Indeed,

‖x − (ρ/‖x‖)x‖ < ‖x + (ρ/‖x‖)x‖,

since ρ/‖x‖ < 1 (recall that ‖x‖ > ρ). Thus,

1 + λ = ‖x‖/ρ,

and the projection is equal to

P_{B[0,ρ]}(x) = (ρ/‖x‖)x, if ‖x‖ > ρ,
P_{B[0,ρ]}(x) = x, otherwise.
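A minimal sketch of the formula just derived, assuming a finite-dimensional setting; the function name is our own choice.

import numpy as np

def project_ball(x, rho):
    # P_{B[0, rho]}(x) = (rho / ||x||) x if ||x|| > rho, else x.
    n = np.linalg.norm(x)
    return (rho / n) * x if n > rho else x

For example, project_ball(np.array([3.0, 4.0]), 1.0) returns a vector of norm 1 in the same direction, while any point already inside the ball is left unchanged.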
8.15. Find an example of a point whose projection on the ℓ₁ ball is not unique.

Solution: Consider the point x = [1, 1]ᵀ ∈ ℝ² and the unit ℓ₁ ball. For any point y in the ball, ‖y‖₁ ≤ 1, and hence

‖x − y‖₁ ≥ ‖x‖₁ − ‖y‖₁ = 2 − ‖y‖₁ ≥ 1.

That is, the ℓ₁ distance of x from any point in the ball, and in particular from any point on the unit ℓ₁ sphere, S₁[0, 1], is bounded below by 1. Consider the two points y₁ = [1, 0]ᵀ and y₂ = [0, 1]ᵀ, for which

‖x − y₁‖₁ = ‖x − y₂‖₁ = 1.

Moreover, one can easily check that all points y = [a, b]ᵀ on the line segment

a + b = 1, a, b ≥ 0,

also attain this minimum distance; hence, the projection of x onto the ball, in the ℓ₁ sense, is not unique.
Hence,

Real{⟨x − P_C(x), y − P_C(x)⟩} ≤ (λ/2)‖y − P_C(x)‖².

Taking the limit λ → 0, we prove the first property.

To prove the second property, since P_C(y) ∈ C, we apply the previous property with P_C(y) in place of y, i.e.,

Real{⟨x − P_C(x), P_C(y) − P_C(x)⟩} ≤ 0.

Similarly,

Real{⟨y − P_C(y), P_C(x) − P_C(y)⟩} ≤ 0.

Adding the above inequalities together and rearranging the terms, we obtain the second property.
8.17. Prove that if S ⊂ H is a closed subspace of a Hilbert space H, then ∀x, y ∈ H,

⟨x, P_S(y)⟩ = ⟨P_S(x), y⟩ = ⟨P_S(x), P_S(y)⟩,

and

P_S(ax + by) = aP_S(x) + bP_S(y).

Solution: Since x − P_S(x) ⊥ P_S(y), because P_S(y) ∈ S, we have

⟨x, P_S(y)⟩ = ⟨P_S(x) + (x − P_S(x)), P_S(y)⟩ = ⟨P_S(x), P_S(y)⟩,

and, by the same argument with the roles of x and y interchanged,

⟨P_S(x), y⟩ = ⟨P_S(x), P_S(y)⟩.

Hence,

⟨x, P_S(y)⟩ = ⟨P_S(x), y⟩.

For the linearity, we have

x = P_S(x) + (x − P_S(x)),
y = P_S(y) + (y − P_S(y)),

and since the terms in the second parentheses on the right-hand side lie in S⊥, we readily obtain that

ax + by = (aP_S(x) + bP_S(y)) + (a(x − P_S(x)) + b(y − P_S(y))),

where the first term lies in S and the second in S⊥; by the uniqueness of this decomposition, P_S(ax + by) = aP_S(x) + bP_S(y).
Solution:

a) We will first prove that S⊥ is a subspace. Indeed, if x₁ ∈ S⊥ and x₂ ∈ S⊥, then

⟨x₁, y⟩ = ⟨x₂, y⟩ = 0, ∀y ∈ S,

hence

⟨ax₁ + bx₂, y⟩ = 0 ⇒ ax₁ + bx₂ ∈ S⊥.

Also, 0 ∈ S⊥, since ⟨0, y⟩ = 0. Hence, S⊥ is a subspace.

We will prove that S⊥ is also closed. Let {xₙ} ⊂ S⊥ and

lim_{n→∞} xₙ = x∗.

Then,

⟨xₙ, y⟩ = 0, ∀y ∈ S.

Moreover,

|⟨x∗, y⟩| = |⟨x∗ − xₙ, y⟩| ≤ ‖x∗ − xₙ‖‖y‖ → 0,

where the Cauchy–Schwarz inequality has been used. The last inequality leads to

⟨x∗, y⟩ = 0 ⇒ x∗ ∈ S⊥.

b) Let x ∈ S ∩ S⊥. By definition, since it belongs to both subspaces, it is orthogonal to itself, i.e.,

⟨x, x⟩ = 0 ⇒ x = 0;

that is, S ∩ S⊥ = {0}.
Next, consider the decomposition

x = P_S(x) + (x − P_S(x)).

We will first show that x − P_S(x) ∈ S⊥; then we will show that this decomposition is unique. We already know, from the first property of the projection, that

Real{⟨x − P_S(x), y − P_S(x)⟩} ≤ 0, ∀y ∈ S,

and, since S is a subspace, ay ∈ S for every real a, so that

a Real{⟨x − P_S(x), y⟩} ≤ Real{⟨x − P_S(x), P_S(x)⟩}, ∀a ∈ ℝ,

which can only be true if

Real{⟨x − P_S(x), y⟩} = 0, ∀y ∈ S.

Recall that if c ∈ ℂ,

Imag{c} = Real{−jc}.

Hence, applying the previous result to the point jy ∈ S, we also obtain Imag{⟨x − P_S(x), y⟩} = 0. Thus,

⟨x − P_S(x), y⟩ = 0, ∀y ∈ S,

and

x − P_S(x) ∈ S⊥.
Thus,

x = x₁ + x₂, x₁ = P_S(x) ∈ S, x₂ = x − P_S(x) ∈ S⊥.

To show uniqueness, assume that there is a second decomposition,

x = x₃ + x₄, x₃ ∈ S, x₄ ∈ S⊥.

Then,

x₁ + x₂ = x₃ + x₄,

or

S ∋ x₁ − x₃ = x₄ − x₂ ∈ S⊥,

which necessarily implies that both sides are equal to the single point comprising S ∩ S⊥, i.e.,

x₁ − x₃ = 0 = x₄ − x₂;

hence, the decomposition is unique and we have proved the claim.
Let us elaborate a bit more. We will show that

P_{S⊥}(x) = x − P_S(x).

Indeed, applying the (linear) operator P_{S⊥} to the decomposition

x = P_S(x) + (x − P_S(x)),

we obtain

P_{S⊥}(x) = P_{S⊥}(P_S(x)) + P_{S⊥}(x − P_S(x)) = 0 + (x − P_S(x)) = x − P_S(x),

where we used that, for any y ∈ S,

P_{S⊥}(y) = 0.

Indeed,

‖y − 0‖² = ‖y‖² < ‖y − a‖², ∀a ∈ S⊥, a ≠ 0,

since

‖y − a‖² = ‖y‖² + ‖a‖², for y ⊥ a.
To derive the bounds, we used that 1 − µ < 0 and 2 − µ > 0, for µ ∈ (1, 2).

8.20. Show that the relaxed projection operator is a strongly attractive mapping.

Solution: For any y ∈ C,

‖T_C(x) − y‖² ≤ ‖x − y‖² − (µ(2 − µ)/µ²)‖T_C(x) − x‖²
            = ‖x − y‖² − ((2 − µ)/µ)‖T_C(x) − x‖²,

where we used that

P_C(x) − x = (1/µ)(T_C(x) − x).
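The sketch below (our own construction, not part of the text) numerically checks the strong attraction inequality for the relaxed projection onto the unit ball; the choice C = B[0, 1], the value µ = 1.5, and the random test points are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
project = lambda x: x / max(1.0, np.linalg.norm(x))   # P_C for C = B[0, 1]
mu = 1.5                                              # relaxation parameter in (0, 2)

def T(x):
    # Relaxed projection: T_C(x) = x + mu (P_C(x) - x).
    return x + mu * (project(x) - x)

x = rng.normal(size=5)
y = project(rng.normal(size=5))                       # an arbitrary point of C
lhs = np.linalg.norm(T(x) - y) ** 2
rhs = np.linalg.norm(x - y) ** 2 - ((2 - mu) / mu) * np.linalg.norm(T(x) - x) ** 2
assert lhs <= rhs + 1e-12                             # strong attraction holds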
xₙ = {0, 0, …, 1, 0, 0, …} := {δ_{ni}}, i = 0, 1, 2, …

That is, each point, xₙ, is itself a sequence, with zeros everywhere except at time index n, where it is 1. For every point (sequence) y ∈ l₂, we have that

‖y‖² := Σ_{n=1}^{∞} |yₙ|² = Σ_{n=1}^{∞} |⟨xₙ, y⟩|² < ∞,

which implies that ⟨xₙ, y⟩ → 0, ∀y ∈ l₂; that is, {xₙ} converges weakly to 0, although it does not converge strongly, since ‖xₙ − xₘ‖² = 2, ∀n ≠ m.
8.22. Prove that if C₁, …, C_K are closed convex sets in a Hilbert space H, then the operator

T = T_{C_K} ··· T_{C_1}

is a regular one; that is,

lim_{n→∞} ‖T^n(x) − T^{n+1}(x)‖ = 0, ∀x ∈ H.

Solution:

Fact 1:

T = T_{C_K} T_{C_{K−1}} ··· T_{C_1} := T_K ··· T_1

is a non-expansive mapping. Indeed, ∀x, y ∈ H,

‖T(x) − T(y)‖ ≤ ‖T_{K−1} ··· T_1(x) − T_{K−1} ··· T_1(y)‖ ≤ ··· ≤ ‖x − y‖,

since each relaxed projection operator is non-expansive.

Fact 2:

Fix(T) = ∩_{k=1}^{K} C_k := C.

Indeed, if x ∈ C, then x is a fixed point of every T_k, so that

T(x) = T_K ··· T_1(x) = x.

Conversely, let T(x) = x. Then, ∀y ∈ C, we have

‖x − y‖ = ‖T(x) − y‖ ≤ ‖T_1(x) − y‖ ≤ ‖x − y‖,

or

‖T_1(x) − y‖ = ‖x − y‖, ∀y ∈ C,

which, by the strong attraction property, can only be true if T_1(x) = x, i.e., x ∈ C₁; repeating the argument for the remaining operators, x ∈ C. Note that the previous two facts are valid for general Hilbert spaces.
From the strong attraction property, we have

‖x − P₁(x)‖² ≤ (1/(µ₁(2 − µ₁))) (‖x − y‖² − ‖T_1(x) − y‖²),

and by the definition of the relaxed projection, T_1(x) − x = µ₁(P₁(x) − x), we get

‖x − T_1(x)‖² = µ₁²‖x − P₁(x)‖² ≤ (µ₁/(2 − µ₁)) (‖x − y‖² − ‖T_1(x) − y‖²).

Let now T_2 act on T_1(x). From the inequality (a + b)² ≤ 2a² + 2b²,

‖x − T_2T_1(x)‖² ≤ 2‖x − T_1(x)‖² + 2‖T_1(x) − T_2T_1(x)‖²,

or

‖x − T_2T_1(x)‖² ≤ (2µ₁/(2 − µ₁)) (‖x − y‖² − ‖T_1(x) − y‖²)
                + (2µ₂/(2 − µ₂)) (‖T_1(x) − y‖² − ‖T_2T_1(x) − y‖²).

Let

b₂ = max{2µ₁/(2 − µ₁), 2µ₂/(2 − µ₂)}.

Then, obviously, we can write

‖x − T₁₂(x)‖² ≤ b₂ (‖x − y‖² − ‖T₁₂(x) − y‖²),

where

T₁₂(x) = T_2(T_1(x)).

Following a similar rationale, and by induction, we can show that

‖x − T(x)‖² ≤ b_K 2^{K−1} (‖x − y‖² − ‖T(x) − y‖²),

where

T = T_K T_{K−1} ··· T_1,

and

b_K = max_{1≤k≤K} µ_k/(2 − µ_k).

Applying this bound at the points T^{n−1}(x), n = 1, 2, …, we obtain

‖T^{n−1}(x) − T^n(x)‖² ≤ b_K 2^{K−1} (‖T^{n−1}(x) − y‖² − ‖T^n(x) − y‖²). (19)

Summing up (17)–(19), the right-hand side telescopes and we obtain

Σ_{n=1}^{∞} ‖T^{n−1}(x) − T^n(x)‖² ≤ b_K 2^{K−1} ‖x − y‖² < +∞.

Hence,

lim_{n→∞} ‖T^{n−1}(x) − T^n(x)‖ = 0.
Note that till now, everything is valid for general Hilbert spaces.
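To illustrate the behavior of the composite operator, here is a sketch (our own example, not from the text) iterating T = T_{C₂}T_{C₁} for two convex sets in ℝ²; the sets, relaxation parameters, and starting point are arbitrary choices. The successive differences ‖T^{n−1}(x) − T^n(x)‖ shrink, in line with the regularity just proved.

import numpy as np

def p_halfspace(x, a=np.array([1.0, 1.0]), b=1.0):
    # Projection onto C1 = {x : <a, x> <= b}.
    viol = a @ x - b
    return x if viol <= 0 else x - (viol / (a @ a)) * a

def p_ball(x, rho=2.0):
    # Projection onto C2 = B[0, rho].
    n = np.linalg.norm(x)
    return x if n <= rho else (rho / n) * x

def relax(p, mu):
    # Relaxed projection T_C = I + mu (P_C - I), mu in (0, 2).
    return lambda x: x + mu * (p(x) - x)

T1, T2 = relax(p_halfspace, 1.2), relax(p_ball, 0.8)
x = np.array([5.0, -3.0])
for n in range(200):
    x = T2(T1(x))
# x is (approximately) a point of C1 ∩ C2, and the successive
# differences ||T^{n-1}(x) - T^n(x)|| have become negligible.
print(x, np.linalg.norm(x - T2(T1(x))))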
8.23. Show the fundamental POCS theorem for the case of closed subspaces in a Hilbert space, H.

Fact 3: Define

S := {y : y = (I − T)(z), z ∈ H}.

Then

S⊥ = C;

equivalently, the closure of S is C⊥. The proof that S is a subspace is trivial, from the linearity of T. Also, let x ∈ S⊥. Then, by the respective definition,

⟨x, (I − T)(z)⟩ = 0, ∀z ∈ H,

or

⟨(I − T∗)(x), z⟩ = 0, ∀z ∈ H.

Hence,

(I − T∗)(x) = x − T∗(x) = 0,

or

T∗(x) = x,

and since T∗ and T have the same fixed point set (the proof is trivial),

S⊥ ⊆ C.

Conversely, if x ∈ C, then T(x) = x and, hence,

T∗(x) = x,

so that ⟨x, (I − T)(z)⟩ = ⟨(I − T∗)(x), z⟩ = 0, ∀z ∈ H; that is, C ⊆ S⊥, which proves that

S⊥ = C.
Note that what we have said so far is a generalization of Problem 8.18. We are now ready to establish strong convergence. The repeated application of T on any x ∈ H generates the sequence T^n(x) = (T_K T_{K−1} ··· T_1)^n (x). We know that ∀x ∈ H there is a unique decomposition in terms of the two orthogonal complement (closed) subspaces, i.e.,

x = y + z, y ∈ C and z ∈ C⊥.

Since y ∈ C = Fix(T) and T is linear, T^n(x) = y + T^n(z). For z in the range of I − T, i.e., z = (I − T)(w), we have that T^n(z) = T^n(w) − T^{n+1}(w) → 0, by the regularity of T (Problem 8.22); by the non-expansiveness of T, this extends to every z in the closure of the range which, by Fact 3, is C⊥. Thus,

‖T^n(x) − P_C(x)‖ → 0,

since P_C(x) = y, which proves the claim.
8.24. Derive the subdifferential of the metric distance function, d_C(x), where C is a closed convex set, C ⊆ ℝˡ, and x ∈ ℝˡ.

Solution: For any x, y ∈ ℝˡ, d_C(y) ≤ ‖y − z‖ for every z ∈ C; choosing z = P_C(x),

d_C(y) ≤ ‖y − x‖ + ‖x − P_C(x)‖,

or

d_C(y) − d_C(x) ≤ ‖y − x‖.

Hence, if g is any subgradient of d_C at x,

gᵀ(y − x) ≤ d_C(y) − d_C(x) ≤ ‖y − x‖.

Since this is true ∀y, let y = x + g; then gᵀg ≤ ‖g‖, or

g ∈ B[0, 1].
a) Let x ∉ C and let g be any subgradient. For any y ∈ ℝˡ, the subgradient inequality gives

gᵀ(y + P_C(x) − x) ≤ d_C(y + P_C(x)) − d_C(x).

However,

d_C(y + P_C(x)) ≤ ‖y‖.

Set y = 0. Then,

gᵀ(P_C(x) − x) ≤ −d_C(x),

or

gᵀ(x − P_C(x)) ≥ ‖x − P_C(x)‖.

However,

‖g‖ ≤ 1,

and recalling the Cauchy–Schwarz inequality, we obtain

gᵀ(x − P_C(x)) ≤ ‖g‖‖x − P_C(x)‖ ≤ ‖x − P_C(x)‖,

and hence

gᵀ(x − P_C(x)) = ‖x − P_C(x)‖;

that is, equality is attained in the Cauchy–Schwarz inequality, which forces

g = (x − P_C(x))/‖x − P_C(x)‖.
b) Let now x be an interior point of C, and let g be any subgradient, i.e.,

gᵀ(y − x) ≤ d_C(y) − d_C(x), ∀y ∈ ℝˡ. (20)

Choosing y = x − ε(z − x), for arbitrary z ∈ ℝˡ and sufficiently small ε > 0, we obtain

gᵀ(x − ε(z − x) − x) ≤ 0,

since x − ε(z − x) ∈ C and the condition (20) has been used. Thus,

gᵀ(z − x) ≥ 0, ∀z ∈ ℝˡ,

and, since z is arbitrary, applying the same with z replaced by 2x − z yields gᵀ(z − x) ≤ 0 as well; hence,

g = 0.
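As a concrete sketch of the result (our own illustration, specialized to C = B[0, ρ]), the subgradient outside C is the unit vector pointing from P_C(x) to x:

import numpy as np

def subgrad_dist_ball(x, rho=1.0):
    # Subgradient of d_C for C = B[0, rho]: (x - P_C(x)) / ||x - P_C(x)|| outside C
    # (here equal to x / ||x||); 0 is a valid choice inside and on the boundary.
    n = np.linalg.norm(x)
    if n > rho:
        p = (rho / n) * x                 # P_C(x)
        return (x - p) / (n - rho)        # = x / ||x||, a unit vector
    return np.zeros_like(x)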
‖θ^(i) − θ∗‖² ≤ ‖θ^(i−1) − θ∗‖² − 2µᵢ(J(θ^(i−1)) − J(θ∗)) + µᵢ²‖J′(θ^(i−1))‖².

Taking into account the bound of the subgradient, ‖J′(·)‖ ≤ G, applying the inequality recursively, and using the fact that the left-hand side of the inequality is a non-negative number, we obtain

2 Σ_{k=1}^{i} µ_k (J(θ^(k−1)) − J(θ∗)) ≤ ‖θ^(0) − θ∗‖² + Σ_{k=1}^{i} µ_k² G². (21)

Employing the previous bound in (21), the claim is readily obtained, i.e.,

J∗^(i) − J(θ∗) ≤ ‖θ^(0) − θ∗‖² / (2 Σ_{k=1}^{i} µ_k) + (Σ_{k=1}^{i} µ_k²) G² / (2 Σ_{k=1}^{i} µ_k),

where J∗^(i) denotes the best (smallest) value of J achieved up to iteration i.
Then, following the same arguments as the ones adopted in Problem 8.25, we get

‖z^(i) − θ∗‖² ≤ ‖θ^(0) − θ∗‖² − 2 Σ_{k=1}^{i} µ_k (J(θ^(k−1)) − J(θ∗)) + Σ_{k=1}^{i} µ_k² ‖J′(θ^(k−1))‖². (24)

Moreover, since θ∗ ∈ C and the projection is non-expansive,

‖θ^(i) − θ∗‖² = ‖P_C(z^(i)) − P_C(θ∗)‖² ≤ ‖z^(i) − θ∗‖². (25)
Combining the last two formulas the proof proceeds as in Problem 8.25.
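The projected subgradient iteration analyzed above is easy to run numerically. The following sketch (our own example: the ℓ₁ data-fitting cost, the random data, the radius ρ = 2, and the diminishing steps µ_k = 0.1/√k are all illustrative assumptions) tracks the best cost value, which keeps decreasing, as the bound predicts.

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 5))          # illustrative problem data
b = rng.normal(size=20)
rho = 2.0                             # radius of C = B[0, rho]

def proj(theta):
    n = np.linalg.norm(theta)
    return theta if n <= rho else (rho / n) * theta

theta = np.zeros(5)
best = np.inf
for k in range(1, 2001):
    g = A.T @ np.sign(A @ theta - b)      # a subgradient J'(theta) of ||A theta - b||_1
    z = theta - (0.1 / np.sqrt(k)) * g    # z^{(k)}: subgradient step with diminishing mu_k
    theta = proj(z)                       # theta^{(k)} = P_C(z^{(k)}), as in (25)
    best = min(best, np.abs(A @ theta - b).sum())
print(best)                               # best cost value J_*^{(k)} so far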
Hence,

J_n(θ) = (1/n) (Σ_{k=1}^{n−1} L(y_k, x_k, θ) + L(y_n, x_n, θ))
       = ((n − 1)/n) J_{n−1}(θ) + (1/n) L(y_n, x_n, θ),

or

∇J_n(θ) = ((n − 1)/n) ∇J_{n−1}(θ) + (1/n) ∇L(y_n, x_n, θ).

Hence, since θ∗(n − 1) minimizes J_{n−1}, so that ∇J_{n−1}(θ∗(n − 1)) = 0,

∇J_n(θ∗(n − 1)) = 0 + (1/n) ∇L(y_n, x_n, θ∗(n − 1)).

Expanding ∇J_n in a first-order Taylor approximation around θ∗(n − 1) and evaluating it at the minimizer θ∗(n), we get

∇J_n(θ∗(n)) = 0 = ∇J_n(θ∗(n − 1)) + ∇²J_n(θ∗(n − 1)) (θ∗(n) − θ∗(n − 1)),

or

−∇J_n(θ∗(n − 1)) = ∇²J_n(θ∗(n − 1)) (θ∗(n) − θ∗(n − 1)),

which finally proves the claim.
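For the squared loss, L(y, x, θ) = (y − xᵀθ)², the Taylor expansion is exact and the resulting recursion can be run directly. The sketch below (our own illustration; the data generation, dimensions, and the small ridge term that guards against early rank deficiency are all assumptions) shows the recursion tracking the least-squares solution.

import numpy as np

rng = np.random.default_rng(2)
d, N = 3, 500
theta_true = rng.normal(size=d)       # illustrative "true" parameter
theta = np.zeros(d)
H = np.zeros((d, d))                  # sum of per-sample Hessians, i.e., n * grad^2 J_n

for n in range(1, N + 1):
    x = rng.normal(size=d)
    y = x @ theta_true + 0.1 * rng.normal()
    H += 2.0 * np.outer(x, x)                     # grad^2 L = 2 x x^T
    grad_L = -2.0 * (y - x @ theta) * x           # grad L(y_n, x_n, theta_*(n-1))
    # theta_*(n) = theta_*(n-1) - [grad^2 J_n]^{-1} (1/n) grad L = theta_*(n-1) - H^{-1} grad L
    theta -= np.linalg.solve(H + 1e-6 * np.eye(d), grad_L)

print(theta, theta_true)              # the recursion tracks the LS solution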
8.29. Consider the online version of PDMb in (8.64), i.e.,

θ_n = P_C(θ_{n−1} − µ_n (J(θ_{n−1})/‖J′(θ_{n−1})‖²) J′(θ_{n−1})), if J′(θ_{n−1}) ≠ 0,
θ_n = P_C(θ_{n−1}), if J′(θ_{n−1}) = 0, (26)

where we have assumed that J∗ = 0. If this is not the case, a shift can account for the difference. Thus, we assume that we know the minimum. For example, this is the case for a number of tasks, such as the hinge loss function, assuming linearly separable classes, or the linear ε-insensitive loss function, for bounded noise. Assume that

L_n(θ) = Σ_{k=n−q+1}^{n} (ω_k d_{C_k}(θ_{n−1}) / Σ_{j=n−q+1}^{n} ω_j d_{C_j}(θ_{n−1})) d_{C_k}(θ).