CS221 Problem Workout Solutions
Week 2
Key Takeaways from this Week
The goal of ML is to learn a function f parameterized by w such that f_w(x) is very close to y.
Each algorithm is a triplet of design decisions:
1. Hypothesis class – How will I write down my prediction for y as a function of x?
Which parameters w do I need to learn?
2. Loss function – How do I measure how far my prediction is from the real y?
3. Optimization algorithm – What algorithm will I use to minimize my loss function?
Hypothesis class / Loss function / Optimization algorithm:
• y ∈ R: linear regression, f_w(x) := w · ϕ(x).
  Squared loss (f_w(x) − y)²: GD or SGD.
• y ∈ {−1, 1}: (binary) linear classification, f_w(x) := sign(w · ϕ(x)).
  0-1 loss 1[f_w(x) ≠ y]: cannot use GD or SGD.
  Hinge loss max{1 − (w · ϕ(x))y, 0}: GD or SGD.
  Logistic loss log(1 + e^{−(w·ϕ(x))y}): GD or SGD.
Dimension check. Above, w, ϕ(x) ∈ R^d, while y is a scalar.
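For concreteness, the sketch below (plain Python; the feature map, toy dataset, step size, and number of updates are illustrative choices of ours, not part of the course material) runs SGD on the hinge loss from the table above.

    import random

    # A minimal SGD sketch for the hinge loss max{1 - (w . phi(x)) y, 0}.
    def phi(x):
        return [1.0, x]                   # phi(x) in R^2: a bias feature plus x itself

    data = [(-2.0, -1), (-1.0, -1), (1.0, +1), (3.0, +1)]   # (x, y) pairs, y in {-1, +1}
    w = [0.0, 0.0]
    eta = 0.1                             # step size

    for t in range(100):
        x, y = random.choice(data)        # pick one training example (the "stochastic" part)
        f = phi(x)
        margin = y * sum(wi * fi for wi, fi in zip(w, f))
        if margin < 1:                    # hinge loss is active; its gradient is -y * phi(x)
            w = [wi + eta * y * fi for wi, fi in zip(w, f)]

    print(w)   # after these updates, sign(w . phi(x)) typically matches y on all four points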
1) Problem 1: Non-linear features
Consider the following two training datasets of (x, y) pairs:
• D1 = {(−1, +1), (0, −1), (1, +1)}.
• D2 = {(−1, −1), (0, +1), (1, −1)}.
Observe that neither dataset is linearly separable if we use ϕ(x) = x, so let’s fix that.
Define a two-dimensional feature function ϕ(x) such that:
• There exists a weight vector w1 that classifies D1 perfectly (meaning that w1 ·
ϕ(x) > 0 if x is labeled +1 and w1 · ϕ(x) < 0 if x is labeled −1); and
• There exists a weight vector w2 that classifies D2 perfectly.
Note that the weight vectors can be different for the two datasets, but the features
ϕ(x) must be the same.
Solution One option is ϕ(x) = [1, x²], with w1 = [−1, 2] and w2 = [1, −2].
Then in D1 :
• For x = −1, w1 · ϕ(x) = [−1, 2] · [1, 1] = 1 > 0
• For x = 0, w1 · ϕ(x) = [−1, 2] · [1, 0] = −1 < 0
• For x = 1, w1 · ϕ(x) = [−1, 2] · [1, 1] = 1 > 0
In D2 :
• For x = −1, w2 · ϕ(x) = [1, −2] · [1, 1] = −1 < 0
• For x = 0, w2 · ϕ(x) = [1, −2] · [1, 0] = 1 > 0
• For x = 1, w2 · ϕ(x) = [1, −2] · [1, 1] = −1 < 0
Note that there are many options that work, so long as -1 and 1 are separated from 0.
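The hand computation above can also be checked mechanically; here is a small Python sketch (the helper names are just illustrative).

    # Check that phi(x) = [1, x^2] with the weight vectors above separates D1 and D2.
    def phi(x):
        return [1, x ** 2]

    def classifies(w, data):
        # Every point must satisfy y * (w . phi(x)) > 0.
        return all(y * sum(wi * fi for wi, fi in zip(w, phi(x))) > 0 for x, y in data)

    D1 = [(-1, +1), (0, -1), (1, +1)]
    D2 = [(-1, -1), (0, +1), (1, -1)]

    print(classifies([-1, 2], D1))   # True
    print(classifies([1, -2], D2))   # True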
Some additional food for thought: Is every dataset linearly separable in some feature
space? In other words, given pairs (x1 , y1 ), . . . , (xn , yn ), can we find a feature extractor
ϕ such that we can perfectly classify (ϕ(x1 ), y1 ), . . . , (ϕ(xn ), yn ) for some linear model
w? If so, is this a good feature extractor to use?
Solution In theory, yes we can. If we assume that our inputs x1, . . . , xn are distinct,
then we can construct a feature map ϕ : xi ↦ yi for i = 1, . . . , n. By setting w⋆ = [1],
it's clear that
yi (w⋆ · ϕ(xi)) = yi · yi = 1 > 0,   i = 1, . . . , n,   (1)
so w⋆ correctly classifies all the points in the dataset.
Hopefully, it’s clear that this is a poor choice of feature map. For one, this feature
extractor is undefined for any points outside of the training set! But even more broadly,
this process is not at all generalizable. We are essentially just memorizing our dataset
instead of learning patterns and structures within the data that will allow us to accurately
predict new points in the future. While minimizing training loss is an important
part of the machine learning process (the aforementioned procedure gives you zero
training loss!), it does not guarantee you good performance in the future.
2) Problem 2: Backpropagation
Consider the following function
Loss(x, y, z, w) = 2(xy + max{w, z})
Run the backpropagation algorithm to compute the four gradients (each with respect
to one of the individual variables) at x = 3, y = −4, z = 2 and w = −1. Use the
following nodes: addition, multiplication, max, multiplication by a constant.
Solution When calculating the gradients, we run backpropagation from the root node
back to the leaf nodes. On the computation graph, the yellow values are the values of
each node computed during the forward pass, while the purple and green values are the
partial derivatives of the Loss with respect to each node, computed during the backward
pass.
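Concretely, the chain rule gives ∂Loss/∂x = 2y = −8, ∂Loss/∂y = 2x = 6, ∂Loss/∂z = 2 (since z > w, the max routes the gradient to z), and ∂Loss/∂w = 0. The short Python sketch below runs the same forward and backward pass by hand.

    # A minimal sketch of the forward and backward pass for
    # Loss(x, y, z, w) = 2 * (x*y + max(w, z)) at x = 3, y = -4, z = 2, w = -1.
    x, y, z, w = 3.0, -4.0, 2.0, -1.0

    # Forward pass: evaluate each node of the graph.
    p = x * y            # multiplication node: p = -12
    m = max(w, z)        # max node: m = 2
    s = p + m            # addition node: s = -10
    loss = 2 * s         # multiplication-by-constant node: loss = -20

    # Backward pass: propagate dLoss/d(node) from the root toward the leaves.
    dloss = 1.0
    ds = 2.0 * dloss                     # dLoss/ds = 2
    dp, dm = ds, ds                      # addition passes the gradient through unchanged
    dx = dp * y                          # dLoss/dx = 2y = -8
    dy = dp * x                          # dLoss/dy = 2x = 6
    dw = dm * (1.0 if w > z else 0.0)    # max routes the gradient to its larger input
    dz = dm * (1.0 if z > w else 0.0)    # here z > w, so dz = 2 and dw = 0

    print(dx, dy, dz, dw)                # -8.0 6.0 2.0 0.0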
3) Problem 3: K-means
Consider doing ordinary K-means clustering with K = 2 clusters on the following
set of 3 one-dimensional points:
{−2, 0, 10}. (2)
Recall that K-means can get stuck in local optima. Describe the precise conditions on
the initialization µ1 ∈ R and µ2 ∈ R such that running K-means will yield the global
optimum of the objective function. Notes:
• Assume that µ1 < µ2 .
• Assume that if in step 1 of K-means, no points are assigned to some cluster j,
then in step 2, that centroid µj is set to ∞.
• Hint: try running K-means from various initializations µ1 , µ2 to get some intu-
ition; for example, if we initialize µ1 = 1 and µ2 = 9, then we converge to µ1 = −1
and µ2 = 10.
Solution The objective is minimized for µ1 = −1 and µ2 = 10. First, note that if
all three points end up in one cluster, K-means definitely fails to recover the global
optimum. Therefore, −2 must be assigned to the first cluster, and 10 must be assigned
to the second cluster. The point 0 can be assigned to either: if 0 is assigned to cluster 1,
then we’re done. If it is assigned to cluster 2, then we have µ1 = −2, µ2 = 5; in the next
iteration, 0 will be assigned to cluster 1 since it’s closer. Therefore, the condition on
the initialization, written formally, is |−2 − µ1| < |−2 − µ2| and |10 − µ1| > |10 − µ2|.
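One way to sanity-check this condition is to simulate K-means directly on the three points; below is a minimal Python sketch (the function name and iteration count are arbitrary choices).

    # Minimal 1-D K-means sketch to sanity-check the condition above.
    points = [-2.0, 0.0, 10.0]

    def kmeans(mu1, mu2, iters=10):
        for _ in range(iters):
            # Step 1: assign each point to the nearest centroid (ties go to cluster 1).
            c1 = [x for x in points if abs(x - mu1) <= abs(x - mu2)]
            c2 = [x for x in points if abs(x - mu1) > abs(x - mu2)]
            # Step 2: recompute centroids; an empty cluster gets centroid infinity, per the problem.
            mu1 = sum(c1) / len(c1) if c1 else float("inf")
            mu2 = sum(c2) / len(c2) if c2 else float("inf")
        return mu1, mu2

    print(kmeans(1.0, 9.0))      # (-1.0, 10.0): the global optimum, matching the hint
    print(kmeans(-3.0, -2.5))    # stuck: every point joins one cluster, the other centroid goes to infinity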
4) [optional] Problem 4: Non-linear decision boundaries
Suppose we are performing classification where the input points are of the form (x1, x2) ∈
R^2. We can choose any subset of the following set of features:
𝓕 = {x1, x2, x1x2, x1², x2², 1/x1, 1/x2, 1, 1[x1 ≥ 0], 1[x2 ≥ 0]}   (3)
For each subset of features F ⊆ 𝓕, let D(F) be the set of all decision boundaries
corresponding to linear classifiers that use the features in F.
For each of the following sets of decision boundaries E, provide the minimal F such
that D(F ) ⊇ E. If no such F exists, write ‘none’.
For example, the set of features F = {x1², x2} allows decision boundaries that are
parabolas opening along the x2 axis and centered at the origin.
• E is all lines. [CA hint]
• E is all circles centered at the origin.
• E is all circles.
• E is all axis-aligned rectangles.
• E is all axis-aligned rectangles whose lower-right corner is at (0, 0).
Solution
• Lines: x1, x2, 1 (ax1 + bx2 + c = 0)
• Circles centered at the origin: x1², x2², 1 (x1² + x2² = r²)
• Circles centered anywhere in the plane: x1², x2², x1, x2, 1 ((x1 − a)² + (x2 − b)² = r²)
• Axis-aligned rectangles: not possible (we would need features of the form 1[x1 ≤ a])
• Axis-aligned rectangles with lower-right corner at (0, 0): not possible
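As a sanity check of the second bullet, the features x1², x2², 1 let a linear classifier realize any circle centered at the origin; here is a small Python sketch with an arbitrarily chosen radius.

    # A linear classifier over the features [x1^2, x2^2, 1] whose decision boundary
    # is the circle x1^2 + x2^2 = r^2 (r = 5 is an arbitrary choice).
    r = 5.0
    w = [1.0, 1.0, -r ** 2]       # w . phi(x) = x1^2 + x2^2 - r^2

    def phi(x1, x2):
        return [x1 ** 2, x2 ** 2, 1.0]

    def predict(x1, x2):
        return 1 if sum(wi * fi for wi, fi in zip(w, phi(x1, x2))) > 0 else -1

    print(predict(3.0, 3.0))      # -1: (3, 3) lies inside the circle
    print(predict(4.0, 4.0))      # +1: (4, 4) lies outside the circle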