Boosting
Eric Emer
It is easy to come up with rules of thumb that correctly classify the training data at better than chance. A weak learner is one that, given polynomially many examples and polynomial time, can find a classifier with generalization error better than random guessing: error $< \frac{1}{2}$, also written as an edge $\gamma > 0$ for generalization error at most $\frac{1}{2} - \gamma$.

Weak Learning Assumption: the learning algorithm (the Weak Learner) can consistently find weak classifiers, i.e. rules of thumb which classify the data correctly at better than 50%. Given this assumption, we can use boosting to generate a single weighted classifier which correctly classifies our training data at 99%-100%.
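As a concrete illustration of such a rule of thumb, the sketch below fits a decision stump (a threshold on a single feature) to a weighted sample. The function name, interface, and brute-force search are illustrative assumptions, not something prescribed by these notes.

```python
import numpy as np

def train_stump(X, y, w):
    """Illustrative weak learner: pick the single-feature threshold rule
    with the smallest weighted error on (X, y) under example weights w.
    Labels y are assumed to be in {-1, +1}."""
    best_err, best_rule = np.inf, None
    for j in range(X.shape[1]):                 # candidate feature
        for thresh in np.unique(X[:, j]):       # candidate threshold
            for polarity in (+1, -1):           # which side votes +1
                pred = polarity * np.where(X[:, j] > thresh, 1, -1)
                err = np.sum(w * (pred != y))   # weighted training error
                if err < best_err:
                    best_err, best_rule = err, (j, thresh, polarity)
    # A useful weak classifier has weighted error below 1/2 (an edge gamma > 0).
    return best_rule, best_err
```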
AdaBoost Specifics
How does AdaBoost weight training examples optimally? It focuses on the difficult data points: the points that have been misclassified most by the previous weak classifiers. How does AdaBoost combine these weak classifiers into a comprehensive prediction? It uses an optimally weighted majority vote of the weak classifiers, as sketched below.
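A minimal sketch of that combination step, assuming each weak classifier maps inputs to $\{-1, +1\}$ and `alphas` holds the vote weights (the names here are hypothetical):

```python
import numpy as np

def weighted_majority_vote(weak_classifiers, alphas, X):
    """H(x) = sign(sum_t alpha_t * h_t(x)): each weak classifier casts a
    +/-1 vote, weighted by how well it did on its round's distribution."""
    total = sum(a * h(X) for a, h in zip(alphas, weak_classifiers))
    return np.sign(total)
```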
Constructing $D_t$
$$D_1(i) = \frac{1}{m}$$

and, given $D_t$ and $h_t$:

$$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } y_i = h_t(x_i) \\ e^{\alpha_t} & \text{if } y_i \neq h_t(x_i) \end{cases} = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$$

where $Z_t$ is the normalization constant that makes $D_{t+1}$ a distribution, $\epsilon_t$ is the weighted error of $h_t$ under $D_t$, and

$$\alpha_t = \frac{1}{2} \ln\left(\frac{1-\epsilon_t}{\epsilon_t}\right) > 0$$

The final classifier is the weighted majority vote

$$H_{\text{final}}(x) = \text{sign}\left(\sum_t \alpha_t h_t(x)\right)$$
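Putting the update, the choice of $\alpha_t$, and the final vote together, here is a minimal sketch of the training loop. It assumes a `weak_learner(X, y, D)` callable that returns a classifier mapping inputs to $\{-1, +1\}$; that interface, and the names used, are assumptions for illustration rather than anything fixed by the notes.

```python
import numpy as np

def adaboost(X, y, weak_learner, T):
    """Minimal AdaBoost sketch: maintain a distribution D_t over the m
    training examples, reweight after each round, and return the
    weighted-vote ensemble. Labels y are assumed to be in {-1, +1}."""
    y = np.asarray(y)
    m = len(y)
    D = np.full(m, 1.0 / m)                     # D_1(i) = 1/m
    classifiers, alphas = [], []
    for t in range(T):
        h = weak_learner(X, y, D)               # weak classifier h_t trained on D_t
        pred = h(X)                             # predictions in {-1, +1}
        eps = np.sum(D * (pred != y))           # weighted error epsilon_t (assumed 0 < eps < 1/2)
        alpha = 0.5 * np.log((1 - eps) / eps)   # alpha_t > 0 whenever eps < 1/2
        D = D * np.exp(-alpha * y * pred)       # down-weight hits, up-weight mistakes
        D = D / D.sum()                         # dividing by Z_t keeps D_{t+1} a distribution
        classifiers.append(h)
        alphas.append(alpha)

    def H_final(X_new):
        """H_final(x) = sign(sum_t alpha_t * h_t(x))."""
        return np.sign(sum(a * h(X_new) for a, h in zip(alphas, classifiers)))

    return H_final
```

Note that normalizing by `D.sum()` is exactly the division by $Z_t$ in the case-by-case update above.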
Mini-Problem
Claim: write $\epsilon_t = \frac{1}{2} - \gamma_t$; then

$$\text{training error}(H_{\text{final}}) \le \prod_t \left[2\sqrt{\epsilon_t(1-\epsilon_t)}\right] = \prod_t \sqrt{1-4\gamma_t^2} \le \exp\left(-2\sum_t \gamma_t^2\right)$$

so if each weak classifier is slightly better than random ($\gamma_t \ge \gamma > 0$), the training error drops exponentially fast in the number of rounds $T$.
Proof
Step 1: unwrapping the recurrence
With $f(x) = \sum_t \alpha_t h_t(x)$, repeatedly applying the update gives

$$D_{T+1}(i) = \frac{1}{m} \cdot \frac{\exp\left(-y_i \sum_t \alpha_t h_t(x_i)\right)}{\prod_t Z_t} = \frac{e^{-y_i f(x_i)}}{m \prod_t Z_t}$$
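Only Step 1 survives in these notes; for completeness, here is a sketch of how the standard argument usually finishes (the original proof may have been presented differently). A misclassified point has $y_i f(x_i) \le 0$, so $e^{-y_i f(x_i)} \ge 1$, and therefore

$$\text{training error}(H_{\text{final}}) = \frac{1}{m}\sum_i \mathbf{1}\{y_i \ne H_{\text{final}}(x_i)\} \le \frac{1}{m}\sum_i e^{-y_i f(x_i)} = \sum_i D_{T+1}(i) \prod_t Z_t = \prod_t Z_t$$

With the chosen $\alpha_t$,

$$Z_t = \sum_i D_t(i)\, e^{-\alpha_t y_i h_t(x_i)} = (1-\epsilon_t)e^{-\alpha_t} + \epsilon_t e^{\alpha_t} = 2\sqrt{\epsilon_t(1-\epsilon_t)} = \sqrt{1-4\gamma_t^2}$$

and $\prod_t \sqrt{1-4\gamma_t^2} \le \exp\left(-2\sum_t \gamma_t^2\right)$ by $1+x \le e^x$, which gives the claim.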
Empirically, the test error does not increase even after 1000 rounds of boosting, and it continues to drop even after the training error reaches zero.
Pros/Cons of AdaBoost
Pros
- Fast
- Simple and easy to program
- No parameters to tune (except T)
- No prior knowledge needed about the weak learner
- Provably effective given the Weak Learning Assumption
- Versatile
Cons

- Weak classifiers that are too complex lead to overfitting.
- Weak classifiers that are too weak can lead to low margins, which can also lead to overfitting.
- From empirical evidence, AdaBoost is particularly vulnerable to uniform noise.