This document discusses the AdaBoost algorithm. AdaBoost is a boosting algorithm that can generate a strong classifier by combining weak learners. It works by focusing on examples that previous weak learners misclassified and adjusting example weights accordingly. AdaBoost is proven to reduce training error and empirically works well due to its ability to increase margins between classes rather than just reducing errors. While simple and effective, it can overfit if weak learners are too complex or fail to produce large margins.

BOOSTING (ADABOOST ALGORITHM)

Eric Emer

Consider Horse-Racing Gambler


Rules of thumb for determining win/loss:
- Most favored odds
- Fastest recorded lap time
- Most wins recently, say, in the past month

It is hard to determine how the gambler combines analysis of the feature set into a single bet.

Consider MIT Admissions

- 2-class system (Admit/Deny)
- Both quantitative and qualitative data
  - We consider (Y/N) answers to be quantitative (-1, +1).
  - Region, for instance, is qualitative.

Rules of Thumb, Weak Classifiers

It is easy to come up with rules of thumb that correctly classify the training data at better than chance.
- E.g., IF GoodAtMath==Y THEN predict Admit (a concrete sketch of such a rule follows below).

It is difficult to find a single, highly accurate prediction rule. This is where boosting with AdaBoost, applied on top of a weak learning algorithm, helps us.
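To make the idea of a rule of thumb concrete, here is a minimal sketch in Python of a decision stump, the kind of one-feature weak classifier boosting is typically built on. The feature encoding and data are invented for illustration, not taken from the slides.

```python
import numpy as np

def stump_predict(X, feature_index, threshold, polarity=1):
    """A decision stump: classify by thresholding a single feature.

    Returns +1 where polarity * (X[:, feature_index] - threshold) > 0,
    and -1 elsewhere. With a binary (-1/+1) feature and threshold 0,
    this encodes a rule of thumb like "IF GoodAtMath==Y THEN Admit".
    """
    return np.where(polarity * (X[:, feature_index] - threshold) > 0, 1, -1)

# Illustrative usage on made-up admissions-style data:
# column 0 = GoodAtMath encoded as -1/+1, column 1 = some other score.
X = np.array([[+1, 3.2], [-1, 1.5], [+1, 0.4], [-1, 2.8]])
y = np.array([+1, -1, +1, -1])
preds = stump_predict(X, feature_index=0, threshold=0.0)
print("training accuracy:", np.mean(preds == y))  # better than chance
```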

What is a Weak Learner?


For any distribution, with high probability, given polynomially many examples and polynomial time, we can find a classifier with generalization error better than random guessing:

$$\text{error} < \frac{1}{2}, \quad \text{also written as } \frac{1}{2} - \gamma \text{ for some edge } \gamma > 0.$$

Weak Learning Assumption


We assume that our weak learning algorithm (the Weak Learner) can consistently find weak classifiers: rules of thumb that classify the data correctly at better than 50%.

Given this assumption, we can use boosting to generate a single weighted classifier that correctly classifies our training data at 99%-100%.

AdaBoost Specifics
How does AdaBoost weight training examples optimally?
- It focuses on difficult data points: the examples that have been misclassified most by the previous weak classifiers.

How does AdaBoost combine these weak classifiers into a comprehensive prediction?
- It uses an optimally weighted majority vote of the weak classifiers.

AdaBoost Technical Description

Missing details: How do we construct the distribution $D_t$? How do we get a single final classifier? (A sketch of the full procedure is given below; the next two slides fill in the specific choices.)
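For reference, here is the standard AdaBoost procedure (in Freund and Schapire's formulation) that this slide's questions refer to; the choices of $D_{t+1}$ and $\alpha_t$ below are exactly the ones given on the next two slides.

Given training examples $(x_1, y_1), \dots, (x_m, y_m)$ with $y_i \in \{-1, +1\}$:
- Initialize $D_1(i) = 1/m$.
- For $t = 1, \dots, T$:
  - Train the weak learner on distribution $D_t$ and obtain a weak classifier $h_t$.
  - Compute the weighted error $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \ne y_i]$.
  - Set $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$.
  - Update $D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$, with $Z_t$ chosen so that $D_{t+1}$ sums to 1.
- Output $H_{\text{final}}(x) = \text{sign}\big(\sum_t \alpha_t h_t(x)\big)$.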

Constructing Dt
$$D_1(i) = \frac{1}{m}$$

Given $D_t$ and $h_t$:

$$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } y_i = h_t(x_i) \\ e^{\alpha_t} & \text{if } y_i \ne h_t(x_i) \end{cases} \;=\; \frac{D_t(i)\, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$$

where $Z_t$ is a normalization constant and

$$\alpha_t = \frac{1}{2}\ln\left(\frac{1-\epsilon_t}{\epsilon_t}\right) > 0.$$
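As an illustrative sanity check of these formulas (the numbers are made up, not from the slides): suppose a weak classifier has weighted error $\epsilon_t = 0.3$. Then

$$\alpha_t = \tfrac{1}{2}\ln\tfrac{0.7}{0.3} \approx 0.42, \qquad e^{-\alpha_t} \approx 0.65, \qquad e^{\alpha_t} \approx 1.53,$$

so correctly classified examples have their weights multiplied by about 0.65, misclassified examples by about 1.53, and the normalization constant works out to $Z_t = 0.7\, e^{-\alpha_t} + 0.3\, e^{\alpha_t} \approx 0.92$.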

Getting a Single Classifier


$$H_{\text{final}}(x) = \text{sign}\left(\sum_t \alpha_t h_t(x)\right)$$
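Putting the last two slides together, here is a minimal runnable sketch in Python of the full AdaBoost loop, using one-feature decision stumps (like the earlier rule-of-thumb example) as the weak learner. It illustrates the formulas above rather than any implementation referenced in the deck; the function names are my own.

```python
import numpy as np

def fit_stump(X, y, D):
    """Weak learner: pick the single-feature threshold rule with the
    lowest D-weighted error. Returns (feature, threshold, polarity, error)."""
    m, n = X.shape
    best = None
    for j in range(n):
        for thr in np.unique(X[:, j]):
            for pol in (+1, -1):
                pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
                err = np.sum(D[pred != y])
                if best is None or err < best[3]:
                    best = (j, thr, pol, err)
    return best

def stump_predict(X, j, thr, pol):
    return np.where(pol * (X[:, j] - thr) > 0, 1, -1)

def adaboost(X, y, T=10):
    """Illustrative AdaBoost sketch: reweight examples by
    exp(-alpha_t * y_i * h_t(x_i)), then combine the weak classifiers
    with a weighted majority vote."""
    m = len(y)
    D = np.full(m, 1.0 / m)           # D_1(i) = 1/m
    ensemble = []                      # list of (alpha_t, stump parameters)
    for t in range(T):
        j, thr, pol, eps = fit_stump(X, y, D)
        eps = max(eps, 1e-10)          # guard against division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)
        h = stump_predict(X, j, thr, pol)
        D = D * np.exp(-alpha * y * h)
        D = D / D.sum()                # Z_t: renormalize to a distribution
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    """H_final(x) = sign(sum_t alpha_t h_t(x))."""
    f = sum(a * stump_predict(X, j, thr, pol) for a, j, thr, pol in ensemble)
    return np.sign(f)
```

Calling `predict(adaboost(X, y, T=20), X)` gives the weighted-majority-vote prediction $H_{\text{final}}$ on the training set.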

Mini-Problem

Training Error Analysis

Claim: write $\epsilon_t = \frac{1}{2} - \gamma_t$. Then

$$\text{training error}(H_{\text{final}}) \le \prod_t Z_t = \prod_t 2\sqrt{\epsilon_t(1-\epsilon_t)} \le \exp\left(-2\sum_t \gamma_t^2\right),$$

so the training error drops exponentially fast as long as every weak classifier has some edge $\gamma_t \ge \gamma > 0$.

Proof outline:
- Step 1: unwrap the recurrence for $D_{T+1}(i)$.
- Step 2: show $\text{training error}(H_{\text{final}}) \le \prod_t Z_t$.
- Step 3: show $Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)}$.
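For completeness, here is a compact version of the three steps, using only the definitions from the "Constructing Dt" slide (a sketch of the standard argument, not taken verbatim from the deck).

Step 1 (unwrapping the recurrence):
$$D_{T+1}(i) = \frac{1}{m} \cdot \frac{\exp\!\big(-y_i \sum_t \alpha_t h_t(x_i)\big)}{\prod_t Z_t} = \frac{1}{m} \cdot \frac{e^{-y_i f(x_i)}}{\prod_t Z_t}, \qquad f(x) = \sum_t \alpha_t h_t(x).$$

Step 2: if $H_{\text{final}}(x_i) \ne y_i$ then $y_i f(x_i) \le 0$, so $e^{-y_i f(x_i)} \ge 1$. Therefore
$$\text{training error}(H_{\text{final}}) \le \frac{1}{m}\sum_i e^{-y_i f(x_i)} = \sum_i D_{T+1}(i) \prod_t Z_t = \prod_t Z_t.$$

Step 3: using $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$,
$$Z_t = \sum_i D_t(i)\, e^{-\alpha_t y_i h_t(x_i)} = (1-\epsilon_t)\, e^{-\alpha_t} + \epsilon_t\, e^{\alpha_t} = 2\sqrt{\epsilon_t(1-\epsilon_t)}.$$

Finally, with $\epsilon_t = \frac{1}{2} - \gamma_t$, we get $2\sqrt{\epsilon_t(1-\epsilon_t)} = \sqrt{1 - 4\gamma_t^2} \le e^{-2\gamma_t^2}$, which gives the exponential bound in the claim.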

How might test error react to AdaBoost?


We expect to encounter:
- Occam's Razor (a more complex combined classifier should generalize worse)
- Overfitting

Empirical results of test error:
- Test error does not increase, even after 1000 rounds.
- Test error continues to drop after training error reaches zero.

Difference from Expectation: The Margins Explanation


Our training error only measures the correctness of classifications; it neglects the confidence of classifications. How can we measure the confidence of a classification?

$$H_{\text{final}}(x) = \text{sign}(f(x)), \qquad f(x) = \frac{\sum_t \alpha_t h_t(x)}{\sum_t \alpha_t} \in [-1, 1], \qquad \text{margin}(x, y) = y\, f(x)$$

- Margin(x, y) close to +1: high confidence, correct.
- Margin(x, y) close to -1: high confidence, incorrect.
- Margin(x, y) close to 0: low confidence.
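As a small illustration of this definition, here is a sketch in Python that computes normalized margins from the weights $\alpha_t$ and the weak classifiers' predictions (the arrays and numbers are invented for the example):

```python
import numpy as np

def margins(alphas, weak_preds, y):
    """Normalized margin y * f(x), with f(x) in [-1, 1].

    alphas:      array of shape (T,), the weights alpha_t
    weak_preds:  array of shape (T, m), weak_preds[t, i] = h_t(x_i) in {-1,+1}
    y:           array of shape (m,), true labels in {-1,+1}
    """
    f = (alphas[:, None] * weak_preds).sum(axis=0) / alphas.sum()
    return y * f

# Example with made-up numbers: 3 weak classifiers, 4 examples.
alphas = np.array([0.42, 0.65, 0.38])
weak_preds = np.array([[+1, -1, +1, -1],
                       [+1, -1, -1, -1],
                       [+1, +1, +1, -1]])
y = np.array([+1, -1, +1, -1])
print(margins(alphas, weak_preds, y))
# Values near +1: confident and correct; near 0: low confidence;
# near -1: confident but wrong.
```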

Empirical Evidence Supporting Margins Explanation


(Same definitions of $f(x)$ and $\text{margin}(x, y)$ as on the previous slide.)

Figure: cumulative distribution of margins on the training examples.

Pros/Cons of AdaBoost
Pros:
- Fast
- Simple and easy to program
- No parameters to tune (except the number of rounds T)
- No prior knowledge needed about the weak learner
- Provably effective, given the Weak Learning Assumption
- Versatile

Cons:
- Weak classifiers that are too complex lead to overfitting.
- Weak classifiers that are too weak can lead to low margins, and can also lead to overfitting.
- From empirical evidence, AdaBoost is particularly vulnerable to uniform noise.

Predicting College Football Results


- Training data: 2009 NCAAF season
- Test data: 2010 NCAAF season
