
QM 7 Panel Regression Fixed Effects

1) Panel data contains observations on multiple entities (such as individuals, firms, states) observed at multiple points in time. This allows researchers to control for omitted variables that do not vary over time but could bias estimates.
2) A differences regression using only two time periods (1982 and 1988 data) can eliminate the impact of any omitted time-invariant variables (such as traffic density).
3) The differences regression estimates the effect of changes in beer taxes (dbtax) on changes in fatality rates (dvfrall) and finds a negative relationship, suggesting higher beer taxes reduce traffic fatalities.

Uploaded by Darkasy Edits
Copyright © All Rights Reserved

L7. Panel Regression Analysis


• Topics
  • Panel data
  • Entity and time fixed effects
1. Entity fixed effects
• Panel data contains observations on multiple entities (individuals, firms, states, …), where each entity is observed at two or more points in time.

Examples:
• Household expenditure surveys
• DK field experiments (within and between sessions)
• Registry data at Statistics Denmark
Notation for panel data
A double subscript distinguishes entities (e.g., subjects) and time periods:

$i$ = entity, $n$ = number of entities, so $i = 1,\dots,n$

$t$ = time period, $T$ = number of time periods, so $t = 1,\dots,T$

Data: suppose we have one regressor. The data are
$(X_{i,t}, Y_{i,t})$, $i = 1,\dots,n$, $t = 1,\dots,T$
Panel data notation, ctd.
Panel data with k regressors:

$(X_{1,i,t}, X_{2,i,t}, \dots, X_{k,i,t}, Y_{i,t})$, $i = 1,\dots,n$, $t = 1,\dots,T$

n = number of entities (e.g., subjects in an experiment)
T = number of time periods (e.g., tasks within an experiment, or across experiments)

Some jargon…
• Another term for panel data is longitudinal data
• Balanced panel: no missing observations; all variables are observed for all entities (states) and all time periods (years)
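Checking whether a panel is balanced amounts to verifying that every entity-period pair appears exactly once. A minimal Python sketch follows (the lecture itself works in Stata); the function name `is_balanced` and the generated state and year codes are illustrative assumptions, not taken from Fatality.dta.

```python
from itertools import product


def is_balanced(entities, periods):
    """Return True if every (entity, period) pair occurs exactly once.

    entities[k] and periods[k] identify observation k (long format).
    """
    pairs = list(zip(entities, periods))
    expected = set(product(set(entities), set(periods)))
    no_duplicates = len(pairs) == len(set(pairs))
    no_gaps = set(pairs) == expected
    return no_duplicates and no_gaps


# Hypothetical codes mirroring the fatality data: 48 states x 7 years.
states = [s for s in range(1, 49) for _ in range(1982, 1989)]
years = [y for _ in range(1, 49) for y in range(1982, 1989)]

print(len(states), is_balanced(states, years))   # 336 True
print(is_balanced(states[1:], years[1:]))         # False: (1, 1982) is missing
```

With 48 states and 7 years, a balanced panel has 48 × 7 = 336 rows, matching the number of observations quoted for the fatality data.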
Why are panel data useful?

• With panel data we can control for factors that:


• Vary across entities but do not vary over time
• Could cause omitted variable bias if they are omitted
• Are unobserved or unmeasured, and therefore cannot be included as regressors in a multiple regression

• Key idea:
• If an omitted variable does not change over time, then any
changes in Y over time cannot be caused by the omitted
variable.
Panel data in Stata
• Please open Lecture_7.do in Stata
• Data: Fatality.dta
Example: Traffic fatalities

Annual data on traffic fatalities for 48 American states


• 48 states, so n = 48
• 7 years (1982,…, 1988), so T = 7
• Balanced panel, number of observations = 7×48 = 336

Variables:
• Traffic fatality rate (number of traffic deaths per year in a state, per
10,000 state residents)
• Tax on a case of beer
• Other factors (legal driving age, drink driving laws, etc.)
Example: data for 1982
• Linear regression model, 1982 data

. regress vfrall beertax if (year==1982)

      Source |       SS           df       MS      Number of obs   =        48
-------------+----------------------------------   F(1, 46)        =      0.62
       Model |  .279239808         1  .279239808   Prob > F        =    0.4347
    Residual |  20.6789826        46  .449543101   R-squared       =    0.0133
-------------+----------------------------------   Adj R-squared   =   -0.0081
       Total |  20.9582224        47  .445919626   Root MSE        =    .67048

------------------------------------------------------------------------------
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |   .1484603   .1883682     0.79   0.435    -.2307051    .5276258
       _cons |   2.010381   .1390785    14.46   0.000     1.730431    2.290332
------------------------------------------------------------------------------
[Figure: Traffic fatalities in 1982. Fatalities per 10,000 residents plotted against Beertax.]

Higher alcohol taxes, more traffic fatalities?
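For a single regressor, the OLS slope reported by `regress` is the sample covariance of X and Y divided by the sample variance of X, and the intercept makes the line pass through the means. A small Python sketch of that formula, using made-up numbers rather than the 1982 data (the helper name `ols` is illustrative):

```python
def ols(x, y):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up cross-section (not the Fatality.dta values): one row per state.
beertax = [0.2, 0.5, 0.8, 1.1, 1.4]
vfrall = [1.9, 2.0, 2.2, 2.3, 2.5]
b0, b1 = ols(beertax, vfrall)
print(f"intercept {b0:.2f}, slope {b1:.2f}")
```

Applied to one year's cross-section, this slope picks up any state characteristic that is correlated with the beer tax, which is the concern raised on the following slides.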


Example: data for 1988
• Linear regression model, 1988 data

. regress vfrall beertax if (year==1988)

      Source |       SS           df       MS      Number of obs   =        48
-------------+----------------------------------   F(1, 46)        =      7.12
       Model |  1.71077215         1  1.71077215   Prob > F        =    0.0105
    Residual |  11.0559133        46   .24034594   R-squared       =    0.1340
-------------+----------------------------------   Adj R-squared   =    0.1152
       Total |  12.7666854        47  .271631604   Root MSE        =    .49025

------------------------------------------------------------------------------
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |   .4387546   .1644538     2.67   0.011     .1077262     .769783
       _cons |   1.859073   .1059887    17.54   0.000     1.645729    2.072417
------------------------------------------------------------------------------
[Figure: Traffic fatalities in 1988. Fatalities per 10,000 residents plotted against Beertax.]

Higher alcohol taxes, more traffic fatalities?


Example: data for 1982-88
• Linear regression model, 1982-88 data

. regress vfrall beertax

      Source |       SS           df       MS      Number of obs   =       336
-------------+----------------------------------   F(1, 334)       =     34.39
       Model |  10.1686586         1  10.1686586   Prob > F        =    0.0000
    Residual |  98.7468513       334  .295649255   R-squared       =    0.0934
-------------+----------------------------------   Adj R-squared   =    0.0906
       Total |   108.91551       335  .325120925   Root MSE        =    .54374

------------------------------------------------------------------------------
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |   .3646054   .0621698     5.86   0.000     .2423117    .4868992
       _cons |   1.853308   .0435671    42.54   0.000     1.767607    1.939008
------------------------------------------------------------------------------
[Figure: Traffic fatalities in 1982-88. Fatalities per 10,000 residents plotted against Beertax.]

Higher alcohol taxes, more traffic fatalities?


Omitted factors
• Other factors that determine traffic fatality rates:
• Quality and age of vehicles
• Quality of roads
• Culture around drinking and driving
• Density of vehicles on the road
Omitted variable bias
• Suppose:
• High traffic density means more traffic fatalities
• (Western) states with lower traffic density have lower alcohol
taxes

• Omitted variable bias:
• Then both conditions for omitted variable bias are satisfied: traffic density is a determinant of the fatality rate, and it is correlated with the beer tax. Specifically, "high taxes" could reflect "high traffic density", so the OLS coefficient would be biased upward (high taxes, more fatalities).
• Panel data lets us eliminate omitted variable bias when the
omitted variables are constant over time.
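The bias mechanism can be illustrated with a small simulation. The sketch below assumes a made-up data-generating process (not estimated from Fatality.dta): a time-invariant Z_i raises Y and is positively correlated with X, while the true effect of X on Y is negative. Pooled OLS that omits Z_i then tends to return a positive slope.

```python
import random


def ols_slope(x, y):
    """Slope from OLS of y on x with an intercept."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


random.seed(0)
TRUE_BETA = -0.5                 # assumed true effect of X on Y
N_ENTITIES, N_PERIODS = 500, 7   # assumed panel dimensions

x_all, y_all = [], []
for i in range(N_ENTITIES):
    z = random.gauss(0, 1)                       # omitted, time-invariant Z_i
    for t in range(N_PERIODS):
        x = 0.8 * z + random.gauss(0, 1)         # X rises with Z
        y = 1.0 + TRUE_BETA * x + 2.0 * z + random.gauss(0, 1)   # Z raises Y
        x_all.append(x)
        y_all.append(y)

pooled = ols_slope(x_all, y_all)   # Z_i is omitted from this regression
print(f"true effect {TRUE_BETA}, pooled OLS estimate {pooled:.2f}")
```

Under these assumptions the population value of the pooled slope is (-0.5 × 1.64 + 2 × 0.8) / 1.64, about 0.48, so the omitted Z_i flips the sign of the estimated effect.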
Panel data with two time periods
• Consider the panel data model,

$$FatalityRate_{i,t} = \beta_0 + \beta_1 BeerTax_{i,t} + \beta_2 Z_i + u_{i,t}$$

• Fixed effects:
• $Z_i$ has no time subscript and does not change over time (e.g., traffic density or cultural factors)
• Suppose $Z_i$ is not observed, so its omission could result in omitted variable bias.
• The effect of $Z_i$ can be eliminated using T = 2 years.
Basic idea
• Fixed effects:
• Any change in the fatality rate from 1982 to 1988 cannot be caused by $Z_i$, because $Z_i$ (by assumption) does not change between 1982 and 1988.

• Math:
• Consider fatality rates in 1988 and 1982:
$$FatalityRate_{i,88} = \beta_0 + \beta_1 BeerTax_{i,88} + \beta_2 Z_i + u_{i,88}$$
$$FatalityRate_{i,82} = \beta_0 + \beta_1 BeerTax_{i,82} + \beta_2 Z_i + u_{i,82}$$

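• Suppose $E(u_{i,t} \mid BeerTax_{i,82}, BeerTax_{i,88}, Z_i) = 0$ for t = 82, 88, i.e., the error in each year is mean-independent of the beer tax in both years.

• Subtracting the 1982 equation from the 1988 equation eliminates both $\beta_0$ and the effect of $Z_i$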
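Written out term by term, the subtraction cancels the intercept and the time-invariant term, leaving only the changes in the beer tax and in the error:

```latex
\begin{aligned}
FatalityRate_{i,88} - FatalityRate_{i,82}
  &= (\beta_0 - \beta_0) + \beta_1 (BeerTax_{i,88} - BeerTax_{i,82})
     + \beta_2 (Z_i - Z_i) + (u_{i,88} - u_{i,82}) \\
  &= \beta_1 (BeerTax_{i,88} - BeerTax_{i,82}) + (u_{i,88} - u_{i,82})
\end{aligned}
```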
“Difference” regression model
$$FatalityRate_{i,88} - FatalityRate_{i,82} = \beta_1 (BeerTax_{i,88} - BeerTax_{i,82}) + (u_{i,88} - u_{i,82})$$

• “Difference” regression model
• The new error term, $(u_{i,88} - u_{i,82})$, is uncorrelated with $(BeerTax_{i,88} - BeerTax_{i,82})$
• This “difference” equation can be estimated by OLS
• The omitted variable Zi does not change over time, so it
cannot be a determinant of the change in Y over time
• This differences regression does not have an intercept – it
was eliminated by subtraction
• Suppose we add a constant to the model. How do we interpret the constant?
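Continuing with a simulated (hypothetical) data-generating process, the sketch below builds the two-period differences, in which Z_i cancels, and estimates them with and without a constant. It additionally assumes that Y shifts by a common amount `SHIFT` between the two years; under that assumption the constant in the differences regression estimates that common change, i.e., a time trend shared by all entities.

```python
import random


def slope_no_const(x, y):
    """OLS slope of y on x without an intercept (cf. Stata's `noconstant`)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_with_const(x, y):
    """OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope


random.seed(1)
TRUE_BETA = -0.5   # assumed true effect of X on Y
SHIFT = -0.1       # assumed common change in Y from year 1 to year 2

dx, dy = [], []
for i in range(5000):
    z = random.gauss(0, 1)                      # time-invariant Z_i
    x1 = 0.8 * z + random.gauss(0, 1)
    x2 = 0.8 * z + random.gauss(0, 1)
    y1 = 1.0 + TRUE_BETA * x1 + 2.0 * z + random.gauss(0, 1)
    y2 = 1.0 + SHIFT + TRUE_BETA * x2 + 2.0 * z + random.gauss(0, 1)
    dx.append(x2 - x1)                          # Z_i cancels in both differences
    dy.append(y2 - y1)

b_no_const = slope_no_const(dx, dy)
const, b_const = fit_with_const(dx, dy)
print(f"no constant:   slope {b_no_const:.2f}")
print(f"with constant: slope {b_const:.2f}, constant {const:.2f}")
```

Both slopes land near the assumed -0.5 even though Z_i was never observed, in contrast to the pooled regression on levels.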
Example: data for 1982 and 1988
• Fixed effects model, 1982 and 1988 data, no constant

. regress dvfrall dbtax, noconstant

      Source |       SS           df       MS      Number of obs   =        48
-------------+----------------------------------   F(1, 47)        =      4.89
       Model |  .765652616         1  .765652616   Prob > F        =    0.0319
    Residual |  7.36082347        47  .156613265   R-squared       =    0.0942
-------------+----------------------------------   Adj R-squared   =    0.0749
       Total |  8.12647609        48  .169301585   Root MSE        =    .39574

------------------------------------------------------------------------------
     dvfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dbtax |  -.8689216   .3929877    -2.21   0.032    -1.659511   -.0783323
------------------------------------------------------------------------------
Example: data for 1982 and 1988
• Fixed effects model, 1982 and 1988 data, with constant

. regress dvfrall dbtax

      Source |       SS           df       MS      Number of obs   =        48
-------------+----------------------------------   F(1, 46)        =      6.22
       Model |  .966449038         1  .966449038   Prob > F        =    0.0162
    Residual |  7.14175313        46  .155255503   R-squared       =    0.1192
-------------+----------------------------------   Adj R-squared   =    0.1000
       Total |  8.10820217        47   .17251494   Root MSE        =    .39402

------------------------------------------------------------------------------
     dvfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dbtax |  -1.040973   .4172279    -2.49   0.016    -1.880809   -.2011364
       _cons |  -.0720371    .060644    -1.19   0.241    -.1941072     .050033
------------------------------------------------------------------------------
[Figure: Change in traffic fatalities in 1982-88. Change in fatalities per 10,000 residents plotted against change in beertax.]

Higher alcohol taxes, lower traffic fatalities


Exercises
• Estimate a fixed effects (difference) model of fatality
rates on beertax using data from 1982 and 1983,
including a constant
• What is the marginal effect of beertax on fatality rates?
• Is the marginal effect of beertax on fatality rates
significantly different from 0?
• Do fatality rates tend to increase or decrease over time?
• Is the constant (time trend) significantly different from 0?
2. Fixed effects
• What if T > 2?

$$FR_{i,t} = \beta_{BeerTax} BeerTax_{i,t} + \beta_{States} States_i + u_{i,t}$$

• We can write the model in two useful ways:
• Binary regressor form
• Fixed effects regression

• We first write this in “binary regressor” form


• Suppose we have n = 3 states: California, Texas, and
Massachusetts.
Binary regressor form
• Linear regression model with dummy variables for each state:
$$FR_{i,t} = \beta_{BeerTax} BeerTax_{i,t} + \beta_{CA} CA_i + \beta_{TX} TX_i + \beta_{MA} MA_i + u_{i,t}$$

Where
CAi = 1 if state is California, = 0 otherwise
TXi = 1 if state is Texas, = 0 otherwise
MAi = 1 if state is Massachusetts, = 0 otherwise

• We estimate three regression lines, one for each state


• $\beta_{CA}$ is the intercept for CA; $\beta_{TX}$ is the intercept for TX; and $\beta_{MA}$ is the intercept for MA
• $\beta_{BeerTax}$ is the common slope of the three regression lines
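The binary regressor form is ordinary least squares with one dummy per state and no common constant, matching the `noconstant` regression in the example. A minimal pure-Python sketch, assuming small made-up data and a hand-written linear solver; `solve` and `lsdv` are illustrative names, not Stata commands.

```python
def solve(a, b):
    """Solve a @ coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[r][n] / m[r][r] for r in range(n)]


def lsdv(x, y, entity):
    """OLS of y on x plus one 0/1 dummy per entity, with no common constant.

    Returns (slope, {entity: intercept}).
    """
    groups = sorted(set(entity))
    # Each design row: [x, D_1, ..., D_G] where D_g = 1 if the row is entity g.
    rows = [[xi] + [1.0 if e == g else 0.0 for g in groups]
            for xi, e in zip(x, entity)]
    k = len(rows[0])
    # Normal equations: (X'X) coef = X'y.
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    coef = solve(xtx, xty)
    return coef[0], dict(zip(groups, coef[1:]))


# Made-up, noise-free data: three states, three years each.
entity = ["CA"] * 3 + ["TX"] * 3 + ["MA"] * 3
x = [0.1, 0.2, 0.4, 0.5, 0.7, 0.6, 0.3, 0.35, 0.2]
alpha = {"CA": 2.0, "TX": 3.0, "MA": 1.5}             # assumed state intercepts
y = [alpha[e] - 1.0 * xi for xi, e in zip(x, entity)]  # assumed slope -1.0

slope, intercepts = lsdv(x, y, entity)
print(round(slope, 6), {k: round(v, 6) for k, v in intercepts.items()})
```

Because the made-up data contain no noise, the fit recovers the assumed slope and the three state intercepts exactly (up to rounding).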
Example: Binary regressor form
. regress vfrall beertax CA TX MA if (state==06 | state==25 | state==48), noconstant

      Source |       SS           df       MS      Number of obs   =        21
-------------+----------------------------------   F(4, 17)        =    636.78
       Model |  71.7582541         4  17.9395635   Prob > F        =    0.0000
    Residual |  .478930132        17  .028172361   R-squared       =    0.9934
-------------+----------------------------------   Adj R-squared   =    0.9918
       Total |  72.2371843        21  3.43986592   Root MSE        =    .16785

------------------------------------------------------------------------------
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -2.563516   2.535592    -1.01   0.326    -7.913146    2.786115
          CA |   2.151935    .252372     8.53   0.000     1.619477    2.684394
          TX |   3.387019   1.100874     3.08   0.007     1.064377    5.709661
          MA |   1.857948    .654464     2.84   0.011     .4771501    3.238747
------------------------------------------------------------------------------
Example: regression line for each state
[Figure: Predicted fatality rates across states. Predicted fatality rate plotted against Beertax, one fitted line each for CA, TX, and MA.]
Fixed effects regression
Look at the regression lines for all three states:
$$FR_{CA,t} = \alpha_{CA} + \beta_{BeerTax} BeerTax_{CA,t} + u_{CA,t}$$
$$FR_{TX,t} = \alpha_{TX} + \beta_{BeerTax} BeerTax_{TX,t} + u_{TX,t}$$
$$FR_{MA,t} = \alpha_{MA} + \beta_{BeerTax} BeerTax_{MA,t} + u_{MA,t}$$

A more compact way of writing the regression lines:

$$FR_{i,t} = \alpha_i + \beta_{BeerTax} BeerTax_{i,t} + u_{i,t},$$

for $i \in \{CA, TX, MA\}$ and $t = 1,\dots,T$
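The compact form suggests a second way to compute the same slope: subtracting each state's own means from $FR_{i,t}$ and $BeerTax_{i,t}$ removes $\alpha_i$, and OLS without a constant on the demeaned data gives the within estimator. In the outputs here, Stata's `xtreg, fe` reports the same beertax coefficient (-2.563516) as the dummy regression. A minimal sketch with an illustrative function name and made-up, noise-free data:

```python
def within_fit(x, y, entity):
    """Fixed effects (within) estimator for y = alpha_i + beta * x + u.

    Subtracts each entity's mean from x and y (removing alpha_i), then runs
    OLS without a constant on the demeaned data.
    Returns (beta, {entity: alpha_i}).
    """
    groups = {}
    for xi, yi, e in zip(x, y, entity):
        groups.setdefault(e, []).append((xi, yi))
    means = {
        e: (sum(p[0] for p in obs) / len(obs), sum(p[1] for p in obs) / len(obs))
        for e, obs in groups.items()
    }
    xd = [xi - means[e][0] for xi, e in zip(x, entity)]
    yd = [yi - means[e][1] for yi, e in zip(y, entity)]
    beta = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
    # Recover each entity intercept from its means: alpha_i = ybar_i - beta * xbar_i.
    alphas = {e: my - beta * mx for e, (mx, my) in means.items()}
    return beta, alphas


# Made-up, noise-free data: three states, three years each.
entity = ["CA"] * 3 + ["TX"] * 3 + ["MA"] * 3
x = [0.1, 0.2, 0.4, 0.5, 0.7, 0.6, 0.3, 0.35, 0.2]
alpha = {"CA": 2.0, "TX": 3.0, "MA": 1.5}             # assumed state intercepts
y = [alpha[e] - 1.0 * xi for xi, e in zip(x, entity)]  # assumed slope -1.0

beta, alphas = within_fit(x, y, entity)
print(round(beta, 6), {k: round(v, 6) for k, v in alphas.items()})
```

Demeaning avoids building one dummy column per entity, which matters when n is large (48 states here, thousands of subjects in registry data).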


Example: Fixed effects regression
. xtreg vfrall beertax if (state==06 | state==25 | state==48), fe

Fixed-effects (within) regression               Number of obs     =         21
Group variable: state                           Number of groups  =          3

R-sq:                                           Obs per group:
     within  = 0.0567                                         min =          7
     between = 0.1330                                         avg =        7.0
     overall = 0.1118                                         max =          7

                                                F(1,17)           =       1.02
corr(u_i, Xb)  = -0.7743                        Prob > F          =     0.3262

------------------------------------------------------------------------------
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -2.563516   2.535592    -1.01   0.326    -7.913146    2.786115
       _cons |   2.465634   .6659065     3.70   0.002     1.060694    3.870574
-------------+----------------------------------------------------------------
     sigma_u |  .81136868
     sigma_e |  .16784624
         rho |  .95896182   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(2, 17) = 65.50                      Prob > F = 0.0000
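Several numbers in this output can be reproduced by hand from the two regressions for CA, TX, and MA. In this balanced example, the reported _cons equals the simple average of the three state intercepts from the dummy regression; rho equals sigma_u^2 / (sigma_u^2 + sigma_e^2); and sigma_e^2 matches the Residual MS of the dummy regression. A quick Python check using only figures copied from the outputs:

```python
# Figures copied from the two Stata outputs for CA, TX and MA (1982-88).
dummy_intercepts = {"CA": 2.151935, "TX": 3.387019, "MA": 1.857948}
xtreg_cons = 2.465634
sigma_u, sigma_e = 0.81136868, 0.16784624
reported_rho = 0.95896182
dummy_residual_ms = 0.028172361   # Residual MS from the dummy regression

# Balanced panel (7 years per state): simple mean of the state intercepts.
mean_intercept = sum(dummy_intercepts.values()) / len(dummy_intercepts)

# Share of the composite error variance attributed to the fixed effects u_i.
rho = sigma_u ** 2 / (sigma_u ** 2 + sigma_e ** 2)

print(f"mean intercept {mean_intercept:.6f} vs _cons {xtreg_cons}")
print(f"rho {rho:.4f} vs reported {reported_rho:.4f}")
print(f"sigma_e^2 {sigma_e ** 2:.6f} vs residual MS {dummy_residual_ms}")
```

A rho close to 1 says that most of the unexplained variation in fatality rates is between states rather than within a state over time, consistent with the large F statistic for the u_i.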
Regression with time fixed effects
• Time fixed effects
• An omitted variable may vary over time but not across states
• For example, safer cars; changes in national laws, etc.
• These factors produce intercepts that change over time
• Time fixed effects are written in binary regressor form
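• Let y83, y84,…, y88 denote binary indicators for the years 1983 to 1988; 1982 is the omitted base year, which avoids perfect collinearity with the constant β0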

• The model with time fixed effects is:


$$FR_{i,t} = \beta_0 + \beta_{BeerTax} BeerTax_{i,t} + \beta_{y83}\, y83_t + \beta_{y84}\, y84_t + \beta_{y85}\, y85_t + \beta_{y86}\, y86_t + \beta_{y87}\, y87_t + \beta_{y88}\, y88_t + u_{i,t}$$
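Generating the year indicators is mechanical: one 0/1 variable per year, leaving out the base year 1982 so the dummies are not collinear with the constant. A small Python sketch, where the helper `year_dummies` and the input list are illustrative assumptions:

```python
def year_dummies(years, base_year):
    """Return {name: [0/1, ...]} with one indicator per year except base_year."""
    levels = sorted(set(years))
    return {
        f"y{str(level)[-2:]}": [1 if yr == level else 0 for yr in years]
        for level in levels
        if level != base_year
    }


years = [1982, 1983, 1984, 1985, 1986, 1987, 1988] * 2   # two hypothetical states
dummies = year_dummies(years, base_year=1982)
print(sorted(dummies))   # ['y83', 'y84', 'y85', 'y86', 'y87', 'y88']
```

Rows for 1982 have all six indicators equal to 0, so each coefficient βy83,…, βy88 measures the shift in the intercept relative to 1982.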
Example: Time fixed effects model
. regress vfrall beertax y83 y84 y85 y86 y87 y88

      Source |       SS           df       MS      Number of obs   =       336
-------------+----------------------------------   F(7, 328)       =      5.13
       Model |  10.7442938         7  1.53489912   Prob > F        =    0.0000
    Residual |  98.1712161       328  .299302488   R-squared       =    0.0986
-------------+----------------------------------   Adj R-squared   =    0.0794
       Total |   108.91551       335  .325120925   Root MSE        =    .54709

------------------------------------------------------------------------------
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |   .3663358      .0626     5.85   0.000     .2431877    .4894839
         y83 |  -.0820359   .1116734    -0.73   0.463    -.3017224    .1376506
         y84 |  -.0717331   .1116734    -0.64   0.521    -.2914195    .1479533
         y85 |  -.1105458   .1116765    -0.99   0.323    -.3302383    .1091467
         y86 |  -.0161185   .1116815    -0.14   0.885    -.2358209     .203584
         y87 |  -.0155355    .111695    -0.14   0.889    -.2352645    .2041935
         y88 |  -.0010271    .111718    -0.01   0.993    -.2208014    .2187471
       _cons |   1.894848   .0856585    22.12   0.000     1.726338    2.063357
------------------------------------------------------------------------------
Example: State and time fixed effects model
Fixed-effects (within) regression               Number of obs     =        336
Group variable: state                           Number of groups  =         48

R-sq:                                           Obs per group:
     within  = 0.0803                                         min =          7
     between = 0.1101                                         avg =        7.0
     overall = 0.0876                                         max =          7

                                                F(7,281)          =       3.50
corr(u_i, Xb)  = -0.6781                        Prob > F          =     0.0013

------------------------------------------------------------------------------
      vfrall |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     beertax |  -.6399799   .1973768    -3.24   0.001    -1.028505   -.2514551
         y83 |  -.0799029   .0383537    -2.08   0.038       -.1554   -.0044058
         y84 |  -.0724206   .0383517    -1.89   0.060    -.1479136    .0030725
         y85 |  -.1239763   .0384418    -3.23   0.001    -.1996468   -.0483058
         y86 |  -.0378645   .0385879    -0.98   0.327    -.1138225    .0380936
         y87 |  -.0509021   .0389737    -1.31   0.193    -.1276196    .0258155
         y88 |  -.0518038   .0396235    -1.31   0.192    -.1298003    .0261927
       _cons |    2.42847   .1081198    22.46   0.000     2.215643    2.641298
-------------+----------------------------------------------------------------
     sigma_u |  .70945965
     sigma_e |  .18788295
         rho |  .93446372   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(47, 281) = 53.19                    Prob > F = 0.0000
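With both state and year effects in a balanced panel, the within transformation subtracts the entity mean and the period mean and adds back the grand mean, which removes $\alpha_i$ and the year effects at once. The sketch below assumes a balanced panel and noise-free made-up data; `two_way_within_slope` is an illustrative name, and the lecture's own estimates come from Stata.

```python
def two_way_within_slope(x, y, entity, period):
    """Slope with entity and time fixed effects via two-way demeaning.

    Valid as written only for a balanced panel: each (entity, period) pair once.
    """
    n = len(x)

    def group_means(values, keys):
        totals, counts = {}, {}
        for v, k in zip(values, keys):
            totals[k] = totals.get(k, 0.0) + v
            counts[k] = counts.get(k, 0) + 1
        return {k: totals[k] / counts[k] for k in totals}

    def demean(values):
        by_entity = group_means(values, entity)
        by_period = group_means(values, period)
        grand = sum(values) / n
        # v - entity mean - period mean + grand mean
        return [v - by_entity[e] - by_period[t] + grand
                for v, e, t in zip(values, entity, period)]

    xd, yd = demean(x), demean(y)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


entity = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
period = [1, 2, 3] * 3
x = [0.1, 0.5, 0.2, 0.4, 0.3, 0.9, 0.7, 0.2, 0.6]
alpha = {"A": 2.0, "B": 1.0, "C": 3.0}   # assumed entity effects
lam = {1: 0.0, 2: -0.3, 3: 0.2}          # assumed time effects
y = [alpha[e] + lam[t] - 0.6 * xi for xi, e, t in zip(x, entity, period)]
print(round(two_way_within_slope(x, y, entity, period), 6))   # -0.6
```

For unbalanced panels this simple double demeaning no longer removes the time effects exactly, which is one reason to include the year dummies explicitly alongside the entity fixed effects.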
Exercises
• Estimate a fixed effects model (-xtreg-) of fatality rates
on beertax using data from 1982 and 1983
• What is the marginal effect of beertax on fatality rates?
• Is the marginal effect of beertax on fatality rates
significantly different from 0?
• Add a time indicator for 1983 to the model. Do fatality rates tend to increase or decrease over time?
• Is the time indicator significantly different from 0?
Summary
• Learning outcomes
• Understand what we mean by panel data
• Understand the distinction between entity and time fixed
effects
• Understand the underlying assumptions of fixed effects
models
• Estimate fixed effects models in Stata
Extra exercises
• Use the Risk_Panel_Balanced.dta dataset (Andersen et al. [2008])
• Generate dummy variables for each of the four risk aversion tasks and call
them task_1, task_2, task_3, and task_4
• Run an OLS regression model of crra on dummy variables for all four risk
aversion tasks
• What are the predicted crra-values for the four risk aversion tasks?
• Are the estimated coefficients for the four risk aversion tasks similar?
• Repeat the same two exercises with observations for the first and second
experiment, respectively (condition the regression on repeat=0, and repeat=1)
• Generate a new variable crra_diff that measures the difference in crra values
between the two experiments (i.e., crra when repeat=1 minus crra when
repeat=0)
• Run an OLS regression model of crra_diff on dummy variables for all four risk
aversion tasks
• What are the predicted crra_diff values for the four risk aversion tasks?
• Are the estimated coefficients for the four risk aversion tasks similar?
Extra exercises
• Continue using Risk_Panel_Balanced.dta
• Define the panel data structure (xtset id)
• Run a fixed effects (FE) model of crra on repeat for each risk
aversion task
• Are the estimated coefficients for repeat significantly different from 0?
• Compare the estimated coefficients for repeat with those from an OLS model with the same dependent and independent variables. Are the estimated coefficients different?
• Add dummy variables for task 2, task 3, and task 4 to the FE and OLS
models. Are the estimated coefficients for repeat and the task identifiers
different across the two types of models?
• Identify and keep subjects who provided 8 non-missing responses.
• Run FE and OLS models of crra on repeat and dummy variables for
task 2, task 3, and task 4. Are the estimated coefficients for repeat and
the task identifiers different across the two types of models?
