
CFA® Exam Formulas | Level 2
2025 Edition

Quantitative Methods
Multiple Regression

• Multiple Regression Equation

$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \ldots + b_k X_{ki} + \varepsilon_i, \quad i = 1, 2, \ldots, n$

where:
Y_i = the ith observation of the dependent variable Y
X_ji = the ith observation of the independent variable X_j, j = 1, 2, ..., k
b_0 = the intercept of the equation
b_1, ..., b_k = the slope coefficients for each of the independent variables
ε_i = the error term for the ith observation
n = the number of observations

• Residual Term

$\hat{\varepsilon}_i = Y_i - \hat{Y}_i = Y_i - (\hat{b}_0 + \hat{b}_1 X_{1i} + \hat{b}_2 X_{2i} + \ldots + \hat{b}_k X_{ki})$

• F-statistic

$F\text{-stat} = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/[n - (k + 1)]}$

• Evaluating Regression Model Fit

$R^2 = \frac{\text{Total variation} - \text{Unexplained variation}}{\text{Total variation}} = \frac{SST - SSE}{SST} = \frac{SSR}{SST}$

$\text{Adjusted } R^2 = \bar{R}^2 = 1 - \left(\frac{n - 1}{n - k - 1}\right)\left(1 - R^2\right)$

$AIC = n \ln\left(\frac{\text{Sum of squares error}}{n}\right) + 2(k + 1)$

$BIC = n \ln\left(\frac{\text{Sum of squares error}}{n}\right) + \ln(n)(k + 1)$

• Testing Joint Hypotheses for Coefficients

$F = \frac{(SSE_{\text{restricted}} - SSE_{\text{unrestricted}})/q}{SSE_{\text{unrestricted}}/(n - k - 1)}$

where q = the number of restrictions (coefficients set to zero in the restricted model)

• Testing for Heteroskedasticity—The Breusch-Pagan (BP) Test

$\chi^2 = nR^2$, with k degrees of freedom

where:
n = number of observations
R² = coefficient of determination of the second regression (the regression in which the squared residuals of the original regression are regressed on the independent variables)
k = number of independent variables

• Testing for Serial Correlation—The Durbin-Watson (DW) Test

$DW \approx 2(1 - r)$, where r is the sample correlation between residuals from one period and those from the previous period.

Value of the Durbin-Watson statistic (H0: no serial correlation):

0 to d_l: reject H0, conclude positive serial correlation
d_l to d_u: inconclusive
d_u to 4 − d_u: do not reject H0
4 − d_u to 4 − d_l: inconclusive
4 − d_l to 4: reject H0, conclude negative serial correlation
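As a quick check of these formulas, here is a minimal Python sketch (NumPy only; the data, seed, and helper `ols_fit` are hypothetical, not part of the formula sheet) that fits an OLS regression and recomputes the fit and diagnostic statistics above:

```python
import numpy as np

def ols_fit(X, y):
    """OLS with an intercept column prepended; returns coefficients and residuals."""
    Xd = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return b, y - Xd @ b

# Hypothetical data: n observations, k independent variables
rng = np.random.default_rng(0)
n, k = 100, 2
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(size=n)

b_hat, resid = ols_fit(X, y)
sse = np.sum(resid**2)                      # unexplained variation
sst = np.sum((y - y.mean())**2)             # total variation
ssr = sst - sse                             # explained variation

r2 = ssr / sst
adj_r2 = 1 - ((n - 1) / (n - k - 1)) * (1 - r2)
aic = n * np.log(sse / n) + 2 * (k + 1)
bic = n * np.log(sse / n) + np.log(n) * (k + 1)
f_stat = (ssr / k) / (sse / (n - (k + 1)))

# Breusch-Pagan: regress squared residuals on X; chi-square = n * R^2 of that fit
_, resid2_resid = ols_fit(X, resid**2)
r2_second = 1 - np.sum(resid2_resid**2) / np.sum((resid**2 - (resid**2).mean())**2)
bp_chi2 = n * r2_second                     # compare with chi-square, k df

# Durbin-Watson: sum of squared successive residual differences over SSE (about 2(1 - r))
dw = np.sum(np.diff(resid)**2) / sse        # values near 2 suggest no serial correlation
```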

Detecting Multicollinearity

$VIF_j = \frac{1}{1 - R_j^2}$

where R_j² is the coefficient of determination from regressing the jth independent variable on the remaining independent variables.

• $VIF_j > 5$ warrants further investigation of the given independent variable.
• $VIF_j > 10$ indicates serious multicollinearity requiring correction.
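A minimal sketch of the VIF computation (NumPy; the collinear data are hypothetical), regressing each independent variable on the others:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: VIF_j = 1 / (1 - R_j^2),
    where R_j^2 comes from regressing column j on the remaining columns."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        xj = X[:, j]
        coef, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ coef
        r2_j = 1 - resid @ resid / np.sum((xj - xj.mean())**2)
        out[j] = 1 / (1 - r2_j)
    return out

# Hypothetical collinear data: x3 is nearly a linear combination of x1 and x2
rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=100), rng.normal(size=100)
x3 = 0.8 * x1 + 0.2 * x2 + 0.05 * rng.normal(size=100)
print(vif(np.column_stack([x1, x2, x3])))  # VIFs for x1 and x3 should be large
```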

Problems in Linear Regression and Solutions

• Heteroskedasticity
  Effect: incorrect standard errors.
  Solution: use robust standard errors (corrected for conditional heteroskedasticity).

• Serial correlation
  Effect: incorrect standard errors (additional problems if a lagged value of the dependent variable is used as an independent variable).
  Solution: use robust standard errors (corrected for serial correlation).

• Multicollinearity
  Effect: high R² and low t-statistics.
  Solution: remove one or more independent variables; often no solution based in theory.
• Influence Analysis

Studentized Residual

$t_i^* = \frac{e_i^*}{s_{e^*}} = \frac{e_i}{\sqrt{MSE_{(i)}(1 - h_{ii})}}$

In the equivalent formula (on the right), the terms are based on the initial regression with n observations, where:
e_i* = the residual with the ith observation deleted
s_e* = the standard deviation of the residuals
k = the number of independent variables
MSE_(i) = the mean squared error of the regression with the ith observation eliminated
h_ii = the leverage value for the ith observation

• Cook's Distance

$D_i = \frac{e_i^2}{k \times MSE}\left[\frac{h_{ii}}{(1 - h_{ii})^2}\right]$

where:
e_i = the residual for observation i
k = the number of independent variables
MSE = the mean squared error of the estimated regression model
h_ii = the leverage value for observation i

Practical guidelines for using Cook's D are the following (a sketch follows this list):
• If D_i is greater than 0.5, the ith observation may be influential and merits further investigation.
• If D_i is greater than 1.0, the ith observation is highly likely to be an influential data point.
• If $D_i > 2\sqrt{k/n}$, the ith observation is highly likely to be an influential data point.
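A minimal NumPy sketch of the influence calculations (the data, including the manufactured outlier in the last row, are hypothetical):

```python
import numpy as np

# Hypothetical data with one manufactured influential point in the last row
rng = np.random.default_rng(2)
n, k = 50, 2
X = rng.normal(size=(n, k))
y = 2.0 + X @ np.array([1.0, -0.5]) + rng.normal(size=n)
X[-1], y[-1] = [4.0, 4.0], 20.0             # the influential observation

Xd = np.column_stack([np.ones(n), X])       # design matrix with intercept
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
e = y - Xd @ b                              # residuals
mse = e @ e / (n - k - 1)                   # mean squared error

# Leverage h_ii: diagonal of the hat matrix X (X'X)^{-1} X'
H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T
h = np.diag(H)

# Cook's distance, following the sheet's formula with k * MSE in the denominator
D = (e**2 / (k * mse)) * (h / (1 - h)**2)
print(np.where(D > 2 * np.sqrt(k / n))[0])  # flag likely influential observations
```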
• Logit Model

$\ln\left(\frac{p}{1 - p}\right) = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \varepsilon$

• Event Probability

$p = \frac{1}{1 + \exp\left[-(b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3)\right]}$

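As a quick check, here is a minimal Python sketch (the coefficients and features are hypothetical) showing that the event probability is the inverse of the log-odds transformation above:

```python
import math

def event_probability(b, x):
    """p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))) for a fitted logit model."""
    z = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))   # log-odds ln(p/(1-p))
    return 1 / (1 + math.exp(-z))

# Hypothetical fitted coefficients b0..b3 and one observation's features
b = [-1.2, 0.8, 0.05, -0.4]
x = [1.5, 10.0, 0.3]
p = event_probability(b, x)
print(p, math.log(p / (1 - p)))  # second value recovers b0 + b1*x1 + b2*x2 + b3*x3
```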
Time-Series Analysis

• Linear Trend Models

$y_t = b_0 + b_1 t + \varepsilon_t, \quad t = 1, 2, \ldots, T$

where:
y_t = the value of the time series at time t (value of the dependent variable)
b_0 = the y-intercept term
b_1 = the slope coefficient/trend coefficient
t = time, the independent or explanatory variable
ε_t = a random-error term

• Log-Linear Trend Models

A series that grows exponentially can be described using the following equation:

$y_t = e^{b_0 + b_1 t}$

where:
y_t = the value of the time series at time t (value of the dependent variable)
b_0 = the y-intercept term
b_1 = the slope coefficient
t = time = 1, 2, 3, ..., T

We take the natural logarithm of both sides of the equation to arrive at the equation for the log-linear model:

$\ln y_t = b_0 + b_1 t + \varepsilon_t, \quad t = 1, 2, \ldots, T$

• Autoregressive (AR) Time-Series Models

A first-order autoregressive model is represented as:

$x_t = b_0 + b_1 x_{t-1} + \varepsilon_t$

A pth-order autoregressive model is represented as:

$x_t = b_0 + b_1 x_{t-1} + b_2 x_{t-2} + \ldots + b_p x_{t-p} + \varepsilon_t$

• Detecting Serially Correlated Errors in an AR Model

$t\text{-stat} = \frac{\text{Residual autocorrelation for lag}}{\text{Standard error of residual autocorrelation}}$

where:
Standard error of residual autocorrelation = $1/\sqrt{T}$
T = number of observations in the time series

• Mean Reversion

$x_t = \frac{b_0}{1 - b_1}$

• Multiperiod Forecasts and the Chain Rule of Forecasting

Each successive forecast substitutes the previous forecast for the unobserved lagged value:

$\hat{x}_{t+1} = \hat{b}_0 + \hat{b}_1 x_t$
$\hat{x}_{t+2} = \hat{b}_0 + \hat{b}_1 \hat{x}_{t+1}$
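A minimal NumPy sketch (simulated AR(1) data with hypothetical parameter values) of fitting an AR(1) by OLS, applying the chain rule of forecasting, and computing the mean-reversion level:

```python
import numpy as np

# Simulate a covariance-stationary AR(1): x_t = b0 + b1*x_{t-1} + eps_t
rng = np.random.default_rng(3)
b0_true, b1_true, T = 1.0, 0.6, 500
x = np.empty(T)
x[0] = b0_true / (1 - b1_true)              # start at the mean-reversion level
for t in range(1, T):
    x[t] = b0_true + b1_true * x[t - 1] + rng.normal()

# Fit by OLS: regress x_t on a constant and x_{t-1}
Xd = np.column_stack([np.ones(T - 1), x[:-1]])
(b0_hat, b1_hat), *_ = np.linalg.lstsq(Xd, x[1:], rcond=None)

# Chain rule of forecasting: each step feeds the previous forecast back in
x1 = b0_hat + b1_hat * x[-1]                # one-period-ahead forecast
x2 = b0_hat + b1_hat * x1                   # two-period-ahead forecast

mean_reversion_level = b0_hat / (1 - b1_hat)
print(x1, x2, mean_reversion_level)         # multi-step forecasts approach the level
```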

• Random Walks

$x_t = x_{t-1} + \varepsilon_t, \quad E(\varepsilon_t) = 0, \; E(\varepsilon_t^2) = \sigma^2, \; E(\varepsilon_t \varepsilon_s) = 0 \text{ if } t \neq s$

The first difference of the random walk equation is given as:

$y_t = x_t - x_{t-1} = x_{t-1} + \varepsilon_t - x_{t-1} = \varepsilon_t, \quad E(\varepsilon_t) = 0, \; E(\varepsilon_t^2) = \sigma^2, \; E(\varepsilon_t \varepsilon_s) = 0 \text{ for } t \neq s$

• Random Walk with a Drift

$x_t = b_0 + b_1 x_{t-1} + \varepsilon_t$
$b_1 = 1, \; b_0 \neq 0$, or
$x_t = b_0 + x_{t-1} + \varepsilon_t, \quad E(\varepsilon_t) = 0$

The first difference of the random-walk-with-a-drift equation is given as:

$y_t = x_t - x_{t-1}, \quad y_t = b_0 + \varepsilon_t, \; b_0 \neq 0$

• The Unit Root Test of Nonstationarity

$x_t = b_0 + b_1 x_{t-1} + \varepsilon_t$
$x_t - x_{t-1} = b_0 + b_1 x_{t-1} - x_{t-1} + \varepsilon_t$
$x_t - x_{t-1} = b_0 + (b_1 - 1) x_{t-1} + \varepsilon_t$
$x_t - x_{t-1} = b_0 + g_1 x_{t-1} + \varepsilon_t$

• The null hypothesis for the Dickey-Fuller test is that g_1 = 0 (which effectively means that b_1 = 1) and that the time series has a unit root, which makes it nonstationary.
• The alternative hypothesis for the Dickey-Fuller test is that g_1 < 0 (which effectively means that b_1 < 1) and that the time series is covariance stationary (i.e., it does not have a unit root).

• Seasonality

$x_t = b_0 + b_1 x_{t-1} + b_2 x_{t-n} + \varepsilon_t$

where n = the number of periods in the seasonal pattern

• Moving Average (MA) Models

$x_t = \varepsilon_t + \theta \varepsilon_{t-1}, \quad E(\varepsilon_t) = 0, \; E(\varepsilon_t^2) = \sigma^2, \; E(\varepsilon_t \varepsilon_s) = 0 \text{ for } t \neq s$

$x_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q}, \quad E(\varepsilon_t) = 0, \; E(\varepsilon_t^2) = \sigma^2, \; E(\varepsilon_t \varepsilon_s) = 0 \text{ for } t \neq s$

• Autoregressive Moving Average (ARMA) Models

$x_t = b_0 + b_1 x_{t-1} + \ldots + b_p x_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \ldots + \theta_q \varepsilon_{t-q}$
$E(\varepsilon_t) = 0, \; E(\varepsilon_t^2) = \sigma^2, \; E(\varepsilon_t \varepsilon_s) = 0 \text{ for } t \neq s$

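The unit root regression above can be run directly. A minimal NumPy sketch (simulated random walk, hypothetical seed); note that the resulting t-statistic must be compared against Dickey-Fuller critical values, not conventional t-distribution values:

```python
import numpy as np

def dickey_fuller_stat(x):
    """Regress (x_t - x_{t-1}) on a constant and x_{t-1}; return g1-hat and its
    t-statistic. Compare the statistic with Dickey-Fuller critical values
    (a revised, left-shifted distribution), not ordinary t critical values."""
    dx = np.diff(x)
    Xd = np.column_stack([np.ones(len(dx)), x[:-1]])
    coef, *_ = np.linalg.lstsq(Xd, dx, rcond=None)
    resid = dx - Xd @ coef
    s2 = resid @ resid / (len(dx) - 2)                 # residual variance
    cov = s2 * np.linalg.inv(Xd.T @ Xd)                # coefficient covariance
    g1 = coef[1]
    return g1, g1 / np.sqrt(cov[1, 1])

# A random walk should fail to reject the unit-root null (g1 near 0)
rng = np.random.default_rng(4)
rw = np.cumsum(rng.normal(size=500))
print(dickey_fuller_stat(rw))
```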
• Autoregressive Conditional Heteroskedasticity (ARCH) Models

$\hat{\varepsilon}_t^2 = a_0 + a_1 \hat{\varepsilon}_{t-1}^2 + u_t$

The variance of the error in period t + 1 can then be predicted using the following formula:

$\hat{\sigma}_{t+1}^2 = \hat{a}_0 + \hat{a}_1 \hat{\varepsilon}_t^2$
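A minimal NumPy sketch (a hypothetical residual series simulated with volatility clustering) of the ARCH(1) test regression and the one-period-ahead variance forecast:

```python
import numpy as np

def arch1_forecast(resid):
    """Regress squared residuals on their own first lag; return (a0_hat, a1_hat)
    and the forecast of next period's error variance."""
    e2 = resid**2
    Xd = np.column_stack([np.ones(len(e2) - 1), e2[:-1]])
    (a0, a1), *_ = np.linalg.lstsq(Xd, e2[1:], rcond=None)
    sigma2_next = a0 + a1 * e2[-1]          # sigma^2_{t+1} = a0 + a1 * eps_t^2
    return a0, a1, sigma2_next

# Hypothetical residuals with ARCH effects (variance depends on the last shock)
rng = np.random.default_rng(5)
e = np.empty(1000)
e[0] = rng.normal()
for t in range(1, 1000):
    e[t] = rng.normal() * np.sqrt(0.2 + 0.5 * e[t - 1]**2)

print(arch1_forecast(e))  # a1_hat well above 0 signals ARCH effects
```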
Big Data Projects

• Normalization

$X_{i\,(\text{normalized})} = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}}$

• Standardization

$X_{i\,(\text{standardized})} = \frac{X_i - \mu}{\sigma}$

• Performance Evaluation

$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$

$\text{Precision (P)} = \frac{TP}{TP + FP}$

$\text{Recall (R)} = \frac{TP}{TP + FN}$

$\text{F1 score} = \frac{2 \times P \times R}{P + R}$

$RMSE = \sqrt{\sum_{i=1}^{n} \frac{(\text{Predicted}_i - \text{Actual}_i)^2}{n}}$

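A minimal Python sketch (hypothetical confusion-matrix counts and predictions) implementing these evaluation and scaling formulas:

```python
import numpy as np

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def rmse(predicted, actual):
    """Root mean squared error for a regression-style prediction task."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.sqrt(np.mean((predicted - actual)**2))

def normalize(x):
    """Min-max normalization: rescales values to the [0, 1] interval."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Standardization: zero mean, unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Hypothetical confusion-matrix counts and predictions
print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
print(rmse([1.1, 2.0, 2.9], [1.0, 2.0, 3.0]))
```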
Machine Learning

ML Algorithm Type | Supervised/Unsupervised | When to Use?

Classification and Regression Tree (CART) | Supervised | Most commonly applied to binary classification or regression.
Deep Learning Net | Both | A form of neural network with three or more "hidden" layers.
Ensemble Learning | Supervised | The use of a combination of algorithms to describe the data.
Hierarchical Clustering | Unsupervised | A form of clustering data (separating observations into groups) into different levels of clusters based on relationships between clusters.
K-Means | Unsupervised | A form of clustering data into a predetermined number of groups.
K-Nearest Neighbor (KNN) | Supervised | Mainly used for classification, by classifying new observations based on existing data.
LASSO | Supervised | A type of penalized regression that also eliminates the least important features of the regression model.
Neural Networks | Both | Commonly used for regression and classification in which input features (similar to regression independent variables) are connected to the output (target) variable by "hidden" layers of relationships.
Penalized Regression | Supervised | Regression technique to avoid overfitting by penalizing data features that make insufficient contribution to the regression model.
Principal Components Analysis (PCA) | Unsupervised | Used to help reduce the features in a data set to a manageable level.
Random Forest | Supervised | Type of ensemble learning using a collection of decision trees.
Reinforcement Learning | Unsupervised | An algorithm that uses the experience of millions of trials and errors to maximize future success.
Support Vector Machine (SVM) | Supervised | Used for classification, regression, and outlier detection by finding the optimal boundary between sets of data points.

• LASSO Penalized Regression Constraint

$\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 + \lambda \sum_{k=1}^{K} \left|\hat{b}_k\right|$

where:
λ = hyperparameter set by researcher prior to learning
b_k = regression coefficient of the kth feature (factor)
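A minimal NumPy sketch (hypothetical data and candidate coefficient vectors) evaluating the penalized objective; it shows how a larger λ favors the sparser coefficient vector:

```python
import numpy as np

def lasso_objective(X, y, b0, b, lam):
    """Sum of squared errors plus lambda times the sum of absolute coefficients
    (the intercept b0 is conventionally left unpenalized)."""
    resid = y - (b0 + X @ b)
    return resid @ resid + lam * np.sum(np.abs(b))

# Hypothetical data where the second feature is irrelevant
rng = np.random.default_rng(6)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=50)
dense = np.array([1.0, 0.3, -0.5])
sparse = np.array([1.0, 0.0, -0.5])
for lam in (0.0, 5.0, 50.0):
    print(lam, lasso_objective(X, y, 0.0, dense, lam),
          lasso_objective(X, y, 0.0, sparse, lam))  # larger lambda favors sparsity
```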
If you enjoyed exploring Quantitative Methods in our free version of SimpleSheets, you'll love diving into the other topics with our "Plus" version! With SimpleSheets+, you get access to all key formulas throughout the whole curriculum:

Quantitative Methods
Economics
Financial Statement Analysis
Corporate Issuers
Equity
Fixed Income
Derivatives
Alternative Investments
Portfolio Management
Ethical and Professional Standards

SimpleSheets+ is included in our full courses or can be purchased as a supplement to enhance efficiency and aid in understanding CFA formulas and concepts.

Choose Course | Buy Supplement

© 2025 Copyright UWorld, LLC. All Rights Reserved

CFA-L2-SS-1224
