2.2. Macroeconometric Project
Dec, 2020
Let’s begin the data analysis by looking into the summary statistics; we will get a
quick idea of the data distribution.
> summary(dataa)
       t            Twpi
 1960q1 :  1   Min.   : 30.50
 1960q2 :  1   1st Qu.: 32.58
 1960q3 :  1   Median : 56.60
 1960q4 :  1   Mean   : 62.77
 1961q1 :  1   3rd Qu.: 96.88
 1961q2 :  1   Max.   :116.20
 (Other):118
The number of observations (124 quarterly values starting in 1960q1) matches our
expectations. The mean is about 62.77, which we can take as the level of this
series. The quartiles and the range, from 30.50 to 116.20, suggest a large spread
in the data.
[Figure: Time Series Plot of Twpi (y-axis: Twpi; x-axis: Index)]
Here, the line plot suggests that the Thailand wholesale price index has an
increasing trend over time. This hints that the data may not be stationary, so we
can explore first-order differencing to make the series stationary before
modeling.
[Figure: time series plot of ln_Twpi (y-axis: ln_Twpi; x-axis: Index)]
ADF Statistic = -0.0397
P-value = 0.9596
Critical values:
1%: -3.4851
5%: -2.8854
10%: -2.5795
The results show that the test statistic of -0.0397 is greater (less negative)
than the critical values at the 1% (-3.4851), 5% (-2.8854), and 10% (-2.5795)
levels, and the p-value of 0.9596 is far above 0.05. We therefore fail to reject
the null hypothesis of a unit root at every one of these significance levels.
Failing to reject the null hypothesis means that the time series is
non-stationary.
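The decision rule of the ADF test can be sketched in a few lines of Python. This is a minimal illustration that only compares the statistic and critical values reported above; the function name `adf_decision` is hypothetical and not part of any library. Because the test is left-tailed, the unit-root null is rejected only when the statistic is more negative than the critical value.

```python
# Values copied from the ADF output reported above.
adf_statistic = -0.0397
critical_values = {"1%": -3.4851, "5%": -2.8854, "10%": -2.5795}


def adf_decision(statistic, critical):
    """Return, per significance level, whether the unit-root null is rejected.

    The ADF test is left-tailed: the null hypothesis (unit root, i.e.
    non-stationarity) is rejected only when the statistic is more negative
    than the critical value.
    """
    return {level: statistic < value for level, value in critical.items()}


decisions = adf_decision(adf_statistic, critical_values)
for level, rejected in decisions.items():
    verdict = "reject" if rejected else "fail to reject"
    print(f"{level}: {verdict} the unit-root null")
```

With the reported numbers the null is not rejected at any level, which agrees with the large p-value of 0.9596.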
It is ideal to use a differenced dataset as the input for our ARIMA model. Because
the series in levels is not stationary, first-order differencing is needed, so the
parameter d can be set to 1.
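The remaining parameters are the autoregressive order and the moving-average
order, known as p and q respectively. We can identify candidate values for these
parameters using the Autocorrelation Function (ACF) and the Partial
Autocorrelation Function (PACF).
However, just as with ARMA models, the ACF and PACF cannot always be used to
identify reliable values for p and q when both are nonzero, so the values they
suggest are treated as tentative.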
Let’s walk through an example of modelling with ARIMA to get some hands-on
experience and better understand some modelling concepts.
i) Stationarity Checking and Differencing
To check whether our data are stationary, we first plot the series. The plot
below indicates that the series is not stationary.
[Figure: Time Series Plot of ln_Twpi (y-axis: ln_Twpi; x-axis: Index)]
[Figure: Autocorrelation Function for ln_Twpi (with 5% significance limits for the autocorrelations); y-axis: Autocorrelation; x-axis: Lag]
[Figure: partial autocorrelation plot (y-axis: Partial Autocorrelation; x-axis: Lag)]
This pattern of the ACF and PACF indicates that the series is clearly
non-stationary.
The partial autocorrelations after lag 1 are not statistically significant.
Data Transformation
We would like to detrend our data.
We can see there is still an obvious trend after the log transformation. A cubic
trend could also be fitted to our data. The plot below shows difTwpi, the first
difference of the log-transformed series; differencing changes the distribution
of our data considerably.
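The transformation behind a series such as difTwpi can be sketched as follows: take natural logs, then subtract each value from its successor. This is a minimal standard-library sketch; the helper name `log_diff` and the sample values are hypothetical and are not the Twpi data.

```python
import math


def log_diff(series):
    """Return the first difference of the natural log of a positive series."""
    logs = [math.log(x) for x in series]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


# Hypothetical index values, for illustration only (not the Twpi data).
example_index = [100.0, 102.0, 101.0, 105.0]
example_growth = log_diff(example_index)
print([round(g, 4) for g in example_growth])
```

The differenced series has one observation fewer than the input, and each value approximates the period-to-period growth rate of the index.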
[Figure: time series plot of difTwpi (y-axis: difTwpi; x-axis: Index)]
[Figure: autocorrelation plot (y-axis: Autocorrelation; x-axis: Lag)]
[Figure: partial autocorrelation plot (y-axis: Partial Autocorrelation; x-axis: Lag)]
In the ACF plot, the autocorrelations exceed the 5% significance limits at lags
1, 2, 3, 4, 5, and 6, so an ARIMA model may be a good fit for our data.
In the PACF plot, lags 1, 2, 4, and 25 are statistically significant.
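How significant lags are read off an ACF can be sketched with the standard library alone. The sketch assumes the common large-sample band of ±1.96/√n; the limits drawn in the plots may be computed differently (for example with lag-dependent standard errors). The function names and the sample series are hypothetical.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(series, max_lag):
    """Lags whose autocorrelation lies outside the approximate +/-1.96/sqrt(n) band."""
    limit = 1.96 / math.sqrt(len(series))
    acf = sample_acf(series, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > limit]


# Hypothetical trending series, for illustration only (not the Twpi data).
example = [float(t) for t in range(1, 41)]
example_acf = sample_acf(example, 3)
print([round(r, 3) for r in example_acf], significant_lags(example, 3))
```

A steadily trending series like the example produces large, slowly decaying autocorrelations, which is the typical signature of non-stationarity.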
To identify the order of the ARIMA model, we consider the ACF, the PACF, and the
associated correlogram.
The tentative models are: ARIMA(1,1,1), ARIMA(2,1,1), ARIMA(1,1,2),
ARIMA(3,1,3), ARIMA(3,1,1), ARIMA(2,1,2), ARIMA(3,1,2).
The ACF is significant up to lag 6, which suggests a moving-average order q of at
most 6. The PACF is significant up to lag 4 (apart from the isolated spike at lag
25), which suggests an autoregressive order p of at most 4.
Now, we have all the required parameters for the ARIMA model.
The most appropriate model is selected from the tentative models above based on
criteria such as the most significant coefficients and the lowest volatility. On
these criteria, the selected model is ARIMA(3,1,1).
ARIMA(3,1,1) model
ARIMA Model: DIFF_ln_Twi
[Figure: autocorrelation plot (y-axis: Autocorrelation; x-axis: Lag)]
[Figure: PACF of Residuals for DIFF_ln_Twi (with 5% significance limits for the partial autocorrelations); y-axis: Partial Autocorrelation; x-axis: Lag]
[Figure: residuals versus observation order (y-axis: Residual; x-axis: Observation Order)]
v) Forecasting
ARIMA Model: DIFF_ln_Twi
Lag            12     24     36     48
Chi-Square    7.7   12.2   25.9   37.3
DF              8     20     32     44
P-Value     0.458  0.909  0.768  0.753

                           95% Limits
Period   Forecast       Lower        Upper       Actual
125      0.0212354   -0.0003424   0.0428132
126      0.0164002   -0.0078205   0.0406210
127      0.0221369   -0.0039365   0.0482103
128      0.0218962   -0.0058769   0.0496692
129      0.0205551   -0.0106009   0.0517111
130      0.0200829   -0.0126114   0.0527773
131      0.0214239   -0.0133042   0.0561520
132      0.0207691   -0.0155575   0.0570957
133      0.0208476   -0.0174702   0.0591653
134      0.0206722   -0.0190771   0.0604216
135      0.0210381   -0.0204100   0.0624862
136      0.0207309   -0.0221307   0.0635926
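The forecasts above are for DIFF_ln_Twi, that is, for changes in the log of the index. Turning them back into index levels can be sketched as follows: accumulate the forecast changes onto the last observed log level and exponentiate. The starting level `last_level` is a placeholder assumption, because the last observed value is not given in the text, and the helper name `to_levels` is hypothetical.

```python
import math

# First three forecasts of DIFF_ln_Twi, copied from the table above.
diff_forecasts = [0.0212354, 0.0164002, 0.0221369]

# ASSUMPTION: placeholder for the last observed index level; the true value
# is not given in the text.
last_level = 100.0


def to_levels(last_value, log_diff_forecasts):
    """Cumulate log-difference forecasts and exponentiate back to levels."""
    levels = []
    log_level = math.log(last_value)
    for step in log_diff_forecasts:
        log_level += step
        levels.append(math.exp(log_level))
    return levels


level_forecasts = to_levels(last_level, diff_forecasts)
print([round(v, 2) for v in level_forecasts])
```

Since every forecast change is positive, the implied index levels rise from one period to the next.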
We can see that there is a very small bias in the model: ideally, the mean of the
residuals should be zero. We use this mean value (0.00000) to correct the bias in
our predictions by adding it to each forecast.
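The bias correction described here can be sketched in a few lines, assuming that a residual is defined as actual minus fitted, so that a positive mean residual means the model under-predicts. The residual values below are hypothetical; the helper name `bias_corrected` is not from any library.

```python
def bias_corrected(forecasts, residuals):
    """Add the mean residual (the estimated bias) to each forecast."""
    bias = sum(residuals) / len(residuals)
    return [f + bias for f in forecasts]


# Hypothetical residuals, for illustration only; the text reports a mean of 0.00000.
example_residuals = [0.002, -0.001, 0.0005, -0.0005]
example_forecasts = [0.0212354, 0.0164002]
corrected = bias_corrected(example_forecasts, example_residuals)
print(corrected)
```

When the mean residual is essentially zero, as reported in the text, the corrected forecasts are practically identical to the original ones.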
Residuals from ARIMA(3,1,1): First, we plot the residuals to check whether they
fit a mean-stationary model with mean 0. The plot appears to show mean
stationarity with a mean around 0. Where the ARIMA output does not provide
p-values, you can calculate t-statistics instead. If non-constant variance is a
concern, look at a plot of residuals versus fits and/or a time series plot of the
residuals.
[Figure: Residual Plots for DIFF_ln_Twi; panels include Normal Probability Plot (Percent vs. Residual) and Versus Fits (Residual vs. Fitted Value)]
[Figure: partial autocorrelation plot (y-axis: Partial Autocorrelation; x-axis: Lag)]
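Look at the ACF of the residuals. For a good model, all autocorrelations of the
residual series should be non-significant.
Look at the Box-Pierce (Ljung-Box) test for possible residual autocorrelation at
various lags. In the table below, all four p-values are well above 0.05, so there
is no evidence of remaining autocorrelation in the residuals.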
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag            12     24     36     48
Chi-Square    7.7   12.2   25.9   37.3
DF              8     20     32     44
P-Value     0.458  0.909  0.768  0.753
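The statistic in this table can be sketched from its textbook definition, Q = n(n+2) Σ r_k²/(n−k), summed over lags k = 1..m. The block below is a standard-library illustration on hypothetical residuals, followed by a simple screen of the p-values reported above; the function name `ljung_box_q` is an assumption, not a library call.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..m} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    denom = sum(c * c for c in centered)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


# Hypothetical alternating residuals, for illustration only.
example_residuals = [1.0, -1.0] * 10
example_q = ljung_box_q(example_residuals, 1)

# P-values copied from the table above.
reported_p_values = {12: 0.458, 24: 0.909, 36: 0.768, 48: 0.753}
no_autocorrelation = all(p > 0.05 for p in reported_p_values.values())
print(round(example_q, 3), no_autocorrelation)
```

In the table, DF equals the lag minus 4 at every lag, which is consistent with four estimated coefficients (three AR terms and one MA term).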
Look at the significance of the coefficients. The output provides p-values, so you
may simply compare them to the standard 0.05 cut-off.
Final Estimates of Parameters
We can see that the errors have slightly reduced and the mean has also shifted
towards zero. The graphs also suggest a Gaussian distribution.
[Figure: Histogram of residuals (response is DIFF_ln_Twi); y-axis: Frequency; x-axis: Residual]
[Figure: normal probability plot of residuals (y-axis: Percent; x-axis: Residual)]
In our example we had a very small bias, so this bias correction may not have
proved to be a significant improvement, but in real-life scenarios, this is an
important technique to be explored at the end in case any bias exists.
So, our model has passed all the criteria. We can save this model for later use.
1.7. Model Validation
To evaluate the general predictive ability of the ARIMA(3,1,1) model, we
compare its fitted values with the original log-transformed data in the
same plot. For model diagnostics, we first need to check the residuals of
the model. Ideally, the residual plot resembles one generated by Gaussian
white noise, which means that the residuals are independent and
identically distributed draws from a normal distribution. Moreover, the
residuals should fit a mean-stationary model with mean 0.
We can observe the actual and forecasted values for the validation dataset.
These values are also plotted on a line plot which shows a promising result of
our model.
Trend Analysis Plot for DIFF_ln_Twi
* NOTE * Zero values of Yt exist; MAPE calculated only for non-zero Yt.
Data DIFF_ln_Twi
Length 124
NMissing 1
Yt = 0.00719 + 0.000058×t
Accuracy Measures
MAPE 174.566
MAD 0.010
MSD 0.000
Forecasts
Period Forecast
125 0.0143958
126 0.0144534
127 0.0145111
128 0.0145687
129 0.0146264
130 0.0146840
131 0.0147417
132 0.0147993
133 0.0148570
134 0.0149146
135 0.0149723
136 0.0150299
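As a rough check, the forecasts in this table can be reproduced from the fitted trend equation Yt = 0.00719 + 0.000058×t. The sketch below evaluates the equation at period 125 using the rounded coefficients as printed, so a small discrepancy against the reported forecast is expected; the function name `trend_forecast` is hypothetical.

```python
# Coefficients as printed (rounded) in the fitted trend equation above.
INTERCEPT = 0.00719
SLOPE = 0.000058


def trend_forecast(t):
    """Evaluate the linear trend Yt = INTERCEPT + SLOPE * t at period t."""
    return INTERCEPT + SLOPE * t


reported_125 = 0.0143958  # forecast for period 125 from the table above
computed_125 = trend_forecast(125)
print(round(computed_125, 7), round(computed_125 - reported_125, 7))
```

The computed value agrees with the reported forecast to about four decimal places; the remaining gap comes from the rounding of the printed coefficients.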
[Figure: Trend Analysis Plot for DIFF_ln_Twi; legend: Actual, Fits, Forecasts; y-axis: DIFF_ln_Twi; x-axis: Index]
The trend plot shows the original data, the fitted trend line, and the forecasts.
The output also displays the fitted trend equation and three measures to help
determine the accuracy of the fitted values: MAPE, MAD, and MSD. The Thailand
wholesale price index shows a general upward trend, though with an evident seasonal
component. The trend model appears to fit the overall trend well, but the seasonal
pattern is not well fitted. To fit these data better, you can also use
decomposition on the stored residuals and add the trend analysis and decomposition
fits and forecasts.
The three measures are not very informative by themselves, but you can use them to
compare the fits obtained by using different methods. For all three measures,
smaller values generally indicate a better-fitting model.
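The three accuracy measures can be sketched from their usual definitions: MAPE is the mean absolute percentage error, MAD the mean absolute deviation, and MSD the mean squared deviation. The block below uses hypothetical values and, following the note in the output, computes MAPE only over non-zero actual values; the function name `accuracy_measures` is an assumption.

```python
def accuracy_measures(actual, fitted):
    """Return (MAPE, MAD, MSD) for paired actual and fitted values.

    MAPE is computed only over non-zero actual values, mirroring the note in
    the output above.
    """
    errors = [a - f for a, f in zip(actual, fitted)]
    n = len(errors)
    nonzero = [(a, e) for a, e in zip(actual, errors) if a != 0]
    mape = 100 * sum(abs(e / a) for a, e in nonzero) / len(nonzero)
    mad = sum(abs(e) for e in errors) / n
    msd = sum(e * e for e in errors) / n
    return mape, mad, msd


# Hypothetical values, for illustration only.
example_actual = [0.02, 0.0, 0.01, 0.04]
example_fitted = [0.01, 0.01, 0.02, 0.02]
mape, mad, msd = accuracy_measures(example_actual, example_fitted)
print(round(mape, 3), round(mad, 4), round(msd, 6))
```

Because MAPE divides each error by the actual value, a series whose values lie close to zero, such as a differenced log series, can produce a very large MAPE even when MAD and MSD are small.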