Hi there! In this blog post, let’s discuss time series analysis using ARIMA models.
First of all, what is a time series? A time series is simply a sequence of numbers collected at regular intervals over a period of time. Some examples are the prices of commodities over time, the GDP of a region, and so on. Typically, when we model a time series, we try to decompose it into trend, seasonal, and cyclical components.
Time series data are typically used to build forecasting models. Time series forecasting is the use of a model to predict future values based on previously observed values. Models for time series data can take many forms and represent different underlying processes.
When modeling variations in the level of a process, three broad classes of practical importance are
- autoregressive (AR) models
- integrated (I) models
- moving average (MA) models
All three classes depend linearly on previous data points.
Combinations of these ideas produce autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models. ARIMA models are applied in cases where the data shows evidence of non-stationarity.
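To get a feel for these processes, base R's arima.sim() can simulate them. A quick sketch (the series names and parameter values here are just illustrative, not part of the analysis below):

```r
set.seed(42)
# AR(1): each value depends linearly on the previous value plus noise
ar1 <- arima.sim(model = list(ar = 0.7), n = 200)
# MA(1): each value depends linearly on the previous noise term
ma1 <- arima.sim(model = list(ma = 0.5), n = 200)
# ARMA(1,1): a combination of both
arma11 <- arima.sim(model = list(ar = 0.7, ma = 0.5), n = 200)
```

Plotting these side by side is a nice way to build intuition before touching real data.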
STATIONARITY: An assumption in many time series techniques is that the data is stationary, i.e. a stationary process has the property that its mean, variance, and autocorrelation structure do not change over time.
MOVING AVERAGE: A calculation that analyzes data points by creating a series of averages of different subsets of the full data set.
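As a tiny illustration (the numbers here are made up for the example), a centered 3-point moving average in base R:

```r
x <- c(2, 4, 6, 8, 10, 12)
# centered 3-point moving average; the ends are NA because the window is incomplete
ma3 <- stats::filter(x, rep(1 / 3, 3), sides = 2)
ma3  # -> NA 4 6 8 10 NA
```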
In the rest of this post, I will show you how to implement an ARIMA model in R.
STEP 1: Load the data.
You can get the dataset from this link.
The data is about the “producer price index” (PPI).
The head of the data set is as shown:
Y <- ppi        # dependent variable
d.Y <- diff(Y)  # diff() returns suitably lagged and iterated differences
t <- yearqrt    # time variable
STEP 2: Plot the variables.
From the plot it is clear that the data is not stationary, so let’s conduct a formal test of stationarity.
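The plotting for this step could look like the sketch below. Since the actual dataset comes from the link above, a synthetic stand-in for the ppi series is used here:

```r
# synthetic stand-in for the real ppi series (random walk with drift)
set.seed(1)
ppi <- ts(cumsum(rnorm(100, mean = 0.5)), start = c(1990, 1), frequency = 4)
Y <- ppi
d.Y <- diff(Y)

par(mfrow = c(2, 1))
plot(Y, main = "Level: trends upward, so not stationary")
plot(d.Y, main = "First difference: fluctuates around a constant mean")
```

The contrast between the two panels is exactly what motivates differencing.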
STEP 3: Conduct statistical tests.
Test 1: Dickey-Fuller test
This test is used to determine whether the series is stationary.
library(tseries)  # provides adf.test()
adf.test(Y, alternative = "stationary", k = 0)
The test statistic is -0.79 and the p-value is considerably high, so the null hypothesis of a unit root cannot be rejected, which means the data is not stationary.
The conclusion of this test is that the data is not stationary.
Running this test on the differenced variable, we get a test statistic of -6.8398, which suggests that the differenced series is stationary. So the conclusion of this test is that we should use the differenced variable in the ARIMA model.
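The call for the differenced series mirrors the one for the level. A sketch (d.Y is the differenced variable from Step 1; a synthetic stand-in is used here so the snippet runs on its own):

```r
library(tseries)  # provides adf.test()
# synthetic stand-in for d.Y; in the post, d.Y comes from Step 1
set.seed(1)
d.Y <- diff(cumsum(rnorm(100, mean = 0.5)))
adf.test(d.Y, alternative = "stationary", k = 0)
```

A large negative test statistic with a small p-value rejects the unit-root null, i.e. the differenced series looks stationary.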
Test 2: Correlation plots
acf(Y)  # autocorrelation function
The slow decay of the autocorrelations suggests that the data is not stationary.
pacf(Y)  # partial autocorrelation function
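It is also worth inspecting the differenced series: the PACF hints at the AR order and the ACF at the MA order. A sketch (again with a synthetic stand-in for d.Y):

```r
# ACF/PACF of the differenced series; synthetic stand-in for d.Y
set.seed(1)
d.Y <- diff(cumsum(rnorm(100, mean = 0.5)))
acf(d.Y)   # spikes outside the confidence bands hint at MA terms
pacf(d.Y)  # spikes outside the confidence bands hint at AR terms
```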
STEP 4: Implementing the ARIMA model
Have a look at the basic syntax of the arima() function by following the link.
Estimate several candidate ARIMA models:
arima(Y, order = c(1,0,0))
arima(Y, order = c(2,0,0))
arima(Y, order = c(0,0,1))
arima(Y, order = c(1,0,1))
# ARIMA on differenced variable
arima(d.Y, order = c(1,0,0))
arima(d.Y, order = c(0,0,1))
arima(d.Y, order = c(1,0,1))
arima(d.Y, order = c(1,0,3))
arima(d.Y, order = c(2,0,3))
The best ARIMA model is selected based on the AIC value; the lower the AIC, the better the model.
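One way to compare the candidates side by side is to fit them in a loop and tabulate the AICs. A sketch (the object names orders and fits are mine, and a synthetic ARMA series stands in for d.Y):

```r
# synthetic stand-in for d.Y so the snippet is self-contained
set.seed(1)
d.Y <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 200)
orders <- list(c(1, 0, 0), c(0, 0, 1), c(1, 0, 1), c(1, 0, 3), c(2, 0, 3))
fits <- lapply(orders, function(o) arima(d.Y, order = o))
# lower AIC indicates a better trade-off between fit and complexity
data.frame(order = sapply(orders, paste, collapse = ","),
           AIC   = sapply(fits, AIC))
```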
STEP 5: Finally, predict using the ARIMA(1, 0, 1) model and plot the output.
mydata.arima101 <- arima(Y, order = c(1,0,1))
mydata.pred1 <- predict(mydata.arima101, n.ahead=100)
Now, predicting using the differenced variable, we get:
mydata.arima111 <- arima(d.Y, order = c(1,0,1))
mydata.pred2 <- predict(mydata.arima111, n.ahead=100)
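The plotting mentioned in Step 5 can be sketched like this (a synthetic stationary series stands in for Y; the ±2·se band is an approximate 95% prediction interval):

```r
# synthetic stand-in for Y so the snippet runs on its own
set.seed(1)
Y <- ts(100 + arima.sim(model = list(ar = 0.8), n = 200),
        start = c(1960, 1), frequency = 4)
mydata.arima101 <- arima(Y, order = c(1, 0, 1))
mydata.pred1 <- predict(mydata.arima101, n.ahead = 100)

# observed series plus forecast, with an approximate 95% band
ts.plot(Y, mydata.pred1$pred, lty = c(1, 2))
lines(mydata.pred1$pred + 2 * mydata.pred1$se, lty = 3)
lines(mydata.pred1$pred - 2 * mydata.pred1$se, lty = 3)
```

Note how the band widens as the horizon grows: forecast uncertainty accumulates the further ahead we predict.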