
If an I(d) process can be reduced to an AR(p) process by applying the
differencing operator, it is named an ARI(p, d) process and has the form
Δ^d y(t) - a1 Δ^d y(t-1) - ... - ap Δ^d y(t-p) = e(t),   t ≥ p + d
Note that differencing a time series d times reduces the number of
independent variables by d, so that the total number of independent
variables in ARI(p, d) within the sample with n observations equals
n - p - d.
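The loss of d observations under differencing can be seen in a minimal numpy sketch (the series, its length, and the seed are arbitrary choices for illustration):

```python
import numpy as np

# Illustrative sketch: differencing an I(1) series (a random walk)
# d = 1 times leaves n - d observations.
rng = np.random.default_rng(0)
e = rng.normal(size=100)     # white noise
y = np.cumsum(e)             # I(1) process: cumulative sum of the noise

d = 1
z = np.diff(y, n=d)          # apply the differencing operator d times
print(len(y), len(z))        # 100 99
```

Here differencing the random walk also recovers the underlying noise, which is exactly the reduction of an I(1) process to a stationary one.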
The unit root is another notion widely used for discerning permanent
and transitory effects of random shocks. It is based on the roots of
the characteristic polynomial for the AR(p) model. For example,
AR(1) has the characteristic polynomial
1 - a1 z = 0    (5.1.10)
If a1 = 1, then z = 1 and the characteristic polynomial has a
unit root. In general, the characteristic polynomial roots can have
complex values. The solution to equation (5.1.10) lies outside the unit
circle (i.e., |z| > 1) when |a1| < 1. It can be shown that the AR(p)
process is stationary when all solutions to its characteristic equation
1 - a1 z - a2 z^2 - ... - ap z^p = 0    (5.1.11)
lie outside the unit circle.
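This root condition is easy to check numerically. The helper below is a sketch (the function name and the example coefficients are ours, not the book's); it builds the polynomial of equation (5.1.11) and tests whether all roots lie outside the unit circle:

```python
import numpy as np

def ar_roots_outside_unit_circle(a):
    """Stationarity check for AR(p) with coefficients a = [a1, ..., ap]:
    True if all roots of 1 - a1*z - ... - ap*z^p = 0 (equation 5.1.11)
    lie outside the unit circle."""
    # np.roots expects coefficients ordered from the highest power of z down
    coeffs = [-ai for ai in reversed(a)] + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(ar_roots_outside_unit_circle([0.5]))   # AR(1) with a1 = 0.5: True
print(ar_roots_outside_unit_circle([1.0]))   # unit root: False
```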

A model more general than AR(p) contains both lagged observations
and lagged noise
y(t) = a1 y(t-1) + a2 y(t-2) + ... + ap y(t-p) + e(t)
     + b1 e(t-1) + b2 e(t-2) + ... + bq e(t-q)    (5.1.12)
This model is called the autoregressive moving average model of order
(p, q), or simply ARMA(p, q). Sometimes modeling of empirical data
46 Time Series Analysis

requires AR(p) with a rather high number p. Then, ARMA(p, q) may
be more efficient in that the total number of its terms (p + q) needed
for given accuracy is lower than the number p in AR(p). ARMA(p, q)
can be expanded into the integrated model, ARIMA(p, d, q), similar
to the expansion of AR(p) into ARI(p, d). Neglecting the autoregressive
terms in ARMA(p, q) yields a "pure" moving average model
y(t) = e(t) + b1 e(t-1) + b2 e(t-2) + ... + bq e(t-q)    (5.1.13)
MA(q) can be presented in the form
y(t) = Bq(L) e(t)    (5.1.14)
where Bq(L) is the MA polynomial in the lag operator
Bq(L) = 1 + b1 L + b2 L^2 + ... + bq L^q    (5.1.15)
The moving average model does not depend explicitly on the lagged
values of y(t). Yet, it is easy to show that this model implicitly
incorporates the past. Consider, for example, the MA(1) model
y(t) = e(t) + b1 e(t-1)    (5.1.16)
with e(0) = 0. For this model,
y(1) = e(1),  y(2) = e(2) + b1 e(1) = e(2) + b1 y(1),
y(3) = e(3) + b1 e(2) = e(3) + b1 (y(2) - b1 y(1)), ...
Thus, the general result for MA(1) has the form

(1 - b1 L + b1^2 L^2 - b1^3 L^3 + ...) y(t) = e(t)    (5.1.17)
Equation (5.1.17) can be viewed as an AR(∞) process, which illus-
trates that the MA model does depend on the past.
The MA(q) model is invertible if it can be transformed into an
AR(∞) model. It can be shown that MA(q) is invertible if all
solutions to the equation

1 + b1 z + b2 z^2 + ... + bq z^q = 0    (5.1.18)
are outside the unit circle. In particular, MA(1) is invertible if
|b1| < 1. If the process y(t) has a non-zero mean value m, then the
AR(1) model can be presented in the following form

y(t) - m = a1 [y(t-1) - m] + e(t),  i.e.,  y(t) = c + a1 y(t-1) + e(t)    (5.1.19)
In (5.1.19), the intercept c equals
c = m(1 - a1)    (5.1.20)
The general AR(p) model with a non-zero mean has the following form
Ap(L) y(t) = c + e(t),   c = m(1 - a1 - ... - ap)    (5.1.21)
Similarly, the intercept can be included into the general moving
average model MA(q)
y(t) = c + Bq(L) e(t),   c = m    (5.1.22)
Note that the mean of the MA model coincides with its intercept because
the mean of the white noise is zero.
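A quick simulation illustrates the intercept relation (5.1.20); the coefficient, mean, seed, and sample size below are arbitrary illustrative choices:

```python
import numpy as np

# Sketch: an AR(1) process with intercept c = m (1 - a1), equation
# (5.1.20), fluctuates around the unconditional mean m.
rng = np.random.default_rng(1)
a1, m = 0.6, 5.0
c = m * (1.0 - a1)             # intercept implied by (5.1.20)

y = m                          # start the recursion at the mean
samples = []
for _ in range(20000):
    y = c + a1 * y + rng.normal()
    samples.append(y)

print(np.mean(samples))        # close to m = 5.0
```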


Now, let us introduce the autocorrelation function (ACF) for the
process y(t)
r(k) = g(k)/g(0)    (5.1.23)
where g(k) is the autocovariance of order k
g(k) = E[(y(t) - m)(y(t-k) - m)]    (5.1.24)
The autocorrelation functions may have some typical patterns, which
can be used for identification of empirical time series [2]. The obvious
properties of ACF are
r(0) = 1,   -1 < r(k) < 1 for k ≠ 0    (5.1.25)
ACF is closely related to the ARMA parameters. In particular, for AR(1),
r(1) = a1    (5.1.26)
The ACF of the first order for MA(1) equals
r(1) = b1/(1 + b1^2)    (5.1.27)
The right-hand side of the expression (5.1.27) has the same value for
the inverse transform b1 → 1/b1. For example, the two processes

x(t) = e(t) + 2 e(t-1)
y(t) = e(t) + 0.5 e(t-1)
have the same r(1). Note, however, that y(t) is an invertible process
while x(t) is not.
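The shared value of r(1) for these two processes can be verified directly from (5.1.27); the helper function below is ours, introduced only for this check:

```python
def ma1_acf1(b1):
    """First-order autocorrelation of an MA(1) process, equation (5.1.27)."""
    return b1 / (1.0 + b1 ** 2)

# x(t) = e(t) + 2 e(t-1) and y(t) = e(t) + 0.5 e(t-1) share r(1) = 0.4,
# although only the second process (|b1| < 1) is invertible.
print(ma1_acf1(2.0), ma1_acf1(0.5))   # 0.4 0.4
```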
ARMA modeling is widely used for forecasting. Consider a fore-
cast of a variable y(t þ 1) based on a set of n variables x(t) known at
moment t. This set can be just past values of y, that is,
y(t), y(t-1), ..., y(t-n+1). Let us denote the forecast with
ŷ(t+1|t). The quality of a forecast is usually defined with some
loss function. The mean squared error (MSE) is the conventional loss
function in many applications
MSE(ŷ(t+1|t)) = E[(y(t+1) - ŷ(t+1|t))^2]    (5.1.28)
The forecast that yields the minimum of MSE turns out to be the
expectation of y(t þ 1) conditioned on x(t)
ŷ(t+1|t) = E[y(t+1)|x(t)]    (5.1.29)
In the case of linear regression
y(t+1) = b'x(t) + e(t)    (5.1.30)
minimizing the MSE reduces to the ordinary least squares (OLS) estimate of b.
For a sample with T observations,
b = [Σ_{t=1}^{T} x(t) x'(t)]^(-1) Σ_{t=1}^{T} x(t) y(t+1)    (5.1.31)
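In the scalar case, (5.1.31) is just a ratio of two sums. The sketch below applies it to data generated from an assumed AR(1) process with coefficient 0.7 (all values and the seed are illustrative):

```python
import numpy as np

# Sketch of the OLS estimate (5.1.31) in its scalar form, using the
# single regressor x(t) = y(t) for an assumed AR(1) data-generating
# process with true coefficient 0.7.
rng = np.random.default_rng(2)
T = 5000
y = np.zeros(T + 1)
for t in range(T):
    y[t + 1] = 0.7 * y[t] + rng.normal()

x = y[:-1]                                # regressors x(t)
target = y[1:]                            # one-step-ahead values y(t+1)
b = np.sum(x * target) / np.sum(x * x)    # (5.1.31) with scalar x(t)
print(b)                                  # close to 0.7
```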

Another important concept in time series analysis is the maximum
likelihood estimate (MLE) [2]. Consider the general ARMA model
(5.1.12) with the white noise (4.2.6). The problem is how to estimate
the ARMA parameters on the basis of given observations of y(t). The
idea of MLE is to find a parameter vector r = (a1, ..., ap,
b1, ..., bq, s^2) that maximizes the likelihood function for given ob-
servations (y1, y2, ..., yT)
f(y1, y2, ..., yT; r)    (5.1.32)
The likelihood function (5.1.32) can be interpreted as the probability of
observing the data sample (y1, y2, ..., yT). In this approach, the ARMA
model and the probability distribution for the white noise should be

specified first. Often the normal distribution leads to reasonable
estimates even if the real distribution is different. Furthermore, the
likelihood function must be calculated for the chosen ARMA model.
Finally, the components of the vector r must be estimated. The latter
step may require a sophisticated numerical optimization technique.
Details of implementation of MLE are discussed in [2].
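As a toy illustration of the idea (not the full procedure of [2]), the sketch below maximizes the conditional Gaussian likelihood of an AR(1) model over a simple parameter grid; the true coefficient 0.5, known noise variance, seed, and grid are all assumptions of this sketch:

```python
import numpy as np

# Toy conditional MLE for AR(1) with known noise variance: maximize
# the Gaussian likelihood of y(t) given y(t-1) over a coefficient grid.
rng = np.random.default_rng(3)
T = 4000
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + rng.normal()

def neg_log_likelihood(a1, y, sigma2=1.0):
    resid = y[1:] - a1 * y[:-1]
    return 0.5 * np.sum(resid ** 2 / sigma2 + np.log(2 * np.pi * sigma2))

grid = np.linspace(-0.99, 0.99, 199)
a1_hat = grid[np.argmin([neg_log_likelihood(a, y) for a in grid])]
print(a1_hat)                 # close to the true value 0.5
```

A real implementation would use a numerical optimizer rather than a grid, but the grid makes the maximization step transparent.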

Finding trends is an important part of time series analysis. The
presence of a trend implies that the time series has no mean reversion.
Moreover, the mean and variance of a trending process depend on the
sample. A time series with a trend is named non-stationary. If a
process y(t) is stationary, its mean, variance, and autocovariance are
finite and do not depend on time. This implies that autocovariance
(5.1.24) depends only on the lag parameter k. Such a definition of
stationarity is also called covariance-stationarity or weak stationarity
because it does not impose any restrictions on the higher moments of
the process. Strict stationarity implies that higher moments also do
not depend on time. Note that any MA process is covariance-stationary.
However, the AR(p) process is covariance-stationary only if the
roots of its characteristic polynomial are outside the unit circle.
It is important to discern deterministic and stochastic trends.
They have a different nature, yet their graphs may sometimes look
very similar [1]. Consider first the AR(1) model with a deterministic trend
y(t) - m - ct = a1 (y(t-1) - m - c(t-1)) + e(t)    (5.2.1)
Let us introduce z(t) = y(t) - m - ct. Then equation (5.2.1) has the solution
z(t) = a1^t z(0) + Σ_{i=1}^{t} a1^(t-i) e(i)    (5.2.2)

where z(0) is a pre-sample starting value of z. Obviously, the random
shocks are transitory if |a1| < 1. The trend incorporated in the defin-
ition of z(t) is deterministic when |a1| < 1. However, if a1 = 1, then
equation (5.2.1) has the form

y(t) = c + y(t-1) + e(t)    (5.2.3)
The process (5.2.3) is named the random walk with drift. In this case,
equation (5.2.2) reduces to
z(t) = z(0) + Σ_{i=1}^{t} e(i)    (5.2.4)

The sum of non-transitory shocks in the right-hand side of equation
(5.2.4) is named the stochastic trend. Consider, for example, the determin-
istic trend model with m = 0 and e(t) ~ N(0, 1)
y(t) = 0.1 t + e(t)    (5.2.5)
and the stochastic trend model
y(t) = 0.1 + y(t-1) + e(t),   y(0) = 0    (5.2.6)
As can be seen from Figure 5.1, both graphs look similar. In
general, however, the stochastic trend model can deviate from the
deterministic trend for a long time.
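Both models are easy to reproduce numerically; the sketch below simulates the two series of Figure 5.1 (the seed is arbitrary, so the realization differs from the book's figure):

```python
import numpy as np

# Sketch reproducing the two trend models compared in Figure 5.1.
rng = np.random.default_rng(4)
T = 40
t = np.arange(1, T + 1)

# Deterministic trend, equation (5.2.5): y(t) = 0.1 t + e(t)
y_det = 0.1 * t + rng.normal(size=T)

# Stochastic trend (random walk with drift), equation (5.2.6)
y_sto = np.cumsum(0.1 + rng.normal(size=T))

print(y_det.shape, y_sto.shape)
```

Plotting `y_det` and `y_sto` over many seeds shows how often the random walk wanders far from the deterministic line.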
Stochastic trend implies that the process is I(1). Then the lag
polynomial (5.1.3) can be represented in the form




Figure 5.1 Deterministic and stochastic trends: A - equation (5.2.5), B -
equation (5.2.6).

Ap(L) = (1 - L) A_(p-1)(L)    (5.2.7)
Similarly, the process I(2) has the lag polynomial

Ap(L) = (1 - L)^2 A_(p-2)(L)    (5.2.8)

and so on. The standard procedure for testing for the presence of a unit
root in a time series is the Dickey-Fuller method [1, 2]. This method is
implemented in major econometric software packages (see the Section

Seasonal effects may play an important role in the properties of time
series. Sometimes, there is a need to eliminate these effects in order to
focus on the stochastic specifics of the process. Various differencing
filters can be used for achieving this goal [2]. In other cases, seasonal
effect itself may be the object of interest. The general approach for
handling seasonal effects is introducing dummy parameters D(s, t)
where s ¼ 1, 2, . . . , S; S is the number of seasons. For example,
S ¼ 12 is used for modeling the monthly effects. Then the parameter
D(s, t) equals 1 at a specific season s and equals zero at all other
seasons. The seasonal extension of an ARMA(p,q) model has the
following form
y(t) = a1 y(t-1) + a2 y(t-2) + ... + ap y(t-p) + e(t)
     + b1 e(t-1) + b2 e(t-2) + ... + bq e(t-q) + Σ_{s=1}^{S} ds D(s, t)    (5.2.9)

Note that forecasting with the model (5.2.9) requires estimating
(p þ q þ S) parameters.
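Constructing the dummy variables D(s, t) is mechanical; the helper below is a sketch (our own function, assuming the sample starts at season 1):

```python
import numpy as np

def seasonal_dummies(T, S=12):
    """Dummy variables D(s, t) of equation (5.2.9) as a T x S matrix;
    column s-1 corresponds to season s, and the sample is assumed to
    start at season 1."""
    D = np.zeros((T, S))
    D[np.arange(T), np.arange(T) % S] = 1.0
    return D

D = seasonal_dummies(24, S=12)
print(D.sum(axis=0))   # each of the 12 seasons appears twice in 24 points
```

Each row has exactly one non-zero entry, so the S dummy columns add S parameters to the regression, as the parameter count (p + q + S) indicates.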

So far, we have considered random processes with the white noise (4.2.6),
which is characterized by constant unconditional variance. Processes
whose conditional variance is constant (or left unspecified) are named
homoskedastic.
Many random time series are not well described with the IID process.
In particular, there may be strong positive autocorrelation in squared
asset returns. This means that large returns (either positive or nega-
tive) follow large returns. In this case, it is said that the return

volatility is clustered. The effect of volatility clustering is also called
autoregressive conditional heteroskedasticity (ARCH). It should be
noted that small autocorrelation in squared returns does not neces-
sarily mean that there is no volatility clustering. Strong outliers that
lead to high values of skewness and kurtosis may lower autocorrela-
tion. If these outliers are removed from the sample, volatility cluster-
ing may become apparent [3].
Several models in which past shocks contribute to the current
volatility have been developed. Generally, they are rooted in the
ARCH(m) model where the conditional variance is a weighted sum
of m squared lagged returns
s^2(t) = v + a1 e^2(t-1) + a2 e^2(t-2) + ... + am e^2(t-m)    (5.3.1)
In (5.3.1), e(t) ~ N(0, s^2(t)), v > 0, and a1, ..., am ≥ 0. Unfortunately,
application of the ARCH(m) process to modeling the financial time
series often requires polynomials with high order m. A more efficient
model is the generalized ARCH (GARCH) process. The GARCH
(m, n) process combines the ARCH(m) process with the AR(n) process
for lagged variance
s^2(t) = v + a1 e^2(t-1) + a2 e^2(t-2) + ... + am e^2(t-m)
       + b1 s^2(t-1) + b2 s^2(t-2) + ... + bn s^2(t-n)    (5.3.2)
The simple GARCH(1, 1) model is widely used in applications
s^2(t) = v + a e^2(t-1) + b s^2(t-1)    (5.3.3)
Equation (5.3.3) can be transformed into
s^2(t) = v + (a + b) s^2(t-1) + a[e^2(t-1) - s^2(t-1)]    (5.3.4)
The last term in equation (5.3.4) is conditioned on information avail-
able at time (t - 1) and has zero mean. This term can be treated as a
shock to volatility. Therefore, the unconditional expectation of vola-
tility for the GARCH(1, 1) model equals
E[s^2(t)] = v/(1 - a - b)    (5.3.5)
This implies that the GARCH(1, 1) process is weakly stationary when
a + b < 1. The advantage of the stationary GARCH(1, 1) model is
that it can be easily used for forecasting. Namely, the conditional
expectation of volatility at time (t + k) equals [4]

E[s^2(t+k)] = (a + b)^k [s^2(t) - v/(1 - a - b)] + v/(1 - a - b)    (5.3.6)
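The closed form (5.3.6) can be verified against direct iteration of the one-step conditional expectation E[s^2(t+1)] = v + (a + b) E[s^2(t)]; the parameter values below are arbitrary illustrative choices:

```python
# Numerical check of the GARCH(1,1) forecast formula (5.3.6).
v, a, b = 0.1, 0.05, 0.9
s2 = 0.5                        # assumed current variance s2(t)
s2_bar = v / (1.0 - a - b)      # unconditional variance (5.3.5)

k = 10
closed_form = (a + b) ** k * (s2 - s2_bar) + s2_bar   # equation (5.3.6)

# Iterate the one-step expectation k times.
forecast = s2
for _ in range(k):
    forecast = v + (a + b) * forecast

print(abs(closed_form - forecast))   # near machine zero
```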
The GARCH(1, 1) model (5.3.4) can be rewritten as
s^2(t) = v/(1 - b) + a(e^2(t-1) + b e^2(t-2) + b^2 e^2(t-3) + ...)    (5.3.7)
Equation (5.3.7) shows that the GARCH(1, 1) model is equivalent to
the infinite ARCH model with exponentially weighted coefficients.
This explains why the GARCH models are more efficient than the
ARCH models.
Several GARCH models have been described in the econometric
literature [1-3]. One popular GARCH(1, 1) model with a + b = 1 is
called integrated GARCH (IGARCH). It has an autoregressive unit
root. Therefore, the volatility of this process follows a random walk
and can be easily forecasted
E[s^2(t+k)] = s^2(t) + kv    (5.3.8)
IGARCH can be presented in the form
s^2(t) = v + (1 - l) e^2(t-1) + l s^2(t-1)    (5.3.9)
where 0 < l < 1. If v = 0, IGARCH coincides with the exponentially
weighted moving average (EWMA)
s^2(t) = (1 - l) Σ_{i=1}^{∞} l^(i-1) e^2(t-i)    (5.3.10)

Indeed, the n-period EWMA for a time series y(t) is defined as
z(t) = [y(t-1) + l y(t-2) + l^2 y(t-3) + ... + l^(n-1) y(t-n)] / (1 + l + ... + l^(n-1))    (5.3.11)
where 0 < l < 1. For large n, the denominator of (5.3.11) converges
to 1/(1 - l). Then for z(t) = s^2(t) and y(t) = e^2(t), equation (5.3.11) is
equivalent to equation (5.3.7) with v = 0.
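The equivalence of the recursive and weighted-sum forms is easy to confirm numerically; the sample size and lambda = 0.94 below are arbitrary illustrative choices:

```python
import numpy as np

# Check that the IGARCH recursion (5.3.9) with v = 0 reproduces the
# EWMA weighted sum (5.3.10) for a finite sample of squared shocks.
rng = np.random.default_rng(5)
lam = 0.94
e2 = rng.normal(size=500) ** 2           # squared shocks e^2(t)

# Recursive form: s2(t) = (1 - lam) e2(t-1) + lam s2(t-1), s2(0) = 0
s2 = 0.0
for x in e2:
    s2 = (1.0 - lam) * x + lam * s2

# Direct weighted sum: (1 - lam) * sum_i lam^(i-1) e2(t-i)
weights = (1.0 - lam) * lam ** np.arange(len(e2) - 1, -1, -1)
direct = np.sum(weights * e2)

print(abs(s2 - direct))                   # near machine zero
```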
The GARCH models discussed so far are symmetric in that the
shock sign does not affect the resulting volatility. In practice, however,
negative price shocks influence volatility more than positive
shocks. A noted example of the asymmetric GARCH model is the
exponential GARCH (EGARCH) (see, e.g., [3]). It has the form
log[s^2(t)] = v + b log[s^2(t-1)] + l z(t-1) + g(|z(t-1)| - √(2/π))    (5.3.12)

where z(t) = e(t)/s(t). Note that E[|z(t)|] = √(2/π) for the standard
normal distribution. Hence, the last term in (5.3.12) is the deviation of
|z(t)| from its mean. If g > 0 and l < 0, negative shocks lead to higher
volatility than positive shocks.

Often the current value of a variable depends not only on its past
values, but also on past and/or current values of other variables.
Modeling of dynamic interdependent variables is conducted with
multivariate time series. The multivariate models yield not only new
implementation problems but also some methodological difficulties.
In particular, one should be cautious with simple regression models
y(t) = a x(t) + e(t)    (5.4.1)
that may lead to spurious results. It is said that (5.4.1) is a simultaneous
equation, as both explanatory (x) and dependent (y) variables are
present at the same moment of time. A notorious example of spurious
inference is the finding that the best predictor in the United
Nations database for the Standard & Poor's 500 stock index is
production of butter in Bangladesh [5].
A statistically sound yet spurious relationship is named data
snooping. It may appear when the data being the subject of research
are used to construct the test statistics [4]. Another problem with
simultaneous equations is that noise can be correlated with the
explanatory variable, which leads to inaccurate OLS estimates of the
regression coefficients. Several techniques for handling this problem
are discussed in [2].
A multivariate time series y(t) = (y1(t), y2(t), ..., yn(t))' is a vector
of n processes that have data available for the same moments of time.
It is supposed also that all these processes are either stationary or
have the same order of integration. In practice, the multivariate
moving average models are rarely used due to some restrictions [1].
Therefore, we shall focus on the vector autoregressive model (VAR)
that is a simple extension of the univariate AR model to multivariate
time series. Consider a bivariate VAR(1) process
y1(t) = a10 + a11 y1(t-1) + a12 y2(t-1) + e1(t)
y2(t) = a20 + a21 y1(t-1) + a22 y2(t-1) + e2(t)    (5.4.2)

that can be presented in the matrix form
y(t) = a0 + A y(t-1) + ε(t)    (5.4.3)
In (5.4.3), y(t) = (y1(t), y2(t))', a0 = (a10, a20)', ε(t) = (e1(t), e2(t))',
