
differencing operator, it is named the ARI(p, d) process and has the form

D1^d y(t) - a1 D1^d y(t - 1) - ... - ap D1^d y(t - p) = e(t),  t ≥ p + d   (5.1.9)

Note that differencing a time series d times reduces the number of independent variables by d, so that the total number of independent variables in ARI(p, d) within a sample with n observations equals n - p - d.
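As a small plain-Python illustration of this bookkeeping (the helper names are mine, not from the text), the sketch below applies the first-difference operator D1 repeatedly; each pass shortens the series by one observation:

```python
def first_difference(y):
    """D1 y(t) = y(t) - y(t - 1); the output is one observation shorter."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def difference(y, d):
    """Apply the first-difference operator d times (the D1^d of the text)."""
    for _ in range(d):
        y = first_difference(y)
    return y

y = [1.0, 3.0, 6.0, 10.0, 15.0, 21.0]  # n = 6 observations
d2 = difference(y, 2)                  # second differences
print(len(y), len(d2))                 # 6 4: differencing twice loses 2 points
```

Fitting AR(p) to the differenced series then leaves n - p - d usable equations, in line with the count above.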

The unit root is another notion widely used for discerning permanent and transitory effects of random shocks. It is based on the roots of the characteristic polynomial of the AR(p) model. For example, AR(1) has the characteristic polynomial

1 - a1 z = 0   (5.1.10)

If a1 = 1, then z = 1 and the characteristic polynomial has a unit root. In general, the roots of the characteristic polynomial can have complex values. The solution to equation (5.1.10) is outside the unit circle (i.e., |z| > 1) when |a1| < 1. It can be shown that all random shocks to AR(p) are transitory when all solutions of its characteristic equation

1 - a1 z - a2 z^2 - ... - ap z^p = 0   (5.1.11)

are outside the unit circle.
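For AR(2) this check is easy to carry out numerically. A minimal sketch (helper names are mine; it assumes a2 ≠ 0) that solves the characteristic equation 1 - a1 z - a2 z^2 = 0 with the quadratic formula and tests whether both roots lie outside the unit circle:

```python
import cmath

def ar2_roots(a1, a2):
    """Roots of 1 - a1*z - a2*z**2 = 0, i.e. a2*z**2 + a1*z - 1 = 0 (a2 != 0)."""
    disc = cmath.sqrt(a1 * a1 + 4 * a2)  # cmath handles complex roots
    return ((-a1 + disc) / (2 * a2), (-a1 - disc) / (2 * a2))

def shocks_transitory(a1, a2):
    """True when all roots lie outside the unit circle."""
    return all(abs(z) > 1 for z in ar2_roots(a1, a2))

print(shocks_transitory(0.5, 0.3))   # roots ~1.17 and ~-2.84: True
print(shocks_transitory(1.2, -0.2))  # roots 1 and 5: a unit root, so False
```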

5.1.2 MOVING AVERAGE MODELS

A model more general than AR(p) contains both lagged observations and lagged noise

y(t) = a1 y(t - 1) + a2 y(t - 2) + ... + ap y(t - p) + e(t) + b1 e(t - 1) + b2 e(t - 2) + ... + bq e(t - q)   (5.1.12)

This model is called the autoregressive moving average model of order (p, q), or simply ARMA(p, q).

46 Time Series Analysis

Sometimes modeling of empirical data requires AR(p) with a rather high order p. Then, ARMA(p, q) may be more efficient in that the total number of its terms (p + q) needed for given accuracy is lower than the order p in AR(p). ARMA(p, q) can be expanded into the integrated model, ARIMA(p, d, q), similar to the expansion of AR(p) into ARI(p, d). Neglecting the autoregressive terms in ARMA(p, q) yields a "pure" moving average model MA(q)

y(t) = e(t) + b1 e(t - 1) + b2 e(t - 2) + ... + bq e(t - q)   (5.1.13)

MA(q) can be presented in the form

y(t) = Bq(L) e(t)   (5.1.14)

where Bq(L) is the MA polynomial in the lag operator

Bq(L) = 1 + b1 L + b2 L^2 + ... + bq L^q   (5.1.15)

The moving average model does not depend explicitly on the lagged values of y(t). Yet, it is easy to show that this model implicitly incorporates the past. Consider, for example, the MA(1) model

y(t) = e(t) + b1 e(t - 1)   (5.1.16)

with e(0) = 0. For this model,

y(1) = e(1), y(2) = e(2) + b1 e(1) = e(2) + b1 y(1),
y(3) = e(3) + b1 e(2) = e(3) + b1 (y(2) - b1 y(1)), ...

Thus, the general result for MA(1) has the form

y(t)(1 - b1 L + b1^2 L^2 - b1^3 L^3 + ...) = e(t)   (5.1.17)

Equation (5.1.17) can be viewed as an AR(∞) process, which illustrates that the MA model does depend on the past.
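The inversion can be verified numerically. In the sketch below (my construction, for illustration) an MA(1) series with b1 = 0.5 is simulated, and the current shock is recovered from past observations alone with the alternating weights (-b1)^k:

```python
import random

random.seed(0)
b1 = 0.5
e = [random.gauss(0.0, 1.0) for _ in range(200)]
y = [e[0]] + [e[t] + b1 * e[t - 1] for t in range(1, len(e))]  # MA(1) series

# AR(inf) form: e(t) = y(t) - b1*y(t-1) + b1^2*y(t-2) - ...
t = 199
e_hat = sum((-b1) ** k * y[t - k] for k in range(t + 1))
print(abs(e_hat - e[t]))  # tiny: past y values reproduce the current shock
```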

The MA(q) model is invertible if it can be transformed into an AR(∞) model. It can be shown that MA(q) is invertible if all solutions to the equation

1 + b1 z + b2 z^2 + ... + bq z^q = 0   (5.1.18)

are outside the unit circle. In particular, MA(1) is invertible if |b1| < 1. If the process y(t) has a non-zero mean value m, then the AR(1) model can be presented in the following form


y(t) - m = a1 [y(t - 1) - m] + e(t) = c + a1 y(t - 1) + e(t)   (5.1.19)

In (5.1.19), the intercept c equals

c = m(1 - a1)   (5.1.20)

The general AR(p) model with a non-zero mean has the following form

Ap(L) y(t) = c + e(t),  c = m(1 - a1 - ... - ap)   (5.1.21)

Similarly, the intercept can be included into the general moving average model MA(q)

y(t) = c + Bq(L) e(t),  c = m   (5.1.22)

Note that the mean of the MA model coincides with its intercept because the mean of the white noise is zero.

5.1.3 AUTOCORRELATION AND FORECASTING

Now, let us introduce the autocorrelation function (ACF) for a process y(t)

r(k) = g(k)/g(0)   (5.1.23)

where g(k) is the autocovariance of order k

g(k) = E[(y(t) - m)(y(t - k) - m)]   (5.1.24)

The autocorrelation functions may have some typical patterns, which can be used for identification of empirical time series [2]. The obvious properties of the ACF are

r(0) = 1,  -1 < r(k) < 1 for k ≠ 0   (5.1.25)

The ACF is closely related to the ARMA parameters. In particular, for AR(1)

r(1) = a1   (5.1.26)

The first-order ACF for MA(1) equals

r(1) = b1/(1 + b1^2)   (5.1.27)

The right-hand side of the expression (5.1.27) has the same value for the inverse transform b1 → 1/b1. For example, the two processes


x(t) = e(t) + 2e(t - 1)
y(t) = e(t) + 0.5e(t - 1)

have the same r(1). Note, however, that y(t) is an invertible process while x(t) is not.
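A quick simulation illustrates the ambiguity. The sketch below (the helper acf1 is mine) estimates the sample first-order autocorrelation of both processes; each should land near the common theoretical value b1/(1 + b1^2) = 0.4:

```python
import random

def acf1(y):
    """Sample estimate of r(1) = g(1)/g(0)."""
    m = sum(y) / len(y)
    g0 = sum((v - m) ** 2 for v in y)
    g1 = sum((y[t] - m) * (y[t - 1] - m) for t in range(1, len(y)))
    return g1 / g0

random.seed(1)
e = [random.gauss(0.0, 1.0) for _ in range(100_000)]
x = [e[t] + 2.0 * e[t - 1] for t in range(1, len(e))]  # non-invertible
y = [e[t] + 0.5 * e[t - 1] for t in range(1, len(e))]  # invertible
print(acf1(x), acf1(y))  # both close to 0.4
```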

ARMA modeling is widely used for forecasting. Consider a forecast of a variable y(t + 1) based on a set of n variables x(t) known at moment t. This set can be just past values of y, that is, y(t), y(t - 1), ..., y(t - n + 1). Let us denote the forecast with ŷ(t + 1|t). The quality of the forecast is usually defined with some loss function. The mean squared error (MSE) is the conventional loss function in many applications

MSE(ŷ(t + 1|t)) = E[(y(t + 1) - ŷ(t + 1|t))^2]   (5.1.28)

The forecast that yields the minimum of the MSE turns out to be the expectation of y(t + 1) conditioned on x(t)

ŷ(t + 1|t) = E[y(t + 1)|x(t)]   (5.1.29)

In the case of linear regression

y(t + 1) = b′x(t) + e(t)   (5.1.30)

the MSE is minimized by the ordinary least squares (OLS) estimate for b. For a sample with T observations,

b = [Σ_{t=1}^{T} x(t) x′(t)]^(-1) Σ_{t=1}^{T} x(t) y(t + 1)   (5.1.31)
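For the scalar case x(t) = y(t), formula (5.1.31) reduces to a ratio of sums. A minimal sketch, assuming a simulated AR(1) series with a1 = 0.7 (seed and sample size are arbitrary):

```python
import random

random.seed(2)
a1, T = 0.7, 50_000
y = [0.0]
for _ in range(T):
    y.append(a1 * y[-1] + random.gauss(0.0, 1.0))

# Scalar OLS: b = sum x(t) y(t+1) / sum x(t)^2 with x(t) = y(t)
b = sum(y[t] * y[t + 1] for t in range(T)) / sum(y[t] ** 2 for t in range(T))
forecast = b * y[-1]  # one-step forecast y_hat(T+1 | T)
print(b)              # close to the true a1 = 0.7
```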

Another important concept in time series analysis is the maximum likelihood estimate (MLE) [2]. Consider the general ARMA model (5.1.12) with the white noise (4.2.6). The problem is how to estimate the ARMA parameters on the basis of given observations of y(t). The idea of MLE is to find such a vector of parameters r′ = (a1, ..., ap, b1, ..., bq, s^2) that maximizes the likelihood function for given observations (y1, y2, ..., yT)

f_{1, 2, ..., T}(y1, y2, ..., yT; r′)   (5.1.32)

The likelihood function (5.1.32) has the sense of the probability of observing the data sample (y1, y2, ..., yT). In this approach, the ARMA model and the probability distribution for the white noise should be


specified first. Often the normal distribution leads to reasonable estimates even if the real distribution is different. Furthermore, the likelihood function must be calculated for the chosen ARMA model. Finally, the components of the vector r′ must be estimated. The latter step may require sophisticated numerical optimization techniques. Details of implementation of MLE are discussed in [2].
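As a toy version of this procedure (a deliberate simplification: AR(1) only, known unit noise variance, conditional likelihood, and a grid search instead of a proper numerical optimizer), one can maximize the Gaussian likelihood directly:

```python
import random

def ar1_loglik(a1, y):
    """Gaussian log-likelihood of e(t) = y(t) - a1*y(t-1), up to a constant."""
    return -0.5 * sum((y[t] - a1 * y[t - 1]) ** 2 for t in range(1, len(y)))

random.seed(3)
y = [0.0]
for _ in range(5_000):
    y.append(0.6 * y[-1] + random.gauss(0.0, 1.0))

grid = [i / 100 for i in range(-99, 100)]
a_hat = max(grid, key=lambda a: ar1_loglik(a, y))
print(a_hat)  # near the true value 0.6
```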

5.2 TRENDS AND SEASONALITY

Finding trends is an important part of time series analysis. The presence of a trend implies that the time series has no mean reversion. Moreover, the mean and variance of a trending process depend on the sample. A time series with a trend is named non-stationary. If a process y(t) is stationary, its mean, variance, and autocovariance are finite and do not depend on time. This implies that the autocovariance (5.1.24) depends only on the lag parameter k. Such a definition of stationarity is also called covariance-stationarity or weak stationarity because it does not impose any restrictions on the higher moments of the process. Strict stationarity implies that the higher moments also do not depend on time. Note that any MA process is covariance-stationary. However, the AR(p) process is covariance-stationary only if the roots of its characteristic polynomial are outside the unit circle.

It is important to discern deterministic trends and stochastic trends. They have a different nature, yet their graphs may sometimes look very similar [1]. Consider first the AR(1) model with a deterministic trend

y(t) - m - ct = a1 (y(t - 1) - m - c(t - 1)) + e(t)   (5.2.1)

Let us introduce z(t) = y(t) - m - ct. Then equation (5.2.1) has the solution

z(t) = a1^t z(0) + Σ_{i=1}^{t} a1^(t-i) e(i)   (5.2.2)

where z(0) is a pre-sample starting value of z. Obviously, the random shocks are transitory if |a1| < 1. The trend incorporated in the definition of z(t) is deterministic when |a1| < 1. However, if a1 = 1, then equation (5.2.1) has the form


y(t) = c + y(t - 1) + e(t)   (5.2.3)

The process (5.2.3) is named the random walk with drift. In this case, equation (5.2.2) is reduced to

z(t) = z(0) + Σ_{i=1}^{t} e(i)   (5.2.4)

The sum of non-transitory shocks in the right-hand side of equation (5.2.4) is named a stochastic trend. Consider, for example, the deterministic trend model with m = 0 and e(t) ~ N(0, 1)

y(t) = 0.1t + e(t)   (5.2.5)

and the stochastic trend model

y(t) = 0.1 + y(t - 1) + e(t),  y(0) = 0   (5.2.6)

As can be seen from Figure 5.1, both graphs look similar. In general, however, the stochastic trend model can deviate from the deterministic trend for a long time.
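The two models are easy to simulate side by side. A minimal sketch (seed and horizon are arbitrary) driving equations (5.2.5) and (5.2.6) with the same shocks:

```python
import random

random.seed(4)
n = 40
e = [random.gauss(0.0, 1.0) for _ in range(n + 1)]

det = [0.1 * t + e[t] for t in range(n + 1)]  # y(t) = 0.1 t + e(t)
sto = [0.0]                                   # y(0) = 0
for t in range(1, n + 1):
    sto.append(0.1 + sto[-1] + e[t])          # y(t) = 0.1 + y(t-1) + e(t)

# Both drift upward at rate 0.1 per step, but the random walk accumulates
# every shock instead of reverting to the trend line.
print(det[-1], sto[-1])
```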

A stochastic trend implies that the process is I(1). Then the lag polynomial (5.1.3) can be represented in the form

Figure 5.1 Deterministic and stochastic trends: A - equation (5.2.5), B - equation (5.2.6).


Ap(L) = (1 - L) A_{p-1}(L)   (5.2.7)

Similarly, the process I(2) has the lag polynomial

Ap(L) = (1 - L)^2 A_{p-2}(L)   (5.2.8)

and so on. The standard procedure for testing the presence of a unit root in a time series is the Dickey-Fuller method [1, 2]. This method is implemented in major econometric software packages (see Section 5.5).

Seasonal effects may play an important role in the properties of time series. Sometimes there is a need to eliminate these effects in order to focus on the stochastic specifics of the process. Various differencing filters can be used for achieving this goal [2]. In other cases, the seasonal effect itself may be the object of interest. The general approach for handling seasonal effects is introducing dummy parameters D(s, t), where s = 1, 2, ..., S and S is the number of seasons. For example, S = 12 is used for modeling monthly effects. The parameter D(s, t) equals 1 at a specific season s and equals zero at all other seasons. The seasonal extension of an ARMA(p, q) model has the following form

y(t) = a1 y(t - 1) + a2 y(t - 2) + ... + ap y(t - p) + e(t) + b1 e(t - 1) + b2 e(t - 2) + ... + bq e(t - q) + Σ_{s=1}^{S} ds D(s, t)   (5.2.9)

Note that forecasting with the model (5.2.9) requires estimating (p + q + S) parameters.
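The dummy construction can be sketched as follows (the mapping of observation t to a season via t % S + 1 is an assumption for illustration; real data would use calendar dates):

```python
S = 12  # monthly seasons

def D(s, t):
    """1 when observation t falls in season s (s = 1..S), else 0."""
    return 1 if (t % S) + 1 == s else 0

# Each observation activates exactly one of the S dummies:
row = [D(s, 14) for s in range(1, S + 1)]
print(row, sum(row))  # a single 1, in the slot for season 3
```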

5.3 CONDITIONAL HETEROSKEDASTICITY

So far, we have considered random processes with the white noise (4.2.6) that is characterized by constant unconditional variance. Conditional variance has not been discussed yet. In general, the processes with unspecified conditional variance are named homoskedastic. Many random time series are not well described with the IID process. In particular, there may be strong positive autocorrelation in squared asset returns. This means that large returns (either positive or negative) follow large returns. In this case, it is said that the return


volatility is clustered. The effect of volatility clustering is also called autoregressive conditional heteroskedasticity (ARCH). It should be noted that small autocorrelation in squared returns does not necessarily mean that there is no volatility clustering. Strong outliers that lead to high values of skewness and kurtosis may lower the autocorrelation. If these outliers are removed from the sample, volatility clustering may become apparent [3].

Several models in which past shocks contribute to the current volatility have been developed. Generally, they are rooted in the ARCH(m) model, where the conditional variance is a weighted sum of m squared lagged returns

s^2(t) = v + a1 e^2(t - 1) + a2 e^2(t - 2) + ... + am e^2(t - m)   (5.3.1)

In (5.3.1), e(t) ~ N(0, s^2(t)), v > 0, and a1, ..., am ≥ 0. Unfortunately,

application of the ARCH(m) process to modeling financial time series often requires polynomials with a high order m. A more efficient model is the generalized ARCH (GARCH) process. The GARCH(m, n) process combines the ARCH(m) process with the AR(n) process for lagged variance

s^2(t) = v + a1 e^2(t - 1) + a2 e^2(t - 2) + ... + am e^2(t - m) + b1 s^2(t - 1) + b2 s^2(t - 2) + ... + bn s^2(t - n)   (5.3.2)

The simple GARCH(1, 1) model is widely used in applications

s^2(t) = v + a e^2(t - 1) + b s^2(t - 1)   (5.3.3)

Equation (5.3.3) can be transformed into

s^2(t) = v + (a + b) s^2(t - 1) + a[e^2(t - 1) - s^2(t - 1)]   (5.3.4)

The last term in equation (5.3.4) is conditioned on information available at time (t - 1) and has zero mean. This term can be treated as a shock to volatility. Therefore, the unconditional expectation of volatility for the GARCH(1, 1) model equals

E[s^2(t)] = v/(1 - a - b)   (5.3.5)

This implies that the GARCH(1, 1) process is weakly stationary when a + b < 1. The advantage of the stationary GARCH(1, 1) model is that it can be easily used for forecasting. Namely, the conditional expectation of volatility at time (t + k) equals [4]


E[s^2(t + k)] = (a + b)^k [s^2(t) - v/(1 - a - b)] + v/(1 - a - b)   (5.3.6)

The GARCH(1, 1) model (5.3.4) can be rewritten as

s^2(t) = v/(1 - b) + a(e^2(t - 1) + b e^2(t - 2) + b^2 e^2(t - 3) + ...)   (5.3.7)

Equation (5.3.7) shows that the GARCH(1, 1) model is equivalent to an infinite ARCH model with exponentially weighted coefficients. This explains why the GARCH models are more efficient than the ARCH models.
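The unconditional variance (5.3.5) can be checked by simulation. A minimal sketch with illustrative parameters v = 0.1, a = 0.1, b = 0.8, for which v/(1 - a - b) = 1:

```python
import random

random.seed(5)
v, a, b = 0.1, 0.1, 0.8
s2 = v / (1 - a - b)  # start at the unconditional variance
e_prev = 0.0
total, n = 0.0, 200_000
for _ in range(n):
    s2 = v + a * e_prev ** 2 + b * s2            # eq. (5.3.3)
    e_prev = random.gauss(0.0, 1.0) * s2 ** 0.5  # e(t) ~ N(0, s^2(t))
    total += s2
print(total / n)  # close to v/(1 - a - b) = 1.0
```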

Several GARCH models have been described in the econometric literature [1-3]. One popular GARCH(1, 1) model with a + b = 1 is called integrated GARCH (IGARCH). It has an autoregressive unit root. Therefore, volatility of this process follows a random walk and can be easily forecasted

E[s^2(t + k)] = s^2(t) + kv   (5.3.8)

IGARCH can be presented in the form

s^2(t) = v + (1 - l) e^2(t - 1) + l s^2(t - 1)   (5.3.9)

where 0 < l < 1. If v = 0, IGARCH coincides with the exponentially weighted moving average (EWMA)

s^2(t) = (1 - l) Σ_{i=1}^{n} l^(i-1) e^2(t - i)   (5.3.10)

Indeed, the n-period EWMA for a time series y(t) is defined as

z(t) = [y(t - 1) + l y(t - 2) + l^2 y(t - 3) + ... + l^(n-1) y(t - n)]/(1 + l + ... + l^(n-1))   (5.3.11)

where 0 < l < 1. For large n, the denominator of (5.3.11) converges to 1/(1 - l). Then for z(t) = s^2(t) and y(t) = e^2(t), equation (5.3.11) is equivalent to equation (5.3.7) with v = 0.
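The equivalence of the recursive and weighted-sum forms is easy to verify on a short sample (the shock values and l = 0.94 below are illustrative):

```python
l = 0.94
e2 = [1.0, 4.0, 0.25, 2.25, 1.0, 0.0, 9.0]  # e^2(t - i) for i = 1..n

# Recursive IGARCH form with v = 0, eq. (5.3.9), seeded with 0:
s2 = 0.0
for v2 in reversed(e2):  # oldest shock first
    s2 = (1 - l) * v2 + l * s2

# Direct EWMA weights (1 - l) l^(i-1), eq. (5.3.10):
direct = (1 - l) * sum(l ** i * v2 for i, v2 in enumerate(e2))
print(abs(s2 - direct))  # identical up to rounding
```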

The GARCH models discussed so far are symmetric in that the shock sign does not affect the resulting volatility. In practice, however, negative price shocks influence volatility more than positive shocks. A noted example of the asymmetric GARCH model is the exponential GARCH (EGARCH) (see, e.g., [3]). It has the form

log[s^2(t)] = v + b log[s^2(t - 1)] + l z(t - 1) + g(|z(t - 1)| - √(2/π))   (5.3.12)

where z(t) = e(t)/s(t). Note that E[|z(t)|] = √(2/π). Hence, the last term in (5.3.12) is the mean deviation of z(t). If g > 0 and l < 0, negative shocks lead to higher volatility than positive shocks.
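The asymmetry can be seen from a single EGARCH update. A sketch with illustrative parameter values (l < 0 and g > 0, as in the leverage-effect case above):

```python
import math

v, b, l, g = -0.1, 0.9, -0.06, 0.15  # illustrative values only

def egarch_step(s2_prev, e_prev):
    """One update of eq. (5.3.12), returning the new variance s^2(t)."""
    z = e_prev / math.sqrt(s2_prev)
    log_s2 = (v + b * math.log(s2_prev) + l * z
              + g * (abs(z) - math.sqrt(2 / math.pi)))
    return math.exp(log_s2)

# A negative shock raises volatility more than a positive one of equal size:
print(egarch_step(1.0, -1.0) > egarch_step(1.0, 1.0))  # True
```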

5.4 MULTIVARIATE TIME SERIES

Often the current value of a variable depends not only on its past values, but also on past and/or current values of other variables. Modeling of dynamic interdependent variables is conducted with multivariate time series. The multivariate models yield not only new implementation problems but also some methodological difficulties. In particular, one should be cautious with simple regression models

y(t) = a x(t) + e(t)   (5.4.1)

that may lead to spurious results. It is said that (5.4.1) is a simultaneous equation, as both the explanatory (x) and dependent (y) variables are present at the same moment of time. A notorious example of spurious inference is the finding that the best predictor in the United Nations database for the Standard & Poor's 500 stock index is production of butter in Bangladesh [5].

A statistically sound yet spurious relationship is named data snooping. It may appear when the data being the subject of research are used to construct the test statistics [4]. Another problem with simultaneous equations is that noise can be correlated with the explanatory variable, which leads to inaccurate OLS estimates of the regression coefficients. Several techniques for handling this problem are discussed in [2].

A multivariate time series y(t) = (y1(t), y2(t), ..., yn(t))′ is a vector of n processes that have data available for the same moments of time. It is also supposed that all these processes are either stationary or have the same order of integration. In practice, the multivariate moving average models are rarely used due to some restrictions [1]. Therefore, we shall focus on the vector autoregressive model (VAR), which is a simple extension of the univariate AR model to multivariate time series. Consider a bivariate VAR(1) process

y1(t) = a10 + a11 y1(t - 1) + a12 y2(t - 1) + e1(t)
y2(t) = a20 + a21 y1(t - 1) + a22 y2(t - 1) + e2(t)   (5.4.2)


that can be presented in the matrix form

y(t) = a0 + A y(t - 1) + ε(t)   (5.4.3)

In (5.4.3), y(t) = (y1(t), y2(t))′, a0 = (a10, a20)′, ε(t) = (e1(t), e2(t))′,
