the next year, and for b the fraction of reverse changes.

We can obtain a better approximation by taking into account the

previous two elections. In this case our states are RR, RD, DR, and

DD, indicating the outcome of two successive elections. Being in state

RR means that the last two elections were Republican victories. If the

next election is a Democratic victory, we will be in state RD. If the

election outcomes for a series of years is DDDRDRR, then our process

has moved from state DD to DD to DR to RD to DR, and finally to RR.

Notice that the first letter of the state to which we move must agree

with the second letter of the state from which we came, since these

refer to the same election year. Our matrix of transition probabilities

will then have the form,

        RR    DR    RD    DD
  RR ( 1-a    0     a     0  )
  DR (  b     0    1-b    0  )
  RD (  0    1-c    0     c  )
  DD (  0     d     0    1-d )

Again the numbers a, b, c, and d would have to be estimated. The
study of this example is continued in ??. ♦
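For readers who want to experiment, the matrix above translates directly into a few lines of Python. The particular values chosen for a, b, c, and d below are illustrative placeholders only; as the text notes, they would have to be estimated from election data.

```python
from fractions import Fraction as F

# States in the order used in the text: RR, DR, RD, DD.
STATES = ["RR", "DR", "RD", "DD"]

def two_election_matrix(a, b, c, d):
    """Transition matrix for the chain that remembers the last two elections."""
    return [
        [1 - a, 0,     a,     0    ],  # RR -> RR or RD
        [b,     0,     1 - b, 0    ],  # DR -> RR or RD
        [0,     1 - c, 0,     c    ],  # RD -> DR or DD
        [0,     d,     0,     1 - d],  # DD -> DR or DD
    ]

# Illustrative values only; the text says a, b, c, d must be estimated.
P = two_election_matrix(F(1, 4), F(1, 3), F(1, 3), F(1, 4))

# Every row must sum to 1, and a transition is possible only when the
# first letter of the new state agrees with the second letter of the old.
assert all(sum(row) == 1 for row in P)
for i, src in enumerate(STATES):
    for j, dst in enumerate(STATES):
        if P[i][j] != 0:
            assert dst[0] == src[1]
```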

Example 4.30 The following example of a Markov chain has been

used in physics as a simple model for diffusion of gases. We shall see

later that a similar model applies to an idealized problem in changing

populations.

We imagine n black balls and n white balls which are put into

two urns so that there are n balls in each urn. A single experiment

consists in choosing a ball from each urn at random and putting the ball

obtained from the first urn into the second urn, and the ball obtained

from the second urn into the first. We take as state the number of

black balls in the first urn. If at any time we know this number, then

we know the exact composition of each urn. That is, if there are j black

balls in urn 1, there must be n - j black balls in urn 2, n - j white

balls in urn 1, and j white balls in urn 2. If the process is in state j,

then after the next exchange it will be in state j - 1, if a black ball is

chosen from urn 1 and a white ball from urn 2. It will be in state j if a

ball of the same color is drawn from each urn. It will be in state j + 1


4.13. MARKOV CHAINS

if a white ball is drawn from urn 1 and a black ball from urn 2. The

transition probabilities are then given by (see Exercise 12)

    p_{j,j-1} = (j/n)^2,           for j > 0,
    p_{j,j}   = 2j(n - j)/n^2,
    p_{j,j+1} = ((n - j)/n)^2,     for j < n,
    p_{j,k}   = 0                  otherwise.
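The formulas above can be checked in code. The following Python sketch builds the (n+1)-state transition matrix with exact fractions and verifies that each row sums to 1; with n = 2 it produces the matrix asked about in Exercise 4.

```python
from fractions import Fraction as F

def urn_matrix(n):
    """Transition matrix on states 0..n = number of black balls in urn 1."""
    P = [[F(0)] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        if j > 0:
            P[j][j - 1] = F(j, n) ** 2        # black from urn 1, white from urn 2
        P[j][j] = F(2 * j * (n - j), n * n)   # same color from each urn
        if j < n:
            P[j][j + 1] = F(n - j, n) ** 2    # white from urn 1, black from urn 2
    return P

# The two-black, two-white case of Exercise 4:
P2 = urn_matrix(2)
assert all(sum(row) == 1 for row in P2)
assert P2[1] == [F(1, 4), F(1, 2), F(1, 4)]
```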

A physicist would be interested, for example, in predicting the
composition of the urns after a certain number of exchanges have taken
place. Certainly any predictions about the early stages of the process
would depend upon the initial composition of the urns. For example, if
we started with all black balls in urn 1, we would expect that for some
time there would be more black balls in urn 1 than in urn 2. On the
other hand, it might be expected that the effect of this initial
distribution would wear off after a large number of exchanges. We shall
see later, in ??, that this is indeed the case. ♦
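The claim that the initial distribution wears off can be illustrated numerically. The sketch below (using a small n of our own choosing) starts the urn chain from the two extreme initial states and verifies that, after many exchanges, the two resulting distributions are practically indistinguishable.

```python
from fractions import Fraction as F

def urn_matrix(n):
    """Transition matrix from Example 4.30, with exact fractions."""
    P = [[F(0)] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        if j > 0:
            P[j][j - 1] = F(j, n) ** 2
        P[j][j] = F(2 * j * (n - j), n * n)
        if j < n:
            P[j][j + 1] = F(n - j, n) ** 2
    return P

def step(dist, P):
    """One exchange: multiply the distribution (a row vector) by P."""
    m = len(P)
    return [sum(dist[i] * P[i][j] for i in range(m)) for j in range(m)]

n = 4                             # a small illustrative case
P = urn_matrix(n)
d_black = [F(0)] * n + [F(1)]     # all black balls start in urn 1
d_white = [F(1)] + [F(0)] * n     # no black balls start in urn 1
for _ in range(50):
    d_black, d_white = step(d_black, P), step(d_white, P)

# After many exchanges the two starting points are indistinguishable.
gap = max(abs(x - y) for x, y in zip(d_black, d_white))
assert gap < F(1, 10**6)
```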

Exercises

1. Draw a state diagram for the Markov chain with transition probabilities
given by the following matrices.

    ( 1/2  1/2   0  )
    (  0    1    0  )
    ( 1/2   0   1/2 ),

    ( 1/3  1/3  1/3 )
    ( 1/3  1/3  1/3 )
    ( 1/3  1/3  1/3 ),

    ( 0  1 )
    ( 1  0 ),

    ( 0   1    0    0  )
    ( 0   0    0    1  )
    ( 0   0   1/2  1/2 )
    ( 0  1/2  1/2   0  ).

CHAPTER 4. PROBABILITY THEORY

Figure 4.24

2. Give the matrix of transition probabilities corresponding to the

transition diagrams in Figure 4.24.

3. Find the matrix P^(2) for the Markov chain determined by the
matrix of transition probabilities

        P = ( 1/2  1/2 )
            ( 1/3  2/3 ).

    [Ans. ( 5/12   7/12 )
          ( 7/18  11/18 ).]
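The answer printed above can be verified mechanically: the following sketch squares the given matrix with exact fractions and compares the result with the stated P^(2).

```python
from fractions import Fraction as F

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[F(1, 2), F(1, 2)],
     [F(1, 3), F(2, 3)]]

P2 = matmul(P, P)
assert P2 == [[F(5, 12), F(7, 12)],
              [F(7, 18), F(11, 18)]]   # agrees with the printed answer
```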

4. What is the matrix of transition probabilities for the Markov

chain in Example 4.30, for the case of two white balls and two

black balls?

5. Find the matrices P^(2), P^(3), P^(4) for the Markov chain determined
by the transition probabilities

        ( 1  0 )
        ( 0  1 ).

    Find the same for the Markov chain determined by the matrix

        ( 0  1 )
        ( 1  0 ).


6. Suppose that a Markov chain has two states, a1 and a2, and
transition probabilities given by the matrix

        ( 1/3  2/3 )
        ( 1/2  1/2 ).

    By means of a separate chance device we choose a state in which
    to start the process. This device chooses a1 with probability 1/2
    and a2 with probability 1/2. Find the probability that the process
    is in state a1 after the first step. Answer the same question in the
    case that the device chooses a1 with probability 1/3 and a2 with
    probability 2/3.

    [Ans. 5/12; 4/9.]
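Both printed answers can be checked with a short sketch: the distribution after one step is the initial distribution multiplied by the transition matrix.

```python
from fractions import Fraction as F

P = [[F(1, 3), F(2, 3)],
     [F(1, 2), F(1, 2)]]

def after_one_step(start):
    """Distribution over (a1, a2) one step after the device picks a start."""
    return [sum(start[i] * P[i][j] for i in range(2)) for j in range(2)]

# Device chooses each state with probability 1/2:
assert after_one_step([F(1, 2), F(1, 2)])[0] == F(5, 12)

# Device chooses a1 with probability 1/3 and a2 with probability 2/3:
assert after_one_step([F(1, 3), F(2, 3)])[0] == F(4, 9)
```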

7. Referring to the Markov chain with transition probabilities indicated
in Figure 4.22, construct the tree measures and determine the
values of

        p_21^(3), p_22^(3), p_23^(3)

    and

        p_31^(3), p_32^(3), p_33^(3).

8. A certain calculating machine uses only the digits 0 and 1. It is

supposed to transmit one of these digits through several stages.

However, at every stage there is a probability p that the digit

which enters this stage will be changed when it leaves. We form a

Markov chain to represent the process of transmission by taking

as states the digits 0 and 1. What is the matrix of transition

probabilities?

9. For the Markov chain in Exercise 8, draw a tree and assign a tree

measure, assuming that the process begins in state 0 and moves

through three stages of transmission. What is the probability

that the machine after three stages produces the digit 0, i.e., the

correct digit? What is the probability that the machine never

changed the digit from 0?
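For Exercises 8 and 9 together, the following sketch uses an illustrative value p = 1/10 (the text leaves p general). It forms the transition matrix, pushes the starting digit 0 through three stages, and checks the tree-measure count: the machine shows 0 after three stages when there were no flips or exactly two flips.

```python
from fractions import Fraction as F

def transmission_matrix(p):
    """States are the digits 0 and 1; each stage flips the digit with probability p."""
    return [[1 - p, p],
            [p, 1 - p]]

def after_stages(P, start, k):
    """Distribution over the digits after k stages of transmission."""
    dist = start
    for _ in range(k):
        dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]
    return dist

p = F(1, 10)                       # illustrative value; the text leaves p general
P = transmission_matrix(p)

# Probability the machine shows 0 after three stages, starting from 0:
prob_correct = after_stages(P, [F(1), F(0)], 3)[0]

# Tree count: the digit is 0 again after zero flips, or after exactly two
# flips (three ways to place the two flips among the three stages).
assert prob_correct == (1 - p)**3 + 3 * p**2 * (1 - p)

# Probability the digit was never changed:
assert (1 - p)**3 == F(729, 1000)
```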

10. Assume that a man's profession can be classified as professional,

skilled laborer, or unskilled laborer. Assume that of the sons

of professional men 80 per cent are professional, 10 per cent are

skilled laborers, and 10 per cent are unskilled laborers. In the


Figure 4.25

case of sons of skilled laborers, 60 per cent are skilled laborers, 20

per cent are professional, and 20 per cent are unskilled laborers.

Finally, in the case of unskilled laborers, 50 per cent of the sons

are unskilled laborers, and 25 per cent each are in the other two

categories. Assume that every man has a son, and form a Markov

chain by following a given family through several generations. Set

up the matrix of transition probabilities. Find the probability

that the grandson of an unskilled laborer is a professional man.

[Ans. .375.]
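The printed answer .375 can be checked by summing over the middle generation, which is exactly the matrix-squaring computation in Markov-chain form.

```python
# Transition probabilities taken directly from the exercise.
P = {
    "prof":      {"prof": 0.80, "skilled": 0.10, "unskilled": 0.10},
    "skilled":   {"prof": 0.20, "skilled": 0.60, "unskilled": 0.20},
    "unskilled": {"prof": 0.25, "skilled": 0.25, "unskilled": 0.50},
}

def two_step(src, dst):
    """Probability of moving from src to dst in exactly two generations."""
    return sum(P[src][mid] * P[mid][dst] for mid in P)

# Grandson of an unskilled laborer is a professional man:
assert abs(two_step("unskilled", "prof") - 0.375) < 1e-12
```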

11. In Exercise 10 we assumed that every man has a son. Assume

instead that the probability a man has a son is .8. Form a Markov

chain with four states. The first three states are as in Exercise

10, and the fourth state is such that the process enters it if a man

has no son, and that the state cannot be left. This state represents
families whose male line has died out. Find the matrix of

transition probabilities and find the probability that an unskilled

laborer has a grandson who is a professional man.

[Ans. .24.]

12. Explain why the transition probabilities given in Example 4.30

are correct.

Supplementary exercises.

13. Five points are marked on a circle. A process moves clockwise
from a given point to its neighbor with probability 2/3, or counterclockwise
to its neighbor with probability 1/3.


(a) Considering the process to be a Markov chain process, find
the matrix of transition probabilities.

(b) Given that the process starts in state 3, what is the probability
that it returns to the same state in two steps?
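A sketch of part (b): returning to the start in two steps requires one clockwise and one counterclockwise step, in either order.

```python
from fractions import Fraction as F

N = 5   # five points on the circle, labeled 0..4

# Clockwise to j+1 (mod 5) with probability 2/3, counterclockwise with 1/3.
P = [[F(0)] * N for _ in range(N)]
for j in range(N):
    P[j][(j + 1) % N] = F(2, 3)
    P[j][(j - 1) % N] = F(1, 3)
assert all(sum(row) == 1 for row in P)

# Part (b): a two-step return means one step each way, in either order.
two_step_return = sum(P[3][k] * P[k][3] for k in range(N))
assert two_step_return == 2 * F(2, 3) * F(1, 3)   # = 4/9
```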

14. In northern New England, years for apples can be described as

good, average, or poor. Suppose that following a good year the

probabilities of good, average, or poor years are respectively .4, .4,

and .2. Following a poor year the probabilities of good, average,

or poor years are .2, .4, and .4 respectively. Following an average

year the probabilities that the next year will be good or poor are

each .2, and of an average year, .6.

(a) Set up the transition matrix of this Markov chain.

(b) 1965 was a good year. Compute the probabilities for 1966,

1967, and 1968.

[Ans. For 1967: .28, .48, .24.]
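The 1967 answer can be reproduced by applying the transition matrix twice to the distribution concentrated on a good year.

```python
# Rows and columns in the order good, average, poor.
P = [[0.4, 0.4, 0.2],
     [0.2, 0.6, 0.2],
     [0.2, 0.4, 0.4]]

dist = [1.0, 0.0, 0.0]          # 1965 was a good year
history = {}
for year in (1966, 1967, 1968):
    dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]
    history[year] = dist

# The printed answer for 1967:
assert all(abs(x - y) < 1e-9
           for x, y in zip(history[1967], [0.28, 0.48, 0.24]))
```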

15. In Exercise 14 suppose that there is probability 1/4 for a good
year, 1/2 for an average year, and 1/4 for a poor year. What are
the probabilities for the following year?

16. A teacher in an oversized mathematics class finds, after grading

all homework papers for the first two assignments, that it is necessary
to reduce the amount of time spent in such grading. He

therefore designs the following system: Papers will be marked

satisfactory or unsatisfactory. All papers of students receiving a

mark of unsatisfactory on any assignment will be read on each of

the two succeeding days. Of the remaining papers, the teacher

will read one-fifth, chosen at random. Assuming that each paper

has a probability of one-fifth of being classified “unsatisfactory”,

(a) Set up a three-state Markov chain to describe the process.

(b) Suppose that a student has just handed in a satisfactory

paper. What are the probabilities for the next two assignments?

17. In another model for diffusion, it is assumed that there are two

urns which together contain N balls numbered from 1 to N . Each

second a number from 1 to N is chosen at random, and the ball

with the corresponding number is moved to the other urn. Set


up a Markov chain by taking as state the number of balls in urn

1. Find the transition matrix.
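The chain this exercise describes can be sketched as follows (this is the natural reading of the exercise: the chosen number's ball is in urn 1 with probability j/N, which sends the chain to state j - 1, and in urn 2 otherwise).

```python
from fractions import Fraction as F

def numbered_ball_matrix(N):
    """State = number of balls in urn 1, for the model with N numbered balls."""
    P = [[F(0)] * (N + 1) for _ in range(N + 1)]
    for j in range(N + 1):
        if j > 0:
            P[j][j - 1] = F(j, N)       # chosen ball was in urn 1
        if j < N:
            P[j][j + 1] = F(N - j, N)   # chosen ball was in urn 2
    return P

P = numbered_ball_matrix(4)
assert all(sum(row) == 1 for row in P)
assert P[1][0] == F(1, 4) and P[1][2] == F(3, 4)
```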

4.14 The central limit theorem

We continue our discussion of the independent trials process with two

outcomes. As usual, let p be the probability of success on a trial, and

f (n, p; x) be the probability of exactly x successes in n trials.

In Figure 4.26 we have plotted bar graphs which represent f (n, .3; x)

for n = 10, 50, 100, and 200. We note first of all that the graphs are

drifting off to the right. This is not surprising, since their peaks occur

at np, which is steadily increasing. We also note that while the total

area is always 1, this area becomes more and more spread out.

We want to redraw these graphs in a manner that prevents the
drifting and the spreading out. First of all, we replace x by x - np,
assuring that our peak always occurs at 0. Next we introduce a new
unit for measuring the deviation, which depends on n, and which gives
comparable scales. As we saw in Section 4.10, the standard deviation
√(npq) is such a unit.

We must still insure that probabilities are represented by areas in
the graph. In Figure 4.26 this is achieved by having a unit base for each
rectangle, and having the probability f(n, p; x) as height. Since we are
now representing a standard deviation as a single unit on the horizontal
axis, we must take f(n, p; x)√(npq) as the heights of our rectangles. The
resulting curves for n = 50 and 200 are shown in Figures 4.27 and 4.28,
respectively.

We note that the two figures look very much alike. We have also
shown in Figure 4.28 that it can be approximated by a bell-shaped
curve. This curve represents the function

    f(x) = (1/√(2π)) e^(-x²/2)

and is known as the normal curve. It is a fundamental theorem of

probability theory that as n increases, the appropriately rescaled bar

graphs more and more closely approach the normal curve. The theorem

is known as the Central Limit Theorem, and we have illustrated it

graphically.

More precisely, the theorem states that for any two numbers a and

Figure 4.26

Figure 4.27

Figure 4.28

Figure 4.29

b, with a < b,

    Pr[a < (x - np)/√(npq) < b]

approaches the area under the normal curve between a and b, as n
increases. This theorem is particularly interesting in that the normal
curve is symmetric about 0, while f(n, p; x) is symmetric about the
expected value np only for the case p = 1/2. It should also be noted that
we always arrive at the same normal curve, no matter what the value
of p is.
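The theorem can be checked numerically for a modest n. The sketch below sums the exact binomial probabilities f(n, p; x) over the standardized interval (a, b) and compares the total with the corresponding normal area, computed from the error function in Python's standard library.

```python
import math

def binom_prob(n, p, x):
    """f(n, p; x): probability of exactly x successes in n trials."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def normal_area(a, b):
    """Area under the normal curve between a and b, via the error function."""
    Phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
    return Phi(b) - Phi(a)

n, p = 200, 0.3
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

# Sum the exact binomial probabilities over a < (x - np)/sqrt(npq) < b ...
a, b = -1.0, 2.0
exact = sum(binom_prob(n, p, x) for x in range(n + 1)
            if a < (x - mu) / sigma < b)

# ... and compare with the corresponding area under the normal curve.
assert abs(exact - normal_area(a, b)) < 0.05
```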

In Figure 4.29 we give a table for the area under the normal curve
between 0 and d. Since the total area is 1, and since it is symmetric
about the origin, we can compute arbitrary areas from this table. For
example, suppose that we wish the area between -1 and +2. The area
between 0 and 2 is given in the table as .477. The area between -1
and 0 is the same as between 0 and 1, and hence is given as .341. Thus
the total area is .818. The area outside the interval (-1, 2) is then
1 - .818 = .182.
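The table lookups in this computation can be reproduced with the error function; the sketch below recovers the values .341, .477, .818, and .182 used above.

```python
import math

def area_0_to(d):
    """Area under the normal curve between 0 and d (the table of Figure 4.29)."""
    return 0.5 * math.erf(d / math.sqrt(2))

# Reproduce the worked example: the area between -1 and +2.
assert abs(area_0_to(1) - 0.341) < 0.001
assert abs(area_0_to(2) - 0.477) < 0.001
total = area_0_to(1) + area_0_to(2)     # symmetry: area(-1, 0) = area(0, 1)
assert abs(total - 0.818) < 0.001
assert abs((1 - total) - 0.182) < 0.001
```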

Example 4.31 Let us find the probability that x differs from the
expected value np by as much as d standard deviations.

    Pr[|x - np| ≥ d√(npq)] = Pr[|(x - np)/√(npq)| ≥ d],

and hence the approximate answer should be the area outside the
interval (-d, d) under the normal curve. For d = 1, 2, 3 we obtain
1 - (2 · .341) = .318, 1 - (2 · .477) = .046, 1 - (2 · .4987) = .0026, respectively.
These agree with the values given in Section 4.10, to within
rounding errors. In fact, the Central Limit Theorem is the basis of
those estimates. ♦

Example 4.32 In Example 4.19 we considered the example of throwing
a coin 10,000 times. The expected number of heads that turn up
is 5000 and the standard deviation is √(10,000 · 1/2 · 1/2) = 50. We
observed that the probability of a deviation of more than two standard
deviations (or 100) was very unlikely. On the other hand, consider the
probability of a deviation of less than .1 standard deviation. That is,