Figure 4.14: ♦

140 CHAPTER 4. PROBABILITY THEORY

designed the experiment and the method we used for the required decision. In some cases we are not too worried about the errors and can make a relatively simple experiment. In other cases, errors are very important, and the experiment must be designed with that fact in mind. For example, the possibility of error is certainly important in the case that a vaccine for a given disease is proposed, and the statistician is asked to help in deciding whether or not it should be used. In this case it might be assumed that there is a certain probability p that a person will get the disease if not vaccinated, and a probability r that the person will get it if he or she is vaccinated. If we have some knowledge of the approximate value of p, we are then led to construct an experiment to decide whether r is greater than p, equal to p, or less than p. The first case would be interpreted to mean that the vaccine actually tends to produce the disease, the second that it has no effect, and the third that it prevents the disease; so that we can make three kinds of errors. We could recommend acceptance when it is actually harmful, we could recommend acceptance when it has no effect, or finally we could reject it when it actually is effective. The first and third might result in the loss of lives, the second in the loss of time and money of those administering the test. Here it would certainly be important that the probability of the first and third kinds of errors be made small. To see how it is possible to make the probability of both errors small, we return to the case of Smith and Jones.

Suppose that, instead of demanding that Smith make at least eight correct identifications out of ten trials, we insist that Smith make at least 60 correct identifications out of 100 trials. (The glasses must now be very small.) Then, if p = 1/2, the probability that Jones wins the bet is about .98; so that we are extremely unlikely to give the dollar to Smith when in fact it should go to Jones. (If p < 1/2 it is even more likely that Jones will win.) If p > 1/2 we can also calculate the probability that Smith will win the bet. These probabilities are shown in the graph in Figure 4.15. The dashed curve gives for comparison the corresponding probabilities for the test requiring eight out of ten correct. Note that with 100 trials, if p is 3/4, the probability that Smith wins the bet is nearly 1, while in the case of eight out of ten, it was only about 1/2. Thus in the case of 100 trials, it would be easy to convince both Smith and Jones that whichever one is correct is very likely to win the bet. Thus we see that the probability of both types of errors can be made small at the expense of having a large number of experiments.
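These tail probabilities are easy to check by summing binomial terms directly. The sketch below (not part of the original text; p = 3/4 is taken as the illustrative value from the figure) computes the chance that Smith passes the 60-of-100 test by guessing, and hence the chance that Jones wins:

```python
from math import comb

def binomial_tail(n, k, p):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# Probability Smith passes the 60-of-100 test by pure guessing (p = 1/2):
smith_guessing = binomial_tail(100, 60, 0.5)
# Probability Jones wins the bet in that case:
jones_wins = 1 - smith_guessing

# Probability Smith passes when Smith really distinguishes with p = 3/4:
smith_skilled = binomial_tail(100, 60, 0.75)

print(round(jones_wins, 3))    # close to the "about .98" quoted in the text
print(round(smith_skilled, 3)) # nearly 1, as the text states
```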


4.9. A PROBLEM OF DECISION

Figure 4.15: ♦

Exercises

1. Assume that in the beer and ale experiment Jones agrees to pay

Smith if Smith gets at least nine out of ten correct.

(a) What is the probability of Jones paying Smith even though

Smith cannot distinguish beer and ale, and guesses?

[Ans. .011.]

(b) Suppose that Smith can distinguish with probability .9. What

is the probability of not collecting from Jones?

[Ans. .264.]
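Both stated answers can be verified by direct summation of binomial terms; a small Python check (illustrative, not part of the text):

```python
from math import comb

def at_least(n, k, p):
    """Probability of at least k successes in n trials with success probability p."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# (a) Smith guesses (p = 1/2) and needs nine or more correct out of ten:
part_a = at_least(10, 9, 0.5)          # 11/1024

# (b) Smith distinguishes with probability .9 but fails to get nine correct:
part_b = 1 - at_least(10, 9, 0.9)

print(round(part_a, 3))  # 0.011
print(round(part_b, 3))  # 0.264
```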

2. Suppose that in the beer and ale experiment Jones wishes the

probability to be less than .1 that Smith will be paid if, in fact,

Smith guesses. How many of ten trials must Jones insist that

Smith get correct to achieve this?

3. In the analysis of the beer and ale experiment, we assume that

the various trials were independent. Discuss several ways that

error can enter, because of the nonindependence of the trials, and

how this error can be eliminated. (For example, the glasses in

which the beer and ale were served might be distinguishable.)

4. Consider the following two procedures for testing Smith's ability

to distinguish beer from ale.


(a) Four glasses are given at each trial, three containing beer and

one ale, and Smith is asked to pick out the one containing

ale. This procedure is repeated ten times. Smith must guess

correctly seven or more times. Find the probability that

Smith wins by guessing.

[Ans. .003.]

(b) Ten glasses are given to Smith, and Smith is told that five contain beer and five ale, and asked to name the five that contain ale. Smith must choose all five correctly. Find the probability that Smith wins by guessing.

[Ans. .004.]

(c) Is there any reason to prefer one of these two tests over the

other?

5. A testing service claims to have a method for predicting the order in which a group of freshmen will finish in their scholastic record at the end of college. The college agrees to try the method on a group of five students, and says that it will adopt the method if, for these five students, the prediction is either exactly correct or can be changed into the correct order by interchanging one pair of adjacent students in the predicted order. If the method is equivalent to simply guessing, what is the probability that it will be accepted?

[Ans. 1/24.]
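The answer can be confirmed by brute-force enumeration of the 120 equally likely predicted orders; a small Python sketch (illustrative only):

```python
from itertools import permutations

students = range(5)
true_order = tuple(students)

def acceptable(prediction):
    """Prediction is accepted if exactly correct, or correct after one adjacent swap."""
    if prediction == true_order:
        return True
    for i in range(4):
        swapped = list(prediction)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        if tuple(swapped) == true_order:
            return True
    return False

wins = sum(acceptable(p) for p in permutations(students))
total = 120  # 5! equally likely guessed orders
print(wins, total)  # 5 acceptable orders out of 120, i.e. 1/24
```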

6. The standard treatment for a certain disease leads to a cure in 1/4 of the cases. It is claimed that a new treatment will result in a cure in 3/4 of the cases. The new treatment is to be tested on ten people having the disease. If seven or more are cured, the new treatment will be adopted. If three or fewer people are cured, the treatment will not be considered further. If the number cured is four, five, or six, the results will be called inconclusive, and a further study will be made. Find the probabilities for each of these three alternatives under the assumption first, that the new treatment has the same effectiveness as the old, and second, under the assumption that the claim made for the treatment is correct.
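The three probabilities can be tabulated by summing binomial terms under each hypothesized cure rate; a Python sketch (illustrative, not part of the text):

```python
from math import comb

def pmf(n, x, p):
    """Binomial probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

results = {}
for p in (0.25, 0.75):  # old cure rate, then the claimed new rate
    adopt = sum(pmf(10, x, p) for x in range(7, 11))        # seven or more cured
    reject = sum(pmf(10, x, p) for x in range(0, 4))        # three or fewer cured
    inconclusive = sum(pmf(10, x, p) for x in range(4, 7))  # four, five, or six
    results[p] = (adopt, reject, inconclusive)
    print(p, round(adopt, 4), round(reject, 4), round(inconclusive, 4))
```

Under the old rate the test rejects with high probability; if the claim is correct, it adopts with the same high probability, by the symmetry of p = 1/4 and p = 3/4.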


7. Three students debate the intelligence of Springer spaniels. One claims that Springers are mostly (say 90 per cent of them) intelligent. A second claims that very few (say 10 per cent) are intelligent, while a third one claims that a Springer is just as likely to be intelligent as not. They administer an intelligence test to ten Springers, classifying them as intelligent or not. They agree that the first student wins the bet if eight or more are intelligent, the second if two or fewer, the third in all other cases. For each student, calculate the probability that he or she wins the bet, if he or she is right.

[Ans. .930, .930, .890.]
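These answers can be checked by summing the binomial terms under each student's own hypothesis; a Python sketch (illustrative only; the third value comes out 0.8906, which the text rounds to .890):

```python
from math import comb

def pmf(n, x, p):
    """Binomial probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Each probability is computed under that student's own hypothesis about p:
first = sum(pmf(10, x, 0.9) for x in range(8, 11))   # eight or more intelligent
second = sum(pmf(10, x, 0.1) for x in range(0, 3))   # two or fewer
third = sum(pmf(10, x, 0.5) for x in range(3, 8))    # three through seven
print(round(first, 3), round(second, 3), round(third, 3))
```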

8. Ten students take a test with ten problems. Each student on each question has probability 1/2 of being right, if he or she does not cheat. The instructor determines the number of students who get each problem correct. If the instructor finds that on four or more problems there are fewer than three or more than seven correct, he or she considers this convincing evidence of communication between the students. Give a justification for the procedure. [Hint: The table in Figure 4.14 must be used twice, once for the probability of fewer than three or more than seven correct answers on a given problem, and the second time to find the probability of this happening on four or more problems.]

4.10 The law of large numbers

In this section we shall study some further properties of the independent trials process with two outcomes. In Section 4.8 we saw that the probability for x successes in n trials is given by

    f(n, x; p) = (n choose x) p^x q^(n-x).

In Figure 4.16 we show these probabilities graphically for n = 8 and p = 3/4. In Figure 4.17 we have done similarly for the case of n = 7 and p = 3/4.

We see in the first case that the values increase up to a maximum value at x = 6 and then decrease. In the second case the values increase up to a maximum value at x = 5, have the same value for x = 6, and


Figure 4.16: ♦

Figure 4.17: ♦


then decrease. These two cases are typical of what can happen in

general.

Consider the ratio of the probability of x + 1 successes in n trials to the probability of x successes in n trials, which is

    [(n choose x+1) p^(x+1) q^(n-x-1)] / [(n choose x) p^x q^(n-x)] = [(n - x)/(x + 1)] · (p/q).

This ratio will be greater than one as long as (n - x)p > (x + 1)q, or as long as x < np - q. If np - q is not an integer, the values (n choose x) p^x q^(n-x) increase up to a maximum value, which occurs at the first integer greater than np - q, and then decrease. In case np - q is an integer, the values (n choose x) p^x q^(n-x) increase up to x = np - q, are the same for x = np - q and x = np - q + 1, and then decrease.
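This rule can be checked against the two cases pictured in Figures 4.16 and 4.17; a short Python sketch (illustrative, not part of the text):

```python
from math import comb

def pmf(n, x, p):
    """Binomial probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

modes = {}
for n, p in ((8, 0.75), (7, 0.75)):
    values = [pmf(n, x, p) for x in range(n + 1)]
    peak = max(values)
    modes[n] = [x for x, v in enumerate(values) if abs(v - peak) < 1e-12]
    print(n, "np - q =", n * p - (1 - p), "maximum at", modes[n])
```

For n = 8, np - q = 5.75 is not an integer and the single maximum is at x = 6; for n = 7, np - q = 5 is an integer and x = 5 and x = 6 share the maximum, just as the text describes.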

Thus we see that, in general, values near np will occur with the largest probability. It is not true that one particular value near np is highly likely to occur, but only that it is relatively more likely than a value further from np. For example, in 100 throws of a coin, np = 100 · (1/2) = 50. The probability of exactly 50 heads is approximately .08. The probability of exactly 30 is approximately .00002.
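Both figures can be computed exactly, since every sequence of 100 tosses has probability 1/2^100; a quick Python check (illustrative only):

```python
from math import comb

# Exact binomial probabilities for a fair coin tossed 100 times:
p_exactly_50 = comb(100, 50) / 2**100
p_exactly_30 = comb(100, 30) / 2**100

print(round(p_exactly_50, 2))  # 0.08
print(p_exactly_30)            # about .00002
```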

More information is obtained by studying the probability of a given deviation of the proportion of successes x/n from the number p; that is, by studying, for ε > 0,

    Pr[|x/n - p| < ε].

For any fixed n, p, and ε, the latter probability can be found by adding all the values of f(n, x; p) for values of x for which the inequality p - ε < x/n < p + ε is true. In Figure 4.18 we have given these probabilities for the case p = .3 with various values for ε and n. In the first column we have the case ε = .1. We observe that as n increases, the probability that the fraction of successes deviates from .3 by less than .1 tends to the value 1. In fact, to four decimal places the answer is 1 after n = 400. In column two we have the same probabilities for the smaller value ε = .05. Again the probabilities are tending to 1, but not so fast. In the third column we have given these probabilities for the case ε = .02. We see now that even after 1000 trials there is still a reasonable chance that the fraction x/n is not within .02 of the value p = .3. It is natural to ask if we can expect these probabilities also to tend to 1 if we increase n sufficiently. The answer is yes, and this is


Figure 4.18: ♦


Figure 4.19: ♦

assured by one of the fundamental theorems of probability, called the law of large numbers. This theorem asserts that, for any ε > 0,

    Pr[|x/n - p| < ε]

tends to 1 as n increases indefinitely.
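A table in the style of Figure 4.18 can be regenerated by direct summation; the following Python sketch (illustrative, not the source of the original figure) shows the convergence for p = .3:

```python
from math import comb

def prob_within(n, p, eps):
    """Pr[ |x/n - p| < eps ], summed directly from the binomial distribution."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x)
               for x in range(n + 1) if abs(x / n - p) < eps)

p = 0.3
for eps in (0.1, 0.05, 0.02):
    print(eps, [round(prob_within(n, p, eps), 4) for n in (50, 100, 400, 1000)])
```

For ε = .1 the probability is 1 to four decimal places by n = 400, while for ε = .02 it is still well short of 1 at n = 1000, matching the behavior described in the text.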

It is important to understand what this theorem says and what it does not say. Let us illustrate its meaning in the case of coin tossing. We are going to toss a coin n times and we want the probability to be very high, say greater than .99, that the fraction of heads which turn up will be very close, say within .001, of the value .5. The law of large numbers assures us that we can have this if we simply choose n large enough. The theorem itself gives us no information about how large n must be. Let us, however, consider this question.

To say that the fraction of the times success results is near p is the same as saying that the actual number of successes x does not deviate too much from the expected number np. To see the kind of deviations which might be expected, we can study the value of Pr[|x - np| ≥ d]. A table of these values for p = .3 and various values of n and d is given in Figure 4.19. Let us ask how large d must be before a deviation as large as d could be considered surprising. For example, let us see for each n the value of d which makes Pr[|x - np| ≥ d] about .04. From the table, we see that d should be 7 for n = 50, 9 for n = 80, 10 for n = 100, etc. To see deviations which might be considered more typical, we look for the values of d which make Pr[|x - np| ≥ d] approximately 1/3. Again from the table, we see that d should be 3 or 4 for n = 50, 4 or 5 for n = 80, 5 for n = 100, etc. The answers to these two questions


are given in the last two columns of the table. An examination of these numbers shows us that deviations which we would consider surprising are approximately √n, while those which are more typical are about one-half as large, or √n/2.

√

This suggests that n, or a suitable multiple of it, might be taken

as a unit of measurement for deviations. Of course, we would also have

to study how Pr[|x ’ np| ≥ d] depends on p. When this is done, one

√

¬nds that npq is a natural unit; it is called a standard deviation. It

can be shown that for large n the following approximations hold.

√

Pr[|x ’ np| ≥ npq] ≈ .3174

√

Pr[|x ’ np| ≥ 2 npq] ≈ .0455

√

Pr[|x ’ np| ≥ 3 npq] ≈ .0027

That is, a deviation from the expected value of one standard deviation is rather typical, while a deviation of as much as two standard deviations is quite surprising, and three very surprising. For values of p not too near 0 or 1, the value of √(pq) is approximately 1/2. Thus these approximations are consistent with the results we observed from our table.
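These three constants come from the normal approximation to the binomial, and can be reproduced with the error function; a Python sketch (illustrative; the first value rounds to .3173, and the .3174 above reflects a slightly different rounding):

```python
from math import erf, sqrt

def two_sided_tail(k):
    """Normal approximation to Pr[|x - np| >= k * sqrt(npq)] for large n."""
    # Pr[|Z| >= k] for a standard normal Z is 1 - erf(k / sqrt(2)).
    return 1 - erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(two_sided_tail(k), 4))
```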

√ √

For large n, Pr[x ’ np ≥ k npq] or Pr[x ’ np ¤ ’k npq] can be

shown to be approximately the same. Hence these probabilities can be

1

estimated for k = 1, 2, 3 by taking 2 the values given above.

Example 4.19 In throwing an ordinary coin 10,000 times, the expected number of heads is 5000, and the standard deviation for the number of heads is √(10,000 · (1/2) · (1/2)) = 50. Thus the probability that the number of heads which turn up deviates from 5000 by as much as one standard deviation, or 50, is approximately .317. The probability of a deviation of as much as two standard deviations, or 100, is approximately .046. The probability of a deviation of as much as three standard deviations, or 150, is approximately .003. ♦

Example 4.20 Assume that in a certain large city, 900 people are chosen at random and asked if they favor a certain proposal. Of the 900 asked, 550 say they favor the proposal and 350 are opposed. If, in fact, the people in the city are equally divided on the issue, would it be unlikely that such a large majority would be obtained in a sample of 900 of the citizens? If the people were equally divided, we would assume that the 900 people asked would form an independent trials process with probability 1/2 for a "yes" answer and 1/2 for a "no" answer. Then the standard deviation for the number of "yes" answers in 900 trials is √(900 · (1/2) · (1/2)) = 15. Then it would be very unlikely that we would obtain a deviation of more than 45 from the expected number of 450. The fact that the deviation in the sample from the expected number was 100, then, is evidence that the hypothesis that the voters were equally divided is incorrect. The assumption that the true proportion is any value less than 1/2 would also lead to the fact that a number as large as 550 favoring in a sample of 900 is very unlikely. Thus we are led to suspect that the true proportion is greater than 1/2. On the other hand, if the number who favored the proposal in the sample of 900 were 465, we would have only a deviation of one standard deviation, under the assumption of an equal division of opinion. Since such a deviation is not unlikely, we could not rule out this possibility on the evidence of the sample. ♦
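The size of this deviation, and the exact tail probability it corresponds to, are quickly computed; a Python sketch (illustrative, not part of the text):

```python
from math import comb, sqrt

n, observed = 900, 550
p = 0.5                          # hypothesis: the city is equally divided
expected = n * p                 # 450
sd = sqrt(n * p * (1 - p))       # 15
z = (observed - expected) / sd   # the deviation measured in standard deviations

# Exact probability of 550 or more "yes" answers under the hypothesis:
tail = sum(comb(n, x) * 0.5**n for x in range(observed, n + 1))
print(z, tail)
```

The observed count lies more than six standard deviations above the mean, so the exact tail probability is astronomically small, which is why the equal-division hypothesis is rejected.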

Example 4.21 A certain Ivy League college would like to admit 800 students in their freshman class. Experience has shown that if they admit 1250 students they will have acceptances from approximately 800. If they admit as many as 50 too many students they will have to provide additional dormitory space. Let us find the probability that this will happen, assuming that the acceptances of the students can be considered to be an independent trials process. We take as our estimate for the probability of an acceptance p = 800/1250 = .64. Then the expected number of acceptances is 800 and the standard deviation for the number of acceptances is √(1250 · .64 · .36) ≈ 17. The probability that the number accepted deviates by three standard deviations, or 51, from the mean is approximately .0027. This probability takes into account a deviation above the mean or below the mean. Since in this case we are only interested in a deviation above the mean, the probability we desire is half of this, or approximately .0013. Thus we see that it is highly unlikely that the college will have to provide new dormitory space under the assumptions we have made. ♦
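The .0013 above is a normal-approximation estimate; the exact binomial tail comes out the same order of magnitude. The sketch below (illustrative, not from the text) works in log space, since the binomial coefficients for n = 1250 exceed the range of a floating-point number:

```python
from math import lgamma, log, exp, sqrt

def pmf(n, x, p):
    """Binomial probability computed in log space, safe for large n."""
    log_coeff = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return exp(log_coeff + x * log(p) + (n - x) * log(1 - p))

n = 1250
p = 800 / 1250                   # estimated acceptance probability, .64
sd = sqrt(n * p * (1 - p))       # about 17

# Probability of 51 or more acceptances above the expected 800:
overflow = sum(pmf(n, x, p) for x in range(851, n + 1))
print(round(sd, 1), overflow)
```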

We finish this discussion of the law of large numbers with some final remarks about the interpretation of this important theorem.

Of course, no matter how large n is, we cannot prevent the coin from coming up heads every time. If this were the case we would observe a fraction of heads equal to 1. However, this is not inconsistent with the theorem, since the probability of this happening is (1/2)^n, which tends to 0 as n increases. Thus a fraction of 1 is always possible, but becomes increasingly unlikely.

The law of large numbers is often misinterpreted in the following manner. Suppose that we plan to toss the coin 1000 times and after 500 tosses we have already obtained 400 heads. Then we must obtain less than one-half heads in the remaining 500 tosses to have the fraction come out near 1/2. It is tempting to argue that the coin therefore owes us some tails and it is more likely that tails will occur in the last 500 tosses. Of course this is nonsense, since the coin has no memory. The point is that something very unlikely has already happened in the first