If we choose nj ’ ∞ and mj ’ ∞ in such a way that m2 /nj ’ » as

j

j ’ ∞, what happens to A(mj , nj )? Can you choose nj ’ ∞ and mj ’ ∞

so that A(mj , nj ) ’ ∞ as j ’ ∞? Can you choose nj ’ ∞ and mj ’ ∞

so that A(mj , nj ) is bounded but does not converge? Can you choose nj ’ ∞

and mj ’ ∞ so that A(mj , nj ) ’ π?

Exercise 9.5.15. Explain in words and without using calculations why, if

we ¬x n, A(m, n) ’ ∞ as m ’ ∞. (Unless you can give a simple ge-

ometric account of what is going on, you have not understood Schwarz™s

example.) Deduce directly that we can choose nj ’ ∞ and mj ’ ∞ such

that A(mj , nj ) ’ ∞ as j ’ ∞. [Thus, if we simply want to show that the

naive approach fails, we do not need the calculations of the previous exercise.

However, if we want more information (for example, necessary and su¬cient

conditions for A(mj , nj ) ’ 2π) then we must do the calculations.]

Once the reader has grasped the point of Exercise 9.5.15, she may feel

that it is obvious what is wrong. However, the problem is not that we cannot

see what the area of a cylinder ought to be, but that we cannot think of a

simple de¬nition of area which will apply to general surfaces in the same

way that our de¬nition of length applied to general curves. Up to now, all

attempts to produce a de¬nition of area to parallel our de¬nition of length

have failed12 .

If a surface is su¬ciently well behaved, it is more or less clear how to frame

a notion of approximation by polyhedra which excludes Schwarz™s example.

But, if a surface is su¬ciently well behaved, we can produce a rigorous de¬ni-

tion of its area by tidying up the standard mathematical methods de¬nition.

If we are going to do this for the area of a surface and its higher dimensional

analogues, it seems a waste of time to have a special theory for length. Life,

as the saying goes, is too short to stu¬ an olive. The next exercise gives the

standard development.

Exercise 9.5.16. (Standard treatment of line integrals.) We deal only

with curves γ : [a, b] ’ Rm which are continuously di¬erentiable (so that,

writing

γ(t) = (γ1 (t), γ2 (t), . . . , γm (t)),

we know that γj : [a, b] ’ R exists and is continuous). If f : Rm ’ R is

continuous, we de¬ne

b

f (γ1 (t), γ2 (t), . . . , γm (t))(γ1 (t)2 + γ2 (t)2 + · · · + γm (t)2 )1/2 dt.

f (x) ds =

a

γ

12

There are other ways of tackling this problem. Once again I refer the reader to the

circle of ideas associated with the name of Hausdor¬.

232 A COMPANION TO ANALYSIS

Suppose that θ : [c, d] ’ [a, b] is a surjective function with continuous

derivative. Explain why „ = γ —¦ θ is a continuously di¬erentiable function

from [c, d] to Rm . Show that, if θ is injective, then

f (x) ds = f (x) ds,

„ γ

but give an example to show that, if θ is not injective, this may not hold.

We de¬ne the length of γ to be 1 ds.

γ

Chapter 10

Metric spaces

Sphere packing ™

10.1

(In this section and the next we are much more interested in ideas than

rigour. We shall use methods and results which go well beyond the scope of

this book.)

Human beings prefer order to disorder. Asked to pack a crate of oranges,

we do not throw them in at random, but try to pack them in a regular

pattern. But choosing patterns requires insight and sometimes we have no

insight. Consider the problem of packing n dimensional balls in a very large

box. (We shall take our balls and boxes to be open, but it should be clear

that such details do not matter.)

As an example of a regular packing, let the balls have radius 1/2 and let

us adopt a cubical packing so that the centre of each ball is one of integer

points Zn . Is this reasonably e¬cient or not? To answer this question it is

helpful to know the volume Vn (r) of an n dimensional ball of radius r.

Lemma 10.1.1. (i) If we write Vn = Vn (1), then Vn (r) = Vn rn .

(ii) If a > 0 and I[0,a] : [0, ∞) ’ R is de¬ned by I[0,a] (t) = 1 if t ∈ [0, a],

I[0,a] (t) = 0, otherwise, then

∞

I[0,a] ( r ) dV = nVn I[0,a] (r)rn’1 dr.

Rn 0

k

(iii) If f : [0, ∞) ’ R is given by f = »j I[0,aj ] , then

j=1

∞

f (r)rn’1 dr.

f ( r ) dV = nVn

Rn 0

(iv) If f : [0, ∞) ’ R is such that f (r)r n+1 ’ 0 as r ’ ∞, then

∞

f (r)rn’1 dr.

f ( r ) dV = nVn

Rn 0

233

234 A COMPANION TO ANALYSIS

(v) Taking f (r) = exp(’r 2 /2) in (ii), we obtain

∞

n/2

rn’1 exp(’r2 /2) dr,

(2π) = nVn

0

and so

πn n!22n π n’1

V2n = , V2n’1 = .

n! (2n)!

Sketch proof. (i) Use similarity.

(ii) This is a restatement of (i).

(iii) Use linearity of the integral.

(iv) Use an approximation argument.

(v) Using repeated integration,

∞ ∞ ∞

exp(’(x2 + x2 + · · · + x2 )/2) dx1 dx2 . . . dxn

···

f ( r ) dV = 1 2 n

Rn ’∞ ’∞ ’∞

∞ ∞ ∞

exp(’x2 /2) dx1 exp(’x2 /2) dx2 · · · exp(’x2 /2) dxn = (2π)n/2 .

= 1 2 n

’∞ ’∞ ’∞

On the other hand, integration by parts gives

∞ ∞

n’1 2

rn’3 exp(’r2 /2) dr,

exp(’r /2) dr = (n ’ 2)

r

0 0

so

∞

n/2

rn’1 exp(’r2 /2) dr

(2π) = nVn

0

∞

rn’3 exp(’r2 /2) dr = (2π)(n’2)/2 nVn /Vn’2

= n(n ’ 2)Vn

0

and Vn = (2π)Vn’2 /n. The stated results follow by induction.

Remark: Since we are not seeking rigour in this section, I have only sketched

a proof. However, few mathematicians would demand more proof for this

result. As I emphasise in Appendix C, we need rigour when we make general

statements which must apply to objects we have not yet even imagined. Here,

we need a result about a speci¬c property of a speci¬c object (the volume of

an n-dimension sphere), so we can have much more con¬dence in our sketched

proof.

If we use a cubical packing, the proportion of space occupied by balls is

2 Vn (think about the ball center q radius 1/2 inside the cube n (qj ’

’n

j=1

235

Please send corrections however trivial to twk@dpmms.cam.ac.uk

1/2, qj ’ 1/2)) and Lemma 10.1.1 shows that this proportion drops very

rapidly indeed as n becomes large.

What happens if we ˜just place balls wherever we can™. Suppose we are

trying to ¬ll the cube

C = {x : ’N ’ 1/2 < xj < N + 1/2 [1 ¤ j ¤ n]}

with non-intersecting balls of radius 1/2, and that we have managed to place

m such balls

Sk = {x : x ’ yk < 1/2} [1 ¤ k ¤ m].

Now consider the balls with the same centres and doubled radius

σk = {x : x ’ yk < 1} [1 ¤ k ¤ m].

A little thought shows that, if y lies inside the cube

C = {x : ’N < xj < N [1 ¤ j ¤ n]}

and does not lie in any σk , then the ball

S = {x : x ’ y < 1/2}

lies in C and does not intersect any Sk , so we may add another ball to our

collection. Such a point y will always exist if

m

Vol C > Vol σk

k=1

m m

σk , and so C \ σk = …). Thus if

(since then Vol C > Vol k=1 k=1

Vol C > mVn

we can add extra balls and so, by just ¬lling up available spaces, without

following any pattern, we can ¬nd at least Vol C /Vn disjoint balls of radius

1/2 in C. Since the volume of a ball of radius 1/2 is 2’n Vn the proportion of

1

space occupied by balls is 2’n Vol C / Vol C = 2’n (1 + 2N )’n so, for N large,

the proportion of space occupied by balls is essentially 2’n , an astounding

gain in e¬ciency over cubical packing.

What morals should we draw? The most immediate is that we do not

know what an n-dimensional ball looks like! (Here, ˜we™ refers to the writer

and most of his audience. Specialists in topics like the geometry of Banach

spaces do have substantial insight.) The second is that, when we have little

insight, trying to impose order on things may well be counter productive.

236 A COMPANION TO ANALYSIS

Shannon™s theorem ™

10.2

It costs money to send messages. In the old days, telegrams might be charged

for by the word. Nowadays, we pay for communication channels which carry

so many bits per second (so we are charged by the bit). Suppose that I can

a¬ord to send three bits of information to my partner. Then we can make

up eight words 000, 001, . . . and use them to send eight di¬erent messages.

For example, 000 could mean ˜send money™, 001 ˜come at once™, 010 ˜sell all

shares™, 011 ˜¬‚ee the country™ and so on. Unfortunately, bits are sometimes

received wrongly, so 011 could be sent and 001 received with unfortunate

consequences. We might, therefore, decide to have only two messages 000

(˜all well™) and 111 (˜¬‚ee at once™) so that, if at most one digit was received

wrongly, my partner could still take the correct action.

Exercise 10.2.1. Explain why the last statement is true.

If I can a¬ord to send 100 bits, then we can still decide to only have two

messages 0000 . . . 0 and 1111 . . . 1 but, if mistakes are rare, this is extremely

wasteful. How many di¬erent messages can we send and still be reasonably

con¬dent that we can tell which message was transmitted even when errors

occur? This section gives a simple but very useful model for the problem

and solves it.

Consider X = {0, 1}n as a vector space over F2 1 . Each element x ∈

{0, 1}n may be considered as a ˜word™ so X contains 2n possible words. In

transmitting a word over a noisy channel it may become corrupted to

x+e

where each coordinate ej of the random error e takes the value 1 with prob-

ability p and the value 0 with probability 1 ’ p, independent of the other

coordinates [0 < p < 1/2].

Exercise 10.2.2. Why do we not have to consider p > 1/2? Why is it

hopeless to consider p = 1/2?

If we choose q such that 1/2 > q > p > 0, then the law of large numbers

tells us that, provided n is large enough, it is very unlikely that more than

qn of the j are non-zero2 We may, therefore, simplify our problem to one in

which we know that at most qn of the j are non-zero.

It is natural to introduce the following notion of the distance d(x, y)

between two words (the ˜Hamming distance™).

1

This means that when we add two vectors x and y the jth component of x + y has

the form xj + yj where 0 + 0 = 1 + 1 = 0 and 1 + 0 = 0 + 1 = 1.

2

We prove this result directly in Exercise K.173.

237

Please send corrections however trivial to twk@dpmms.cam.ac.uk

De¬nition 10.2.3. If x, y ∈ X, we write

n

|xi ’ yi |.

d(x, y) =

j=1

Suppose that we only transmit the words y1 , y2 , . . . ym and that the

balls

Sk = {x : d(x, yk ) < qn} [1 ¤ k ¤ m],

do not intersect. Since there are no more than qn errors, the received message

will lie in one of the balls Sj , say, and the transmitted message will have been

yj . Thus our system allows us to communicate m distinct messages correctly

in spite of the random noise e.

This sounds impressive, but is not very useful unless m is reasonably

large. How large can m be? This is clearly a variant of our orange packing

problem. The natural way to proceed is to de¬ne the volume of a subset E

of X to be the number of points in E. Reusing the ideas of our previous

argument, we consider the balls with the same centres and doubled radius

σk = {x : d(x, yk ) < 2qn} [1 ¤ k ¤ m].

If y does not lie in any σk , then the ball

S = {x : d(x, y) < qn}

does not intersect any Sk , so we may add another ball to our collection. Such

a point y will always exist if

m

Vol X > Vol σk ,

k=1

that is, if

m

2n > Vol σk .

k=1

The only problem that faces us, is to estimate the volume of a ball in this

context.

The key turns out to be a simple but, to many people, initially surprising

result.

238 A COMPANION TO ANALYSIS

Lemma 10.2.4. Suppose 0 < » < 1/2.

(i) If 1 ¤ r ¤ »n then

n » n

¤ .

r’1 1’» r

(ii) There exists a constant A(») such that a ball of radius »n in X has

n n

volume at most A(») and at least .

[»n] [»n]

Proof. (i) Observe that

r

n n r »

n

¤

= = .

1 ’ r’1

r’1 n+1’r 1’»

r n

(ii) Thus, if r» = [»n], the greatest integer less than »n, we have

n

Volume ball radius »n =

r

r¤»n

r» ’r

» n

¤

1’» r»

r¤r»

∞ m

» n

¤

1’» r»

m=0

1’» n

= .

1 ’ 2» [»n]

Taking A(») = (1 ’ »)/(1 ’ 2»), we have the required result.

Exercise 10.2.5. (i) If 1 > » > 1/2, ¬nd a good estimate for the volume of

a ball of radius »n in X when n is large.

(ii) Lemma 10.2.4 says, in e¬ect, that the volume of an appropriate ball in

X is concentrated near its ˜surface™ (that is those points whose distance from

the centre is close to the radius). To what extent is this true for ordinary

balls in Rm when m is large?

In part (i) of Lemma 10.2.6 we recall a simple form of Stirling™s formula

(obtained in Exercise 9.2.7 (iii)). In the rest of the lemma we use it to obtain

estimates of the volume V (», n) of a ball of radius »n in X = {0, 1}n when

n is large. We use the notation

log b

loga b =

log a

where a, b > 0. (The reader probably knows that log a b is called ˜the loga-

rithm of b to base a™. She should check the relation aloga b = b.)

239

Please send corrections however trivial to twk@dpmms.cam.ac.uk

Lemma 10.2.6. (i) loge N ! = N loge N + N + θ(N )N , where θ(N ) ’ 0 as

N ’ ∞.

(ii) If 0 < » < 1/2, then

n’1 loge V (», n) ’ ’» loge » ’ (1 ’ ») loge (1 ’ »)

as n ’ ∞.

(iii) If 0 < » < 1/2, then

n’1 log2 V (», n) ’ ’» log2 » ’ (1 ’ ») log2 (1 ’ »)

as n ’ ∞.

Proof. (i) This is Exercise 9.2.7 (iii).

(ii) In what follows we shall be replacing a real number y by an integer

m with |m ’ y| ¤ 1. Observe that, if f is well behaved, the mean value

inequality gives |f (y) ’ f (m)| ¤ sup|t’y|¤1 |f (t)|.

By Lemma 10.2.4, part (i) of the present lemma, and the remark just

made, there exist θ1 (n), θ2 (n) ’ 0, as n ’ ∞ such that

n

n’1 loge V (», n) = n’1 loge

[»n]

= n’1 loge n! ’ loge (n ’ [»n])! ’ loge [»n]!

= n’1 n loge n + n ’ (n ’ [»n]) loge (n ’ [»n]) ’ [»n] ’ [»n] loge [»n] + θ1 (n)

= n’1 n loge n + n ’ (n ’ »n) loge (n ’ »n) ’ »n ’ »n loge »n + θ2 (n)

= n’1 n loge n + n ’ (1 ’ »)n(loge (1 ’ ») + loge n) ’ »n(loge » + loge n) + θ2 (n)

= ’» loge » ’ (1 ’ ») loge (1 ’ ») + θ2 (n),

and this is the required result.

(iii) Just apply the de¬nition of log 2 .

Let us write H(0) = H(1) = 0 and

H(») = ’» log2 » ’ (1 ’ ») log2 (1 ’ »)

for 0 < » < 1.

Exercise 10.2.7. Prove the following results and then sketch the graph of

H.

(i) H(s) = H(1 ’ s) for all s ∈ [0, 1].

(ii) H : [0, 1] ’ R is continuous.

(iii) H is twice di¬erentiable on (0, 1) with H (t) < 0 for all t ∈ (0, 1).

(iv) 0 ¤ H(t) ¤ 1 for all t ∈ [0, 1]. Identify the maximum and minimum

points.

(v) H(t)/t ’ 1 as t ’ 0 through positive values.

240 A COMPANION TO ANALYSIS

Our discussion has shown that we can ¬nd 2n(1’H(2q)) code words all a

Hamming distance at least q n apart. We can immediately interpret this

result in terms of our original problem.

Lemma 10.2.8. Consider our noisy channel. If p, the probability of error

in a single bit, is less than 1/4, then, choosing p < p < 1/4, we can transmit,

with error rate as small as we please, so that information is passed at a rate

of 1 ’ H(2p ) times that of our original channel.

Shannon pushed the argument a little bit further. Let · > 0 be small

and let N be such that

N — Vol ball radius qn ≈ · Vol X.

Choose N points yj at random in X and let them be code words. There is

no reason to expect that the balls

Sk = {x : d(x, yk ) < qn} [1 ¤ k ¤ N ],

do not intersect (and excellent reasons for supposing that they will).