(For a very slightly different view of the matter see Exercise E.2.)

Remark 3: We proved Theorem 13.2.4 under the assumption that the linear map α : Rⁿ → Rⁿ given by

αt = Df(x0, y0)(0, t)

is invertible.

This condition gives unnecessary prominence to the last n coordinates. To get round this, we introduce a new definition.

Definition 13.2.7. We say that a linear map β : Rⁿ⁺ᵐ → Rⁿ has full rank if β is surjective.

The next, very easy, lemma gives some context.

346 A COMPANION TO ANALYSIS

Lemma 13.2.8. Suppose that a linear map β : Rⁿ⁺ᵐ → Rⁿ has matrix B with respect to standard coordinates. (We use column vectors.) Then the following conditions are equivalent.

(i) β has full rank.

(ii) B has n linearly independent columns.

(iii) By permuting the coordinate axes, we can ensure that the last n columns of B are linearly independent.

(iv) By permuting the coordinate axes, we can ensure that the linear map α : Rⁿ → Rⁿ given by

αt = β(0, t)

is invertible.

Proof. To see that (i) implies (ii), observe that

rank β = dim βRⁿ⁺ᵐ = dim span{b1, b2, . . . , bn+m}

where bj is the jth column of B. If β has full rank then, since any spanning set contains a basis, n of the bj must be linearly independent.

It is clear that (ii) implies (iii). To see that (iii) implies (iv), observe that, if α has matrix A with respect to standard coordinates after our permutation, then A is an n × n matrix whose columns are the last n columns of B and so linearly independent. Thus A is invertible and so α is.

To see that (iv) implies (i), observe that

{b1, b2, . . . , bn+m} ⊇ {a1, a2, . . . , an}

where ak is the kth column of A, the matrix of α with respect to standard coordinates. Thus

Rⁿ ⊇ span{b1, b2, . . . , bn+m} ⊇ span{a1, a2, . . . , an} = Rⁿ,

so βRⁿ⁺ᵐ = Rⁿ and β has full rank.
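The equivalences in Lemma 13.2.8 are easy to probe numerically. The following sketch (my own illustration; the matrix B is an arbitrary full-rank choice and the 1e-9 tolerance is a numerical convenience) checks, for a 2 × 3 matrix, that two linearly independent columns exist and that some permutation of the coordinate axes puts an invertible 2 × 2 block in the last two columns:

```python
from itertools import permutations

# An arbitrary beta : R^{2+1} -> R^2 of full rank (illustrative choice).
# B is its matrix with respect to standard coordinates, stored by rows.
B = [[1.0, 2.0, 0.0],
     [0.0, 0.0, 3.0]]
n, cols = 2, 3

def col(M, j):
    # jth column of a 2 x 3 matrix
    return [M[0][j], M[1][j]]

def det2(c1, c2):
    # determinant of the 2 x 2 matrix with columns c1, c2
    return c1[0] * c2[1] - c1[1] * c2[0]

# (ii): B has n = 2 linearly independent columns ...
assert any(abs(det2(col(B, i), col(B, j))) > 1e-9
           for i in range(cols) for j in range(i + 1, cols))

# (iii)/(iv): ... so some permutation of the coordinate axes makes the
# last n columns of B an invertible n x n matrix A, as the lemma asserts.
found = None
for perm in permutations(range(cols)):
    A_cols = [col(B, j) for j in perm[-n:]]
    if abs(det2(*A_cols)) > 1e-9:
        found = perm
        break
assert found is not None
```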

The appropriate modification of Theorem 13.2.4 is now obvious.

Theorem 13.2.9. Consider a function f : Rᵐ⁺ⁿ → Rⁿ which is differentiable on an open set U. Suppose further that Df is continuous at every point of U, that w0 ∈ U and that Df(w0) has full rank. Then we can permute the coordinate axes in such a way that, if we write w0 = (x0, y0) ∈ Rᵐ × Rⁿ, we can find an open set B1 in Rᵐ, with x0 ∈ B1, and an open set B2 in Rⁿ with f(w0) ∈ B2 such that, if z ∈ B2, there exists a differentiable map g_z : B1 → Rⁿ with (x, g_z(x)) ∈ U and

f(x, g_z(x)) = z

for all x ∈ B1.


Please send corrections however trivial to twk@dpmms.cam.ac.uk

Remark 4: In Appendix C, I emphasised the importance of a large stock of examples, but I talked mainly about examples of badly behaved functions. It is also true that mathematicians find it hard to lay their hands on a sufficiently diverse collection of well behaved functions. In this book we started with the polynomials. To these we added the function x ↦ 1/x and got the rational functions x ↦ P(x)/Q(x) with P and Q polynomials. We then added power series and solutions of differential equations. The inverse function theorem gives us a new collection of interesting functions and the implicit function theorem greatly extends this collection. The implicit function theorem is particularly important in giving us interesting examples of well behaved functions f : Rⁿ → R when n ≥ 2.

13.3 Lagrange multipliers

Suppose that t : R⁴ → R and f : R⁴ → R² are well behaved functions and we wish to maximise t(x) subject to the condition f(x) = h.

This will involve a fair amount of calculation, so it seems reasonable to simplify our task as much as possible. One obvious simplification is to translate so that h = (0, 0) and f(0, 0, 0, 0) = (0, 0). We now wish to investigate whether x = (0, 0, 0, 0) maximises t(x) subject to the condition f(x) = (0, 0).

A more interesting simplification is to observe that, since we are assuming good behaviour, Df(0) will have full rank. We can therefore find invertible linear maps α : R⁴ → R⁴ and β : R² → R² such that βDf(0)α has matrix

J = ( 1 0 0 0 )
    ( 0 1 0 0 )

with respect to the standard basis (and using the column vector representation).

If we set

F(x) = β(f(αx)), T(x) = t(αx)

then our problem becomes that of investigating whether x = (0, 0, 0, 0) maximises T(x), subject to the condition F(x) = (0, 0). By the implicit function theorem, we can find well behaved u, v : R² → R such that u(0, 0) = v(0, 0) = 0 and (at least if z and w are small)

F(u(z, w), v(z, w), z, w) = (0, 0).

By the chain rule, DF(0) has matrix J with respect to the standard basis (using the column vector representation); differentiating the identity F(u(z, w), v(z, w), z, w) = (0, 0) then gives

u,1(0, 0) = u,2(0, 0) = v,1(0, 0) = v,2(0, 0) = 0.


Before proceeding further the reader should do the next exercise.

Exercise 13.3.1. Explain why our conditions on F tell us that, near (0, 0, 0, 0),

F(x, y, z, w) ≈ (x, y).

Suppose that, in fact, F(x, y, z, w) = (x, y) exactly. Find u and v. What conditions on T ensure that x = (0, 0, 0, 0) maximises T(x) subject to the condition F(x) = (0, 0)?

Since (at least locally) F(u(z, w), v(z, w), z, w) = (0, 0), our problem reduces to studying whether (z, w) = (0, 0) maximises τ(z, w) = T(u(z, w), v(z, w), z, w). Since every maximum is a stationary point, we first ask if (z, w) = (0, 0) is a stationary point, that is, if

∂τ/∂z (0, 0) = ∂τ/∂w (0, 0) = 0.

By the chain rule,

∂τ/∂z (z, w) = T,1(u(z, w), v(z, w), z, w) ∂u/∂z (z, w)
             + T,2(u(z, w), v(z, w), z, w) ∂v/∂z (z, w)
             + T,3(u(z, w), v(z, w), z, w),

and so

∂τ/∂z (0, 0) = T,3(0, 0, 0, 0).

A similar calculation gives

∂τ/∂w (0, 0) = T,4(0, 0, 0, 0),

so (0, 0) is a stationary point of τ if and only if

T,3(0, 0, 0, 0) = T,4(0, 0, 0, 0) = 0.

We have found a necessary condition for (0, 0, 0, 0) to maximise T(x) subject to F(x) = 0, and so for (0, 0, 0, 0) to maximise t(x) subject to f(x) = 0. In order to make this condition usable, we must translate it back into the terms of our original problem, and this is most easily done by expressing it in coordinate-free notation. Observe that, if we write

l(x) = T(x) − T,1(0)F1(x) − T,2(0)F2(x)


then

l,1(0) = T,1(0) − T,1(0)F1,1(0) − T,2(0)F2,1(0) = T,1(0) − T,1(0) = 0,
l,2(0) = T,2(0) − T,1(0)F1,2(0) − T,2(0)F2,2(0) = T,2(0) − T,2(0) = 0,
l,3(0) = T,3(0) − T,1(0)F1,3(0) − T,2(0)F2,3(0) = T,3(0),
l,4(0) = T,4(0) − T,1(0)F1,4(0) − T,2(0)F2,4(0) = T,4(0),

so, if (0, 0, 0, 0) is a maximum,

D(T − T,1(0)F1 − T,2(0)F2)(0) = 0.

Thus, if (0, 0, 0, 0) gives a maximum and θ : R² → R is the linear map given by θ(x, y) = T,1(0)x + T,2(0)y, we have

D(T − θF)(0) = 0.

Using the definitions of T and F, this gives

D(tα − θβfα)(0) = 0,

so, by the chain rule,

(Dt(0) − θβDf(0))α = 0.

Since α is invertible, this is equivalent to saying

Dt(0) − θβDf(0) = 0,

and, since β is invertible, this, in turn, is equivalent to saying that there exists a linear map φ : R² → R such that

Dt(0) − φDf(0) = 0,

so, using the chain rule once again,

D(t − φf)(0) = 0.

Writing φ(x, y) = λ1x + λ2y, we see that x = (0, 0, 0, 0) maximises t(x), subject to the condition f(x) = (0, 0), only if we can find λ1 and λ2 such that t − λ1f1 − λ2f2 has (0, 0, 0, 0) as a stationary point.

The argument generalises easily to any number of dimensions.

Exercise 13.3.2. Prove Lemma 13.3.3.


Lemma 13.3.3. (Lagrangian necessity.) Consider functions f : Rᵐ⁺ⁿ → Rⁿ and t : Rᵐ⁺ⁿ → R which are differentiable on an open set U. Suppose further that Df is continuous and of full rank at every point of U. If h ∈ Rⁿ and z ∈ U is such that

(i) f(z) = h, and

(ii) t(z) ≥ t(x) for all x ∈ U such that f(x) = h,

then there exists a λ ∈ Rⁿ such that the function t − λ·f : Rᵐ⁺ⁿ → R has z as a stationary point.

Exercise 13.3.4. Suppose that f : R² → R and t : R² → R are very well behaved. Convince yourself that, in this case, Lemma 13.3.3 reads as follows. A man takes the path f(x, y) = h over a hill of height t(x, y). He reaches the highest point of the path at a point (x0, y0) where the path and the contour line t(x, y) = t(x0, y0) have a common tangent. Generalise to higher dimensions.
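The common-tangent picture in Exercise 13.3.4 can be checked by machine for a specific hill and path (both chosen by me purely for illustration): at the highest point of the path, finite-difference gradients of t and f are parallel.

```python
def t(x, y):            # the hill: height t(x, y)
    return -x * x - y * y

def f(x, y):            # the path is the level set f(x, y) = h
    return x + y

def grad(g, x, y, eps=1e-6):
    # central finite-difference gradient of g at (x, y)
    return ((g(x + eps, y) - g(x - eps, y)) / (2 * eps),
            (g(x, y + eps) - g(x, y - eps)) / (2 * eps))

h = 1.0
# highest point of the path x + y = h over this hill: (h/2, h/2)
x0, y0 = h / 2, h / 2
gt = grad(t, x0, y0)
gf = grad(f, x0, y0)

# parallel gradients (common tangent) <=> vanishing 2 x 2 determinant
cross = gt[0] * gf[1] - gt[1] * gf[0]
assert abs(cross) < 1e-6
```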

Lemma 13.3.3 gives us a recipe for finding the maximum of t(x), subject to the conditions x ∈ U and f(x) = h, when t : Rⁿ⁺ᵐ → R and f : Rⁿ⁺ᵐ → Rⁿ are well behaved functions and U is open.

(1) Set

L(λ, x) = t(x) − λ·f(x)

with λ ∈ Rⁿ. (L is called the Lagrangian.)

(2) For each fixed λ, find the set E(λ) of stationary points of L(λ, ) lying within U. (The previous sentence uses the notational convention described in the paragraph labelled Abuse of Language on Page 421⁵.)

(3) Now vary λ and find all λ* and x* ∈ E(λ*) such that f(x*) = h. The maximum, if it exists, will lie among these x*.

Put like this, the whole procedure has the same unreal quality that instructions on how to ride a bicycle would have for someone who had only seen a photograph of a bicycle. I suspect that the only way to learn how to use Lagrange's method is to use it. I expect that most of my readers will indeed be familiar with Lagrange's method and, in any case, this is a book on the theory of analysis. However, I include one simple example of the method in practice.

Example 13.3.5. Find the circular cylinder of greatest volume with given surface area. (That is, find the optimum dimensions for a tin of standard shape.)

⁵ This convention was thoroughly disliked by several of the readers of my manuscript but, as one of them remarked, 'No really satisfactory notation exists'. The reader should feel free to use her own notation. In particular she may prefer to use a placeholder and write L(λ, ·).


Calculation. Let the height of the cylinder be h and the radius be r. We wish to maximise the volume

V(r, h) = πr²h,

subject to keeping the surface area

A(r, h) = 2πr² + 2πrh

fixed, with A(r, h) = A [r, h > 0], say.

The instructions for step (1) tell us to form the Lagrangian

L(λ, r, h) = V(r, h) − λA(r, h) = πr²h − λ(2πr² + 2πrh).

The instructions for step (2) tell us to seek stationary values for L(λ, , ) when λ is fixed. We thus seek to solve

∂L/∂r (λ, r, h) = 0, ∂L/∂h (λ, r, h) = 0,

that is,

2πrh − λ(4πr + 2πh) = 0, πr² − 2λπr = 0,

or, simplifying,

rh − λ(2r + h) = 0, r − 2λ = 0.

Thus, if λ is fixed and positive, L(λ, , ) has a unique stationary point given by r = rλ = 2λ, h = hλ = 4λ. (If λ ≤ 0, there are no stationary points consistent with our restrictions r > 0, h > 0.)

We now proceed to step (3), which requires us to find λ such that

A = A(rλ, hλ),

that is,

A = A(2λ, 4λ) = 24πλ².

This has a unique solution with λ > 0. We know, on general grounds (see Exercise 13.3.6), that a maximum exists, so, since Lagrange's method must produce the maximum point, and produces only one point in this case, this point must be the maximum.

The neatest way of writing our solution (and one that we could have obtained directly by eliminating λ from the equations above) is that the cylinder of maximum volume with given surface area has height twice its radius.
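As a check on Example 13.3.5 (an illustrative computation of my own, with an arbitrary value of A), one can eliminate h using the constraint, maximise the volume by a crude grid search, and confirm that the optimum has height about twice the radius:

```python
import math

A = 24 * math.pi                       # arbitrary fixed surface area
                                       # (this choice gives lambda = 1)

def h_of(r):
    # solve A(r, h) = 2*pi*r^2 + 2*pi*r*h = A for h
    return (A - 2 * math.pi * r * r) / (2 * math.pi * r)

def V(r):
    # volume along the constraint surface
    return math.pi * r * r * h_of(r)

# grid search over admissible radii (h > 0 forces r < sqrt(A / (2*pi)))
r_max = math.sqrt(A / (2 * math.pi))
rs = [r_max * k / 10000 for k in range(1, 10000)]
r_best = max(rs, key=V)

# Lagrange's method predicts height = twice the radius at the optimum
assert abs(h_of(r_best) - 2 * r_best) < 1e-2
```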


It must be admitted that the standard tin does not follow our prescription very closely. (Presumably the cost of materials is relatively small compared to other costs. We must also remember that, near a maximum, small changes produce very small effects, so the penalties for deviating are not high.)

Exercise 13.3.6. We work with the notation and hypotheses introduced in the calculations in Example 13.3.5. Choose r0 and h0 such that r0, h0 > 0 and A(r0, h0) = A. Show that there exists an R > 1 such that, if r > R or R⁻¹ > r > 0 and A(r, h) = A, then V(r, h) ≤ V(r0, h0)/2. Show that

K = {(r, h) : A(r, h) = A, R ≥ r ≥ R⁻¹, h > 0}

is a closed bounded set in R². Deduce that V(r, h) attains a maximum on K and conclude that V(r, h) attains a maximum on

{(r, h) : A(r, h) = A, r > 0, h > 0}.

Exercise 13.3.7. Suppose that f : R² → R and t : R² → R are given by f(x, y) = y and t(x, y) = y² − x². We seek the maximum value of t(x, y) subject to f(x, y) = a. Show that L(λ, ) = t − λf is stationary at (0, λ/2), and that the only value of λ which gives f(0, λ/2) = a is λ = 2a. [We again use the notation described on Page 421.]

(i) Show that (0, a) does indeed maximise t(x, y) subject to f(x, y) = a.

(ii) Show, however, that L(2a, ) = t − 2af does not have a maximum at (0, a).

(iii) Draw a diagram in the manner of Exercise 13.3.4 to illustrate what is happening.
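The claims of Exercise 13.3.7 can be verified numerically for a sample value of a (the choice a = 1.5 and the finite-difference check below are mine, for illustration only):

```python
a = 1.5                                 # sample value of a

def t(x, y):
    return y * y - x * x

def f(x, y):
    return y

def L(x, y):
    # L(2a, (x, y)) = t(x, y) - 2a * f(x, y)
    return t(x, y) - 2 * a * f(x, y)

# (0, a) is a stationary point of t - 2a*f (finite-difference gradient)
eps = 1e-6
gx = (L(eps, a) - L(-eps, a)) / (2 * eps)
gy = (L(0, a + eps) - L(0, a - eps)) / (2 * eps)
assert abs(gx) < 1e-6 and abs(gy) < 1e-6

# (i): (0, a) maximises t subject to f = a ...
assert all(t(x, a) <= t(0, a) for x in [-1, -0.5, 0.5, 1])

# (ii): ... but L(2a, ) has no maximum at (0, a): it grows along the y-axis
assert L(0, a + 10) > L(0, a)
```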

Exercise 13.3.8. Suppose that f : R² → R and t : R² → R are given by f(x, y) = x − αy² and t(x, y) = (x − 1)² + y². We seek maxima and minima of t subject to f(x, y) = 0. Show that L(λ, ) = t − λf is stationary at (1 + λ/2, 0), and that the only value of λ which gives f(1 + λ/2, 0) = 0 is λ = −2.

(i) Show that, if α < 1/2, (0, 0) minimises t(x, y) subject to f(x, y) = 0. Show, moreover, that L(−2, ) = t + 2f has a strict minimum at 0.

(ii) Show that, if α = 1/2, (0, 0) minimises t(x, y) subject to f(x, y) = 0. Show that L(−2, ) has a minimum at 0 but this is not a strict minimum.

(iii) Show that, if α > 1/2, (0, 0) maximises t(x, y) subject to f(x, y) = 0. Show, however, that L(−2, ) does not have a maximum at 0.

(iv) Draw a diagram in the manner of Exercise 13.3.4 to illustrate (as far as you can, do not overdo it) what is happening.

The following remark is entirely trivial but extremely useful.


Lemma 13.3.9. (Lagrangian sufficiency.) Consider a set X and functions f : X → Rⁿ and t : X → R. Set

L(λ, x) = t(x) − λ·f(x)

for all λ ∈ Rⁿ and x ∈ X. Suppose h ∈ Rⁿ, λ* ∈ Rⁿ, and x* ∈ X are such that f(x*) = h and

L(λ*, x*) ≥ L(λ*, x)

for all x ∈ X. Then

t(x*) ≥ t(x)

for all x ∈ X such that f(x) = h.

Proof. The proof is shorter than the statement. Observe that, under the given hypotheses,

t(x*) = t(x*) − λ*·f(x*) + λ*·h = L(λ*, x*) + λ*·h
      ≥ L(λ*, x) + λ*·h = t(x) − λ*·f(x) + λ*·h = t(x)

for all x ∈ X such that f(x) = h, as required.
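Because Lemma 13.3.9 makes no smoothness demands, it can be checked directly on a toy example (entirely my own choice of X, t, f, and multiplier): take X = R², t(x) = −(x1² + x2²), f(x) = x1 + x2, h = 1; with λ* = −1 the point x* = (1/2, 1/2) maximises L(λ*, ·) over all of X, so the lemma guarantees it maximises t on the constraint set.

```python
lam = -1.0                               # lambda*
h = 1.0

def t(x1, x2):
    return -(x1 * x1 + x2 * x2)

def f(x1, x2):
    return x1 + x2

def L(x1, x2):
    # L(lambda*, x) = t(x) - lambda* . f(x)
    return t(x1, x2) - lam * f(x1, x2)

xs = (0.5, 0.5)                          # candidate x*
assert abs(f(*xs) - h) < 1e-12           # feasible: f(x*) = h

# L(lambda*, x*) >= L(lambda*, x) for a sample of points of X = R^2 ...
grid = [(u / 4, v / 4) for u in range(-8, 9) for v in range(-8, 9)]
assert all(L(*xs) >= L(x1, x2) for (x1, x2) in grid)

# ... hence, by Lemma 13.3.9, t(x*) >= t(x) whenever f(x) = h
feasible = [(x1, h - x1) for x1 in [u / 4 for u in range(-8, 9)]]
assert all(t(*xs) >= t(x1, x2) for (x1, x2) in feasible)
```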

It is worth noting that Lemma 13.3.9 (our Lagrangian sufficiency condition) does not require f or t to be well behaved or make any demands on the underlying space X. This is particularly useful since, if the reader notes where Lagrange's method is used, she will find that it is often used in circumstances much more general than those envisaged in Lemma 13.3.3 (our Lagrangian necessity condition).

However, the very simple examples given in Exercises 13.3.7 and 13.3.8 show that there is a very big gap between our Lagrangian necessity and sufficiency conditions even when we restrict ourselves to well behaved functions on finite dimensional spaces. In the first three chapters of his elegant text Optimization Under Constraints [48], Whittle considers how this gap can be closed. It turns out that in certain very important practical cases (in particular, those occurring in linear programming) the sufficient condition is also necessary but that, from the point of view of the present book, these cases have very special features.

In general, the Lagrangian method is very effective in suggesting the correct answer, but, when that answer has been found, it is often easier to prove correctness by some other method (compare the treatment of geodesics in Section 10.5, where we found Theorem 10.5.11 by one method but proved it by another).


The alternative strategy, once the Lagrangian method has produced a possible solution, is to claim that 'it is obvious from physical considerations that this is indeed the correct solution'. This works well provided at least one of the following conditions applies:

(1) you are answering an examination question, or

(2) you have genuine physical insight, or

(3) you are reasonably lucky.

It is, of course, characteristic of bad luck that it strikes at the most inconvenient time.

Chapter 14

Completion

14.1 What is the correct question?

We cannot do interesting analysis on the rationals but we can on the larger space of real numbers. On the whole, we cannot do interesting analysis on metric spaces which are not complete. Given such an incomplete metric space, can we find a larger complete metric space which contains it? Our first version of this question might run as follows.

Question A: If (X, d) is a metric space, can we find a complete metric space (Z, δ) such that Z ⊇ X and d(u, v) = δ(u, v) for all u, v ∈ X?

Most mathematicians prefer to ask this question in a slightly different way.

Question A′: If (X, d) is a metric space, can we find a complete metric space (Y, d̃) and a map θ : X → Y such that d̃(θu, θv) = d(u, v) for all u, v ∈ X?

Question A asks us to build a complete metric space containing (X, d). Question A′ asks us to build a complete metric space containing a perfect copy of (X, d). Since mathematicians cannot distinguish between the original space (X, d) and a perfect copy, they consider the two questions to be equivalent. However, both from a philosophical and a technical point of view, it is easier to handle Question A′.

Exercise 14.1.1. Convince yourself either that Question A′ is equivalent to Question A or that Question A′ is more appropriate than Question A.

For the moment, we stick to Question A. A little thought shows that it does not have a unique answer.

does not have a unique answer.

Exercise 14.1.2. Consider the open interval X = (0, 1) on R with the usual

metric d.

(i) Show that (X, d) is not complete.


(ii) Show that, if we take Z = [0, 1] with the usual metric δ, then (Z, δ) answers Question A.

(iii) Show that, if we take Z = R with the usual metric δ, then (Z, δ) answers Question A.
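A small numerical illustration of part (i) (the sequence and tolerances here are my own standard choice, not part of the exercise): the sequence xₙ = 1/n lies in X = (0, 1) and is Cauchy in the usual metric, but its only possible limit, 0, is not a point of X, so no point of X can serve as its limit.

```python
# the sequence x_n = 1/n lies in X = (0, 1) and is Cauchy in the usual metric
xs = [1 / n for n in range(2, 2000)]
assert all(0 < x < 1 for x in xs)

# Cauchy: successive tail differences become arbitrarily small
assert abs(xs[-1] - xs[-2]) < 1e-6

# but sample points of X stay at a fixed positive distance from the tail,
# consistent with the fact that the limit, 0, lies outside (0, 1)
tail = xs[-10:]
for candidate in [0.1, 0.01, 0.001]:
    assert all(abs(x - candidate) > candidate / 3 for x in tail)
```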