of the equation

f(x) = y

with ‖x − w‖ < δ1. It follows that

B(f(w), ρ) ⊆ f(B(w, δ0)) ⊆ f(U).

We have shown that f(U) is open.

Theorem 13.1.13. (Inverse function theorem.) Consider a function f : Rm → Rm which is differentiable on an open set U. Suppose, further, that Df is continuous at every point of U, that w ∈ U and Df(w) is invertible. Then we can find an open set B ⊆ U with w ∈ B and an open set V such that

(i) f|B : B → V is bijective,

(ii) (f|B)⁻¹ : V → B is differentiable with

D(f|B)⁻¹(f(u)) = (Df(u))⁻¹

for all u ∈ B.

Proof. Suppose w ∈ U. By Lemma 13.1.11, we can find a δ0 > 0 such that the open ball B(w, δ0) is a subset of U and Df(u) is invertible at every point u ∈ B(w, δ0).

We now use the same argument which we used in Lemma 13.1.12. We know that Df is continuous at w and Df(w) is invertible. Thus, by Lemma 13.1.9, we can find a δ1 with δ0 ≥ δ1 > 0 and a ρ > 0 such that, if ‖y − f(w)‖ ≤ ρ, there exists one and only one solution of the equation

f(x) = y

with ‖x − w‖ < δ1. Set B = B(w, δ1) and apply Lemma 13.1.12 and Lemma 13.1.9.
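Although not part of the text, the local inversion that the theorem guarantees can be illustrated numerically. The sketch below assumes a hypothetical map f(x, y) = (x + y²/4, y + x²/4), whose derivative at the origin is the identity (hence invertible), and solves f(p) = target near the origin by Newton iteration.

```python
import numpy as np

# Hypothetical example (not from the text): f(x, y) = (x + y**2/4, y + x**2/4).
# Df(0, 0) is the identity, so the inverse function theorem gives a local inverse.
def f(p):
    x, y = p
    return np.array([x + y**2 / 4, y + x**2 / 4])

def local_inverse(target, guess=(0.0, 0.0), steps=50):
    """Newton iteration for f(p) = target; converges for targets near f(0,0) = 0."""
    p = np.array(guess, dtype=float)
    for _ in range(steps):
        x, y = p
        J = np.array([[1.0, y / 2],    # Jacobian Df(p) of the map above
                      [x / 2, 1.0]])
        p = p - np.linalg.solve(J, f(p) - target)
    return p

target = np.array([0.1, -0.05])
p = local_inverse(target)
print(np.allclose(f(p), target))  # a preimage exists close to the origin
```

The iteration is only a sketch of the existence statement; the theorem itself, of course, also guarantees uniqueness on a small ball and differentiability of the inverse.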

A slight strengthening of Theorem 13.1.13 is given in Exercise K.288. The

following cluster of easy exercises is intended to illuminate various aspects of

the inverse function theorem.

338 A COMPANION TO ANALYSIS

Exercise 13.1.14. Suppose U and V are open subsets of Rm and f : U → V, g : V → U are such that g ∘ f is the identity map on U. Show that if f is differentiable at u ∈ U and g is differentiable at f(u) then (Df)(u) and (Dg)(f(u)) are invertible and

(Dg)(f(u)) = ((Df)(u))⁻¹.

Exercise 13.1.15. (A traditional examination question.) Let f : R → R be given by f(x) = x³. Show that f is bijective but f′(0) = 0 (so the derivative of f at 0 is not invertible).
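The phenomenon in the examination question can be seen numerically. Everything below (the cube-root inverse g and the sample step sizes) is illustrative only: the difference quotients of g at 0 grow without bound, so the inverse is not differentiable there even though f is a bijection.

```python
# f(x) = x**3 is a bijection of R, but f'(0) = 0 and the inverse
# g(y) = y**(1/3) has difference quotient g(h)/h = h**(-2/3) -> infinity.
def g(y):  # the inverse of x -> x**3, extended to negative y
    return y ** (1 / 3) if y >= 0 else -((-y) ** (1 / 3))

quotients = [g(h) / h for h in (1e-3, 1e-6, 1e-9)]
print(quotients)  # strictly increasing, unboundedly large
```

As the step shrinks by 10⁻³ each time, the quotient grows by 10², matching the exponent −2/3.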

Exercise 13.1.16. Consider the open interval (−4, 4). Find f((−4, 4)) for the functions f : R → R given by

(i) f(x) = sin x,
(ii) f(x) = sin 10⁻²x,
(iii) f(x) = x²,
(iv) f(x) = x³ − x.

In each case comment briefly on the relation of your result to Lemma 13.1.12.

Exercise 13.1.17. Let

U = {(x, y) ∈ R2 : 1 < x² + y² < 2}

and define f : U → R2 by

f(x, y) = ( (x² − y²)/(x² + y²)^{1/2}, 2xy/(x² + y²)^{1/2} ).

(i) Show that U is open.
(ii) Show that f is differentiable on U and that Df is continuous and invertible at every point of U.
(iii) By using polar coordinates discover why f is defined as it is. Show that f(U) = U but f is not injective.
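Part (iii) can be previewed numerically. The sketch below assumes the polar-coordinate reading of f, under which a point at radius r and angle t is sent to radius r and angle 2t; the sample radius and angle are arbitrary choices inside the annulus.

```python
import math

# Numerical check (not part of the exercise's proof): with x = r cos t,
# y = r sin t we get x**2 - y**2 = r**2 cos 2t and 2xy = r**2 sin 2t, so f
# doubles the angle and keeps the radius: it maps U onto U but is 2-to-1.
def f(x, y):
    r = math.hypot(x, y)
    return ((x**2 - y**2) / r, 2 * x * y / r)

r, t = 1.2, 0.7                                   # 1 < r**2 < 2, so the point lies in U
p = (r * math.cos(t), r * math.sin(t))
q = (r * math.cos(t + math.pi), r * math.sin(t + math.pi))  # antipodal point, also in U
fx, fy = f(*p)
print(math.isclose(math.hypot(fx, fy), r))        # radius preserved
print(math.isclose(fx, r * math.cos(2 * t)) and math.isclose(fy, r * math.sin(2 * t)))
print(all(math.isclose(a, b) for a, b in zip(f(*p), f(*q))))  # f(p) = f(q): not injective
```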

Once the ideas behind the proof of the inverse function theorem are understood, it can be condensed into a very short argument. In Dieudonné's account the essential content of both this section and the next is stated and proved in more general form in about two pages ([13] Chapter X, Theorem 10.2.1). We leave it to the reader to undertake this condensation¹. The usual approach to the inverse function theorem uses the contraction mapping theorem in a rather more subtle way than the approach adopted here. I outline the alternative approach in Appendix F.

¹ There is a Cambridge story about an eminent algebraic geometer who presented his subject entirely without diagrams. However, from time to time, when things got difficult, he would hide part of the blackboard, engage in some rapid but hidden chalk work, rub out the result and continue.


Please send corrections however trivial to twk@dpmms.cam.ac.uk

13.2 The implicit function theorem

The contents of this section really belong to a first course in differential geometry. However, there is an old mathematical tradition called 'pass the parcel' by which lecturers assume that all the hard but necessary preliminary work has been done 'in a previous course'². In accordance with this tradition, lecturers in differential geometry frequently leave the proof of the implicit function theorem in the hands of an, often mythical, earlier lecturer in analysis. Even when the earlier lecturer actually exists, this has the effect of first exposing the students to a proof of a result whose use they do not understand and then making them use a result whose proof they have forgotten.

My advice to the reader, as often in this book, is not to take this section too seriously. It is far more important to make sure that you are confident of the meaning and proof of the inverse function theorem than that you worry about the details of this section.

Consider the function h : R2 → R given by h(x, y) = x² + y². We know that the contour (or level) line

x² + y² = 1

can be represented by a graph

y = (1 − x²)^{1/2}

close to the point (x, y) = (0, 1). We also know that this representation fails close to the point (x, y) = (1, 0) but that near that point we can use the representation

x = (1 − y²)^{1/2}.
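A quick numerical check of the two local graph representations of the unit circle (a sketch, assuming y = (1 − x²)^{1/2} near (0, 1) and x = (1 − y²)^{1/2} near (1, 0); the sample values are arbitrary):

```python
import math

# Verify that each local graph really parametrises a piece of the contour
# x**2 + y**2 = 1 near the point in question.
def on_contour(x, y):
    return math.isclose(x**2 + y**2, 1.0)

# near (0, 1): y as a function of x
assert all(on_contour(x, math.sqrt(1 - x**2)) for x in [-0.3, 0.0, 0.3])
# near (1, 0): x as a function of y
assert all(on_contour(math.sqrt(1 - y**2), y) for y in [-0.3, 0.0, 0.3])
print("both local representations verified")
```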

Leaping rapidly to a conclusion, we obtain the following slogan.

Slogan: If f : R2 → R behaves well in a neighbourhood of a point (x0, y0) then at least one of the following two statements must be true.

(a) There exists a δ > 0 and a well behaved bijective function g : (−δ, δ) → R such that g(0) = y0 and f(x + x0, g(x)) = f(x0, y0) for all x ∈ (−δ, δ).

(b) There exists a δ > 0 and a well behaved bijective function g : (−δ, δ) → R such that g(0) = x0 and f(g(y), y + y0) = f(x0, y0) for all y ∈ (−δ, δ).

² Particular topics handled in this way include determinants, the Jordan normal form, uniqueness of prime factorisation and various important inequalities.


Figure 13.1: Problems at a saddle

Exercise 13.2.1. Consider the contour x³ − y² = 0. Show that we can find g1 : R → R and g2 : R → R such that

x³ − g1(x)² = 0 for all x ∈ R,
g2(y)³ − y² = 0 for all y ∈ R,

but g1 is differentiable everywhere and g2 is not. Explain in simple terms why this is the case.

One way of looking at our slogan is to consider a walker on a hill whose height is given by f(x, y) at a point (x, y) on a map. The walker seeks to walk along a path of constant height. She is clearly going to have problems if she starts at a strict maximum since a step in any direction takes her downward. A similar difficulty occurs at a strict minimum. A different problem occurs at a saddle point (see Figure 13.1). It appears from the picture that, if f is well behaved, there are not one but two paths of constant height passing through a saddle point and this will create substantial difficulties³. However, these are the only points which present problems.

Another way to look at our slogan is to treat it as a problem in the calculus (our arguments will, however, continue to be informal). Suppose that

f(x, g(x)) = f(x0, y0).

Assuming that everything is well behaved, we can differentiate with respect to x, obtaining

f,1(x, g(x)) + g′(x) f,2(x, g(x)) = 0

³ Remember Buridan's ass which, placed between two equally attractive bundles of hay, starved to death because it was unable to find a reason for starting on one bundle rather than the other.


and so, provided that f,2(x, y) ≠ 0,

g′(x) = −f,1(x, g(x)) / f,2(x, g(x)).

Thus, provided that f is sufficiently well behaved and f,2(x0, y0) ≠ 0, our earlier work on differential equations tells us that there exists a local solution for g.
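This informal argument can be tried out numerically. The sketch below assumes f(x, y) = x² + y² and the starting point (0, 1) (so the expected contour is the unit circle), and integrates g′ = −f,1/f,2 = −x/g by Euler's method; all of these details are illustrative choices, not part of the text.

```python
import math

# Trace the contour of f(x, y) = x**2 + y**2 through (0, 1) by integrating
# g'(x) = -f_1(x, g)/f_2(x, g) = -(2x)/(2g) = -x/g with Euler steps.
def trace_contour(x0=0.0, y0=1.0, x_end=0.5, n=100000):
    x, g = x0, y0
    h = (x_end - x0) / n
    for _ in range(n):
        g += h * (-x / g)   # Euler step for g' = -x/g
        x += h
    return g

g_half = trace_contour()
# the exact contour is g(x) = sqrt(1 - x**2), so g(0.5) = sqrt(0.75)
print(math.isclose(g_half, math.sqrt(0.75), rel_tol=1e-3))
```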

The two previous paragraphs tend to confirm the truth of our slogan and show that there exist local contour lines in the neighbourhood of any point (x0, y0) where at least one of f,1(x0, y0) and f,2(x0, y0) does not vanish. We shall not seek to establish the global existence of contour lines⁴. Instead we seek to extend the ideas of our slogan to higher dimensions.

We argue informally, assuming good behaviour as required. Consider a function f : Rm × Rn → Rn. Without loss of generality, we may suppose that f(0, 0) = 0. An appropriate generalisation of the questions considered above is to ask about solutions to the equation

f(x, gh(x)) = h    (1)

where h is fixed. (Note that x ∈ Rm, gh(x) ∈ Rn and h ∈ Rn. Since everything is local we suppose h is small and we only consider (x, gh(x)) close to (0, 0).) Before proceeding to the next paragraph the reader should convince herself that the question we have asked is a natural one.

The key step in resolving this problem is to rewrite it. Define f̃, g̃ : Rm × Rn → Rm × Rn by

f̃(x, y) = (x, f(x, y)),
g̃(x, y) = (x, gy(x)).

If equation (1) holds, then

f̃(g̃(x, h)) = f̃(x, gh(x)) = (x, f(x, gh(x))) = (x, h),

and so

f̃(g̃(x, h)) = (x, h).    (2)

⁴ Obviously the kind of ideas considered in Section 12.3 will play an important role in such an investigation. One obvious problem is that, when we look at a contour in one location, there is no way of telling if it will not pass through a saddle point at another.


Conversely, if equation (2) holds, then equation (1) follows.

To solve equation (2) we need to use the inverse mapping results of the previous section and, to use those results, we need to know if D f̃(x, y) is invertible at a given point (x, y). Now

D f̃(x, y)(u, v) = Df(x, y)(u, v) + u,

and so D f̃(x, y) is invertible if and only if the linear map α : Rn → Rn, given by

αv = Df(x, y)(0, v),

is invertible. (We present two proofs of this last statement in the next two exercises. Both exercises take some time to state and hardly any time to do.)

Exercise 13.2.2. In this exercise we use column vectors, writing (a; b) for the column vector with a above b. Thus we write

f̃(x; y) = (x; f(x; y)),   g̃(x; y) = (x; gy(x)).

Suppose that Df(x; y) has matrix C with respect to the standard basis. Explain why C is an n × (n + m) matrix which we can therefore write as

C = (B A),

with A an n × n matrix and B an n × m matrix.

Show that D f̃(x; y) has matrix

E = ( I 0
      B A ),

where 0 is the m × n matrix consisting entirely of zeros and I is the m × m identity matrix. By considering det E, or otherwise, show that E is invertible if and only if A is. Show that A is the matrix of the linear map α : Rn → Rn given by

αv = Df(x; y)(0; v).
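The determinant identity behind the exercise can be spot-checked numerically; the block sizes m = n = 2 and the random blocks below are arbitrary illustrative choices.

```python
import numpy as np

# For E = [[I, 0], [B, A]], expanding det along the first m rows gives
# det E = det I * det A = det A, so E is invertible exactly when A is.
rng = np.random.default_rng(0)
m, n = 2, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
E = np.block([[np.eye(m), np.zeros((m, n))],
              [B, A]])
print(np.isclose(np.linalg.det(E), np.linalg.det(A)))  # det E = det A
```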

Exercise 13.2.3. Let W be the direct sum of two subspaces U and V. If γ : W → V is a linear map, show that the map α : V → V, given by αv = γv for all v ∈ V, is linear.

Now define γ̃ : W → W by γ̃(u + v) = γ(u + v) + u for u ∈ U, v ∈ V. Show that γ̃ is a well defined linear map, that ker(γ̃) = ker(α) and that γ̃(W) = U + α(V). Conclude that γ̃ is invertible if and only if α is.


We can now pull the strands together.

Theorem 13.2.4. (Implicit function theorem.) Consider a function f : Rm × Rn → Rn which is differentiable on an open set U. Suppose further that Df is continuous at every point of U, that (x0, y0) ∈ U and that the linear map α : Rn → Rn given by

αt = Df(x0, y0)(0, t)

is invertible. Then we can find an open set B1 in Rm, with x0 ∈ B1, and an open set B2 in Rn, with f(x0, y0) ∈ B2, such that, if z ∈ B2, there exists a differentiable map gz : B1 → Rn with (x, gz(x)) ∈ U and

f(x, gz(x)) = z

for all x ∈ B1.

(We give a slight improvement on this result in Theorem 13.2.9 but Theorem 13.2.4 contains the essential result.)

Proof. Define f̃ : Rm × Rn → Rm × Rn by

f̃(x, y) = (x, f(x, y)).

If (x, y) ∈ U then D f̃(x, y) exists and

D f̃(x, y)(s, t) = Df(x, y)(s, t) + s.

In particular, since α is invertible, D f̃(x0, y0) is invertible. It follows, by the inverse function theorem (Theorem 13.1.13), that we can find an open set B ⊆ U with (x0, y0) ∈ B and an open set V such that

(i) f̃|B : B → V is bijective, and
(ii) (f̃|B)⁻¹ : V → B is differentiable.

Let us define G : V → Rm and g : V → Rn by

(f̃|B)⁻¹(v) = (G(v), g(v))

for all v ∈ V. We observe that g is everywhere differentiable on V. By definition,

(x, z) = f̃|B((f̃|B)⁻¹(x, z))
       = f̃|B(G(x, z), g(x, z))
       = (G(x, z), f(G(x, z), g(x, z)))

and so

G(x, z) = x,

and

z = f(x, g(x, z))

for all (x, z) ∈ V.

We know that V is open and (x0, f(x0, y0)) ∈ V, so we can find an open set B1 in Rm, with x0 ∈ B1, and an open set B2 in Rn, with f(x0, y0) ∈ B2, such that B1 × B2 ⊆ V. Setting

gz(x) = g(x, z)

for all x ∈ B1 and z ∈ B2, we have the required result.

Remark 1: We obtained the implicit function theorem from the inverse function theorem by introducing new functions f̃ and g̃. There is, however, no reason why we should not obtain the implicit function theorem directly by following the same kind of method as we used to prove the inverse function theorem. Here is the analogue of Lemma 13.1.2.

Lemma 13.2.5. Consider a function f : Rm × Rn → Rn such that f(0, 0) = 0. Suppose that there exists a δ > 0 and an η with 1 > η > 0 such that

‖(f(x, t) − f(x, s)) − (t − s)‖ ≤ η‖t − s‖

for all ‖x‖, ‖s‖, ‖t‖ ≤ δ [x ∈ Rm, s, t ∈ Rn]. Then, if ‖h‖ ≤ (1 − η)δ, there exists one and only one solution of the equation

f(x, u) = h

with ‖u‖ < δ. Further, if we denote this solution by gh(x), we have

‖gh(x) − h‖ ≤ η(1 − η)⁻¹‖h‖.

Exercise 13.2.6. Prove Lemma 13.2.5. Sketch, giving as much or as little detail as you wish, the steps from Lemma 13.2.5 to Theorem 13.2.4. You should convince yourself that the inverse function theorem and the implicit function theorem are fingers of the same hand.
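The contraction hiding in Lemma 13.2.5 can be illustrated with a one-dimensional sketch. Everything here is a hypothetical choice: m = n = 1 and f(x, u) = u + 0.1 sin(x + u), which satisfies the lemma's hypothesis with η = 0.1. Iterating u ↦ u + (h − f(x, u)) then converges to the unique solution of f(x, u) = h.

```python
import math

# f(x, u) = u + 0.1*sin(x + u) is a small perturbation of the projection
# (x, u) -> u, so the map u -> u + (h - f(x, u)) = h - 0.1*sin(x + u) is a
# contraction (its derivative is bounded by 0.1) and its fixed point solves
# f(x, u) = h.
def solve_implicit(x, h, steps=200):
    u = 0.0
    for _ in range(steps):
        u = u + (h - (u + 0.1 * math.sin(x + u)))
    return u

x, h = 0.3, 0.05
u = solve_implicit(x, h)
print(math.isclose(u + 0.1 * math.sin(x + u), h))  # f(x, u) = h
```

As x varies, the fixed point u = gh(x) varies with it, which is exactly the implicit function the lemma produces.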


Remark 2: In the introduction to this section we obtained contours as the solutions of an ordinary differential equation of the form

g′(x) = −f,1(x, g(x)) / f,2(x, g(x)).

The situation for the general implicit function theorem is genuinely more complicated. Suppose, for example, that we have a function f : R4 → R2 and we wish to find (u, v) : R2 → R2 so that (at least locally)

f(x, y, u(x, y), v(x, y)) = h,

that is

f1(x, y, u(x, y), v(x, y)) = h1,
f2(x, y, u(x, y), v(x, y)) = h2.

On differentiating, we obtain

f1,1(x, y, u(x, y), v(x, y)) + f1,3(x, y, u(x, y), v(x, y)) ∂u/∂x + f1,4(x, y, u(x, y), v(x, y)) ∂v/∂x = 0,
f1,2(x, y, u(x, y), v(x, y)) + f1,3(x, y, u(x, y), v(x, y)) ∂u/∂y + f1,4(x, y, u(x, y), v(x, y)) ∂v/∂y = 0,
f2,1(x, y, u(x, y), v(x, y)) + f2,3(x, y, u(x, y), v(x, y)) ∂u/∂x + f2,4(x, y, u(x, y), v(x, y)) ∂v/∂x = 0,
f2,2(x, y, u(x, y), v(x, y)) + f2,3(x, y, u(x, y), v(x, y)) ∂u/∂y + f2,4(x, y, u(x, y), v(x, y)) ∂v/∂y = 0.
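Grouping the equations containing ∂/∂x gives a linear system for (∂u/∂x, ∂v/∂x) whose matrix holds the partials of f in its last two variables; the invertibility of that matrix is exactly the hypothesis of Theorem 13.2.4. A numerical sketch with a hypothetical f (everything below is an illustrative choice, not from the text):

```python
import numpy as np

# Take f(x, y, u, v) = (u + x*v, v + x*u), so f1,1 = v, f1,3 = 1, f1,4 = x
# and f2,1 = u, f2,3 = x, f2,4 = 1.  The pair of d/dx equations is then
#   [[f1,3, f1,4], [f2,3, f2,4]] @ (u_x, v_x) = -(f1,1, f2,1).
x, y, u, v = 0.2, -0.1, 0.3, 0.4
M = np.array([[1.0, x],    # (f1,3  f1,4)
              [x, 1.0]])   # (f2,3  f2,4)
rhs = -np.array([v, u])    # -(f1,1, f2,1) for this particular f
ux, vx = np.linalg.solve(M, rhs)
# check that both d/dx equations hold
print(np.isclose(v + ux + x * vx, 0.0) and np.isclose(u + x * ux + vx, 0.0))
```

The ∂/∂y pair gives a second system with the same matrix M, so one invertibility condition serves all four equations.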