
when $h$ is small ($h \in \mathbb{R}^m$). In other words $g \circ f$ is differentiable at $0$.

We have been led to formulate the chain rule.

Lemma 6.2.10. (The chain rule.) Let $U$ be a neighbourhood of $x$ in $\mathbb{R}^m$, and $V$ a neighbourhood of $y$ in $\mathbb{R}^p$. Suppose that $f : U \to V$ is differentiable at $x$ with derivative $\alpha$, that $g : V \to \mathbb{R}^q$ is differentiable at $y$ with derivative $\beta$ and that $f(x) = y$. Then $g \circ f$ is differentiable at $x$ with derivative $\beta\alpha$.

In more condensed notation
\[ D(g \circ f)(x) = Dg(f(x))\,Df(x), \]
or, equivalently,
\[ D(g \circ f)(x) = (Dg) \circ f(x)\,Df(x). \]

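Proof. We know that
\[ f(x + h) = f(x) + \alpha h + \epsilon_1(h)\|h\| \]
and
\[ g(f(x) + k) = g(f(x)) + \beta k + \epsilon_2(k)\|k\| \]
where $\|\epsilon_1(h)\| \to 0$ as $\|h\| \to 0$ and $\|\epsilon_2(k)\| \to 0$ as $\|k\| \to 0$. It follows that
\[ g \circ f(x + h) = g(f(x + h)) = g(f(x) + \alpha h + \epsilon_1(h)\|h\|) \]
so, taking $k = \alpha h + \epsilon_1(h)\|h\|$, we have
\begin{align*}
g \circ f(x + h) &= g(f(x)) + \beta(\alpha h + \epsilon_1(h)\|h\|) + \epsilon_2(\alpha h + \epsilon_1(h)\|h\|)\,\bigl\|\alpha h + \epsilon_1(h)\|h\|\bigr\| \\
&= g \circ f(x) + \beta\alpha h + \eta(h)\|h\|
\end{align*}
with
\[ \eta(h) = \eta_1(h) + \eta_2(h) \]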

132 A COMPANION TO ANALYSIS

where
\[ \eta_1(h)\|h\| = \beta\epsilon_1(h)\|h\| \]
and
\[ \eta_2(h)\|h\| = \epsilon_2(\alpha h + \epsilon_1(h)\|h\|)\,\bigl\|\alpha h + \epsilon_1(h)\|h\|\bigr\|. \]

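All we have to do is to show that $\|\eta_1(h)\|$ and $\|\eta_2(h)\|$, and so $\|\eta(h)\| = \|\eta_1(h) + \eta_2(h)\|$, tend to zero as $\|h\| \to 0$. We observe first that
\[ \bigl\|\eta_1(h)\|h\|\bigr\| \leq \|\beta\|\,\bigl\|\epsilon_1(h)\|h\|\bigr\| = \|\beta\|\,\|\epsilon_1(h)\|\,\|h\| \]
so $\|\eta_1(h)\| \leq \|\beta\|\,\|\epsilon_1(h)\| \to 0$ as $\|h\| \to 0$. Next we observe that
\begin{align*}
\bigl\|\eta_2(h)\|h\|\bigr\| &= \|\epsilon_2(\alpha h + \epsilon_1(h)\|h\|)\|\,\bigl\|\alpha h + \epsilon_1(h)\|h\|\bigr\| \\
&\leq \|\epsilon_2(\alpha h + \epsilon_1(h)\|h\|)\|\,(\|\alpha h\| + \|\epsilon_1(h)\|\,\|h\|) \\
&\leq \|\epsilon_2(\alpha h + \epsilon_1(h)\|h\|)\|\,(\|\alpha\| + \|\epsilon_1(h)\|)\,\|h\|,
\end{align*}
so that
\[ \|\eta_2(h)\| \leq \|\epsilon_2(\alpha h + \epsilon_1(h)\|h\|)\|\,(\|\alpha\| + \|\epsilon_1(h)\|) \to 0 \]
as $\|h\| \to 0$ and we are done.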
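The chain rule is easy to test numerically. The following Python sketch is not part of the proof: the maps `f` and `g`, the point `x` and the helper names `jacobian` and `matmul` are illustrative assumptions. It approximates each Jacobian matrix by central differences and compares $D(g \circ f)(x)$ with the matrix product $Dg(f(x))\,Df(x)$.

```python
import math

def jacobian(func, x, step=1e-6):
    """Approximate the Jacobian matrix of func at x by central differences."""
    rows = len(func(x))
    cols = len(x)
    jac = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        forward = list(x)
        backward = list(x)
        forward[j] += step
        backward[j] -= step
        f_plus, f_minus = func(forward), func(backward)
        for i in range(rows):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * step)
    return jac

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def f(x):
    # illustrative map from R^2 to R^3 (an assumption, not from the text)
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def g(y):
    # illustrative map from R^3 to R^2 (an assumption, not from the text)
    return [y[0] + y[1] * y[2], math.exp(y[0])]

def g_after_f(x):
    return g(f(x))

x = [0.3, -0.7]
lhs = jacobian(g_after_f, x)                      # approximates D(g o f)(x)
rhs = matmul(jacobian(g, f(x)), jacobian(f, x))   # approximates Dg(f(x)) Df(x)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The two matrices should agree up to the error of the finite difference approximation. Agreement at one point is, of course, evidence and not a proof.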

Students sometimes say that the proof of the chain rule is difficult but they really mean that it is tedious. It is simply a matter of showing that the error terms $\eta_1(h)\|h\|$ and $\eta_2(h)\|h\|$, which ought to be small, actually are. Students also forget the artificiality of the standard proofs of the one dimensional chain rule (see the discussion of Lemma 5.6.2; any argument which Hardy got wrong cannot be natural). The multidimensional argument forces us to address the real nature of the chain rule.

The next result is very simple but I would like to give two different proofs.

Lemma 6.2.11. Let $U$ be a neighbourhood of $x$ in $\mathbb{R}^n$. Suppose that $f, g : U \to \mathbb{R}^m$ are differentiable at $x$. Then $f + g$ is differentiable at $x$ with
\[ D(f + g)(x) = Df(x) + Dg(x). \]

Direct proof. By definition
\[ f(x + h) = f(x) + Df(x)h + \epsilon_1(h)\|h\| \]
and
\[ g(x + h) = g(x) + Dg(x)h + \epsilon_2(h)\|h\| \]
where $\|\epsilon_1(h)\| \to 0$ and $\|\epsilon_2(h)\| \to 0$ as $\|h\| \to 0$. Thus
\begin{align*}
(f + g)(x + h) &= f(x + h) + g(x + h) \\
&= f(x) + Df(x)h + \epsilon_1(h)\|h\| + g(x) + Dg(x)h + \epsilon_2(h)\|h\| \\
&= (f + g)(x) + (Df(x) + Dg(x))h + \epsilon_3(h)\|h\|
\end{align*}
with
\[ \epsilon_3(h) = \epsilon_1(h) + \epsilon_2(h). \]
Since
\[ \|\epsilon_3(h)\| \leq \|\epsilon_1(h)\| + \|\epsilon_2(h)\| \to 0 + 0 = 0, \]
as $\|h\| \to 0$, we are done.

Please send corrections however trivial to twk@dpmms.cam.ac.uk

Our second proof depends on a series of observations.

Lemma 6.2.12. A linear map $\alpha : \mathbb{R}^n \to \mathbb{R}^m$ is everywhere differentiable with derivative $\alpha$.

Proof. Observe that
\[ \alpha(x + h) = \alpha x + \alpha h + \epsilon(h)\|h\|, \]
where $\epsilon(h) = 0$, and apply the definition.

As the reader can see, the result and proof are trivial, but they take some getting used to. In one dimension the result says that the map given by $x \mapsto ax$ has derivative $x \mapsto ax$ (or that the tangent to the line $y = ax$ is the line $y = ax$ itself, or that the derivative of the linear map with $1 \times 1$ matrix $(a)$ is the linear map with matrix $(a)$).

Exercise 6.2.13. Show that the constant map $f_c : \mathbb{R}^n \to \mathbb{R}^m$, given by $f_c(x) = c$ for all $x$, is everywhere differentiable with derivative the zero linear map.

Lemma 6.2.14. Let $U$ be a neighbourhood of $x$ in $\mathbb{R}^n$ and $V$ a neighbourhood of $y$ in $\mathbb{R}^m$. Suppose that $f : U \to \mathbb{R}^p$ is differentiable at $x$ and $g : V \to \mathbb{R}^q$ is differentiable at $y$. Then $U \times V$ is a neighbourhood of $(x, y)$ in $\mathbb{R}^{n+m}$ and the function $(f, g) : U \times V \to \mathbb{R}^{p+q}$ given by
\[ (f, g)(u, v) = (f(u), g(v)) \]
is differentiable at $(x, y)$ with derivative $(Df(x), Dg(y))$ where we write
\[ (Df(x), Dg(y))(h, k) = (Df(x)h, Dg(y)k). \]


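Proof. We leave some details (such as verifying that $U \times V$ is a neighbourhood of $(x, y)$) to the reader. The key to the proof is the remark that $\|(h, k)\| \geq \|h\|, \|k\|$. Observe that, if we write
\[ f(x + h) = f(x) + Df(x)h + \epsilon_1(h)\|h\| \]
and
\[ g(y + k) = g(y) + Dg(y)k + \epsilon_2(k)\|k\|, \]
we have
\[ (f, g)((x, y) + (h, k)) = (f, g)(x, y) + (Df(x), Dg(y))(h, k) + \epsilon(h, k)\|(h, k)\| \]
where
\[ \epsilon(h, k)\|(h, k)\| = (\epsilon_1(h)\|h\|, 0) + (0, \epsilon_2(k)\|k\|). \]
Using the last equation, we obtain
\begin{align*}
\|\epsilon(h, k)\|\,\|(h, k)\| &= \bigl\|\epsilon(h, k)\|(h, k)\|\bigr\| = \bigl\|(\epsilon_1(h)\|h\|, 0) + (0, \epsilon_2(k)\|k\|)\bigr\| \\
&\leq \bigl\|(\epsilon_1(h)\|h\|, 0)\bigr\| + \bigl\|(0, \epsilon_2(k)\|k\|)\bigr\| \leq \|\epsilon_1(h)\|\,\|(h, k)\| + \|\epsilon_2(k)\|\,\|(h, k)\|.
\end{align*}
Thus
\[ \|\epsilon(h, k)\| \leq \|\epsilon_1(h)\| + \|\epsilon_2(k)\| \to 0 + 0 = 0 \]
as $\|(h, k)\| \to 0$.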

Exercise 6.2.15. If $h \in \mathbb{R}^n$ and $k \in \mathbb{R}^m$, show that
\[ \|(h, k)\|^2 = \|h\|^2 + \|k\|^2 \]
and
\[ \|h\| + \|k\| \geq \|(h, k)\| \geq \|h\|, \|k\|. \]

Exercise 6.2.16. Consider the situation described in Lemma 6.2.14. Write

down the Jacobian matrix of partial derivatives for (f , g) in terms of the

Jacobian matrices for f and g.

We can now give a second proof of Lemma 6.2.11 using the chain rule.


Second proof of Lemma 6.2.11. Let $\alpha : \mathbb{R}^n \to \mathbb{R}^{2n}$ be the map given by
\[ \alpha(x) = (x, x) \]
and $\beta : \mathbb{R}^{2m} \to \mathbb{R}^m$ be the map given by
\[ \beta(x, y) = x + y. \]
Then, using the notation of Lemma 6.2.14,
\[ f + g = \beta \circ (f, g) \circ \alpha. \]
But $\alpha$ and $\beta$ are linear, so using the chain rule (Lemma 6.2.10), we see that $f + g$ is differentiable at $x$ and
\[ D(f + g)(x) = \beta \circ D(f, g)(x, x) \circ \alpha = Df(x) + Dg(x). \]

If we only used this idea to prove Lemma 6.2.11 it would hardly be worth it but it is frequently easiest to show that a complicated function is differentiable by expressing it as the composition of simpler differentiable functions. (How else would one prove that $x \mapsto \sin(\exp(1 + x^2))$ is differentiable?)

Exercise 6.2.17. (i) Show that the function $J : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ given by the scalar product
\[ J(u, v) = u \cdot v \]
is everywhere differentiable with
\[ DJ(x, y)(h, k) = x \cdot k + y \cdot h. \]

(ii) Let $U$ be a neighbourhood of $x$ in $\mathbb{R}^n$. Suppose that $f, g : U \to \mathbb{R}^m$ are differentiable at $x$. Show, using the chain rule, that $f \cdot g$ is differentiable at $x$ with
\[ D(f \cdot g)(x)h = f(x) \cdot (D(g)(x)h) + (D(f)(x)h) \cdot g(x). \]

(iii) Let $U$ be a neighbourhood of $x$ in $\mathbb{R}^n$. Suppose that $f : U \to \mathbb{R}^m$ and $\lambda : U \to \mathbb{R}$ are differentiable at $x$. State and prove an appropriate result about the function $\lambda f$ given by
\[ (\lambda f)(u) = \lambda(u) f(u). \]


(iv) If you have met the vector product$^1$ $u \wedge v$ of two vectors $u, v \in \mathbb{R}^3$, state and prove an appropriate theorem about the vector product of differentiable functions.

(v) Let $U$ be a neighbourhood of $x$ in $\mathbb{R}^n$. Suppose that $f : U \to \mathbb{R}$ is non-zero on $U$ and differentiable at $x$. Show that $1/f$ is differentiable at $x$ and find $D(1/f)(x)$.
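For part (i) it may help to see the definition of differentiability at work numerically before proving anything. The Python sketch below is only an illustration: the base point, the direction of $(h, k)$ and the helper names `dot`, `add` and `scale` are assumptions chosen for the example. It computes the remainder $J(x + h, y + k) - J(x, y) - (x \cdot k + y \cdot h)$ and divides it by $\|(h, k)\|$ for a sequence of shrinking $(h, k)$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(s, u):
    return tuple(s * a for a in u)

# Illustrative base point (x, y) and direction for (h, k); both are assumptions.
x, y = (1.0, -2.0, 0.5), (0.3, 4.0, -1.0)
h_dir, k_dir = (1.0, 2.0, -1.0), (0.5, -1.0, 3.0)

ratios = []
for n in range(1, 6):
    s = 10.0 ** (-n)
    h, k = scale(s, h_dir), scale(s, k_dir)
    linear_part = dot(x, k) + dot(y, h)            # the claimed DJ(x, y)(h, k)
    remainder = dot(add(x, h), add(y, k)) - dot(x, y) - linear_part
    size = math.sqrt(dot(h, h) + dot(k, k))        # ||(h, k)||
    ratios.append(abs(remainder) / size)
print(ratios)
```

The printed ratios should shrink roughly in proportion to $\|(h, k)\|$, which is what the definition of the derivative requires of the error term.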

6.3 The mean value inequality in higher dimensions

So far our study of differentiation in higher dimensions has remained on the level of mere algebra. (The definition of the operator norm used the supremum and so lay deeper but we could have avoided this at the cost of using a less natural norm.) The next result is a true theorem of analysis.

Theorem 6.3.1. (The mean value inequality.) Suppose that $U$ is an open set in $\mathbb{R}^m$ and that $f : U \to \mathbb{R}^p$ is differentiable. Consider the straight line segment
\[ L = \{(1 - t)a + tb : 0 \leq t \leq 1\} \]
joining $a$ and $b$. If $L \subseteq U$ (i.e. $L$ lies entirely within $U$) and $\|Df(x)\| \leq K$ for all $x \in L$, then
\[ \|f(a) - f(b)\| \leq K\|a - b\|. \]

Proof. Before starting the proof, it is helpful to note that, since $U$ is open, we can find a $\eta > 0$ such that the extended straight line segment
\[ \{(1 - t)a + tb : -\eta \leq t \leq 1 + \eta\} \subseteq U. \]

We shall prove our many dimensional mean value inequality from the one dimensional version (Theorem 1.7.1, or if the reader prefers, the slightly sharper Theorem 4.4.1). To this end, observe that, if $f(b) - f(a) = 0$, there is nothing to prove. We may thus assume that $f(b) - f(a) \neq 0$ and consider
\[ u = \frac{f(b) - f(a)}{\|f(b) - f(a)\|}, \]
the unit vector in the direction $f(b) - f(a)$.

$^1$ Question: What do you get if you cross a mountaineer with a mosquito? Answer: You can't. One is a scaler and the other is a vector.

If we now define $g : (-\eta, 1 + \eta) \to \mathbb{R}$ by
\[ g(t) = u \cdot \bigl(f((1 - t)a + tb) - f(a)\bigr), \]
we see, by using the chain rule or direct calculation, that $g$ is continuous and differentiable on $(-\eta, 1 + \eta)$ with
\[ g'(t) = u \cdot (Df((1 - t)a + tb)(b - a)). \]

Using the Cauchy-Schwarz inequality (Lemma 4.1.2) and the definition of the operator norm (Definition 6.2.4), we have
\begin{align*}
|g'(t)| &\leq \|u\|\,\|Df((1 - t)a + tb)(b - a)\| \\
&= \|Df((1 - t)a + tb)(b - a)\| \\
&\leq \|Df((1 - t)a + tb)\|\,\|b - a\| \\
&\leq K\|a - b\|
\end{align*}
for all $t \in (0, 1)$. Thus, by the one dimensional mean value inequality,
\[ \|f(a) - f(b)\| = |g(1) - g(0)| \leq K\|a - b\| \]
as required.
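A numerical illustration of the inequality may be useful. The Python sketch below makes several assumptions which are not in the text: the map `f` from $\mathbb{R}^2$ to $\mathbb{R}^2$, its hand-computed Jacobian `Df`, the end points `a` and `b`, and the use of a maximum over sample points of the segment as a stand-in for the bound $K$. For a $2 \times 2$ matrix the operator norm is the largest singular value $\sigma_1$, which is found here from $\sigma_1^2 + \sigma_2^2 = \text{(sum of squared entries)}$ and $\sigma_1\sigma_2 = |\det|$.

```python
import math

def f(p):
    # illustrative map from R^2 to R^2 (an assumption, not from the text)
    x, y = p
    return (x * y, math.sin(x) + y * y)

def Df(p):
    # Jacobian matrix of the illustrative f, computed by hand
    x, y = p
    return ((y, x), (math.cos(x), 2 * y))

def op_norm_2x2(m):
    """Operator norm (largest singular value) of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    t = a * a + b * b + c * c + d * d      # sigma_1^2 + sigma_2^2
    det = a * d - b * c                    # sigma_1 * sigma_2 up to sign
    disc = max(t * t - 4 * det * det, 0.0) # guard against rounding below zero
    return math.sqrt((t + math.sqrt(disc)) / 2)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

a, b = (0.0, 0.0), (1.0, 1.0)

def point(t):
    return ((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])

# Sampled estimate of sup ||Df(x)|| over the segment L (an approximation of K).
K = max(op_norm_2x2(Df(point(i / 1000))) for i in range(1001))

lhs = norm((f(a)[0] - f(b)[0], f(a)[1] - f(b)[1]))   # ||f(a) - f(b)||
rhs = K * norm((a[0] - b[0], a[1] - b[1]))           # K ||a - b||
print(lhs, rhs, lhs <= rhs)
```

Sampling can only underestimate the supremum, so a comparison of this kind illustrates the theorem but does not verify its hypothesis rigorously.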

Exercise 6.3.2. (i) Prove the statement of the first sentence in the proof just given.

(ii) If $g$ is the function defined in the proof just given, show, giving all the details, that $g$ is continuous and differentiable on $(-\eta, 1 + \eta)$ with
\[ g'(t) = u \cdot \bigl(Df((1 - t)a + tb)(b - a)\bigr). \]
You should give two versions of the proof, the first using the chain rule (Lemma 6.2.10) and the second using direct calculation.

If we have already gone to the trouble of proving the one-dimensional mean value inequality it seems sensible to make use of it in proving the multidimensional version. However, we could have proved the multidimensional theorem directly without making a one-dimensional detour.

Exercise 6.3.3. (i) Reread the proof of Theorem 1.7.1.

(ii) We now start the direct proof of Theorem 6.3.1. As before observe that we can find a $\eta > 0$ such that
\[ \{(1 - t)a + tb : -\eta \leq t \leq 1 + \eta\} \subseteq U, \]
but now consider $F : (-\eta, 1 + \eta) \to \mathbb{R}^p$ given by
\[ F(t) = f((1 - t)a + tb) - f(a). \]
Explain why the theorem will follow if we can show that, given any $\epsilon > 0$, we have
\[ \|F(1) - F(0)\| \leq K\|a - b\| + \epsilon. \]

(iii) Suppose, if possible, that there exists an $\epsilon > 0$ such that
\[ \|F(1) - F(0)\| \geq K\|a - b\| + \epsilon. \]
Show by a lion hunting argument that there exist a $c \in [0, 1]$ and $u_n, v_n \in [0, 1]$ with $u_n < v_n$ such that $u_n, v_n \to c$ and
\[ \|F(v_n) - F(u_n)\| \geq (K\|a - b\| + \epsilon)(v_n - u_n). \]

(iv) Show from the definition of differentiability that there exists a $\delta > 0$ such that
\[ \|F(t) - F(c)\| < (K\|a - b\| + \epsilon/2)|t - c| \]
whenever $|t - c| < \delta$ and $t \in [0, 1]$.

(v) Prove Theorem 6.3.1 by reductio ad absurdum.

One of the principal uses we made of the one dimensional mean value

theorem was to show that a function on an open interval with zero derivative

was necessarily constant. The reader should do both parts of the following

easy exercise and reflect on them.

Exercise 6.3.4. (i) Let $U$ be an open set in $\mathbb{R}^m$ such that given any $a, b \in U$ we can find a finite sequence of points $a = a_0, a_1, \ldots, a_{k-1}, a_k = b$ such that each line segment
\[ \{(1 - t)a_{j-1} + ta_j : 0 \leq t \leq 1\} \subseteq U \]
[$1 \leq j \leq k$]. Show that, if $f : U \to \mathbb{R}^p$ is everywhere differentiable on $U$ with $Df(x) = 0$, it follows that $f$ is constant.

(ii) We work in $\mathbb{R}^2$. Let $U_1$ be the open disc of radius 1 centre $(-2, 0)$ and $U_2$ be the open disc of radius 1 centre $(2, 0)$. Set $U = U_1 \cup U_2$. Define $f : U \to \mathbb{R}$ by $f(x) = -1$ for $x \in U_1$, $f(x) = 1$ for $x \in U_2$. Show that $f$ is everywhere differentiable on $U$ with $D(f)(x) = 0$ but $f$ is not constant.


The reader may ask if we can obtain an improvement to our mean value

inequality by some sort of equality along the lines of Theorem 4.4.1. The

answer is a clear no.

Exercise 6.3.5. Let $f : \mathbb{R} \to \mathbb{R}^2$ be given by $f(t) = (\cos t, \sin t)^T$. Compute the Jacobian matrix of partial derivatives for $f$ and show that $f(0) = f(2\pi)$ but $Df(t) \neq 0$ for all $t$.

(Although Exercise K.102 is not a counter example it points out another

problem which occurs when we work in many dimensions.)
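The point of Exercise 6.3.5 can be seen numerically. In the Python sketch below (an illustration only; the function names and the choice of 101 sample points are assumptions) the derivative of $t \mapsto (\cos t, \sin t)^T$ is written out by hand and its Euclidean norm is sampled over $[0, 2\pi]$, alongside the distance between $f(0)$ and $f(2\pi)$.

```python
import math

def f(t):
    return (math.cos(t), math.sin(t))

def Df(t):
    # derivative of f at t, a 2 x 1 column written as a pair
    return (-math.sin(t), math.cos(t))

# Distance between the two end values f(0) and f(2*pi).
gap = math.dist(f(0.0), f(2 * math.pi))

# Euclidean norm of the derivative at 101 equally spaced points of [0, 2*pi].
norms = [math.hypot(*Df(2 * math.pi * i / 100)) for i in range(101)]
print(gap, min(norms), max(norms))
```

The end values coincide up to rounding while the derivative never comes close to zero, so no equality of the form $f(b) - f(a) = Df(c)(b - a)$ can hold here.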

It is fairly obvious that we cannot replace the line segment $L$ in Theorem 6.3.1 by other curves without changing the conclusion.

Exercise 6.3.6. Let
\[ U = \{\mathbf{x} \in \mathbb{R}^2 : \|\mathbf{x}\| > 1\} \setminus \{(x, 0)^T : x \leq 0\}. \]
If we take $\theta(\mathbf{x})$ to be the unique solution of
\[ \cos(\theta(\mathbf{x})) = \frac{x}{(x^2 + y^2)^{1/2}}, \quad \sin(\theta(\mathbf{x})) = \frac{y}{(x^2 + y^2)^{1/2}}, \quad -\pi < \theta(\mathbf{x}) < \pi \]
for $\mathbf{x} = (x, y)^T \in U$, show that $\theta : U \to \mathbb{R}$ is everywhere differentiable with $\|D\theta(\mathbf{x})\| < 1$. (The amount of work involved in proving this depends quite strongly on how clever you are in exploiting radial symmetry.) Show, however, that if $a = (-1, 10^{-1})^T$, $b = (-1, -10^{-1})^T$, then
\[ |\theta(a) - \theta(b)| > \|a - b\|. \]
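The final claim of Exercise 6.3.6 can be checked on a computer before it is proved. The Python sketch below is an illustration under stated assumptions: it takes `math.atan2(y, x)` as the function $\theta$ (it returns the angle in $(-\pi, \pi]$, which agrees with $\theta$ on $U$), and the helper `grad_norm` estimates $\|D\theta\|$ by central differences with an arbitrarily chosen step.

```python
import math

def theta(p):
    # angle of the point p = (x, y), taken in (-pi, pi]
    x, y = p
    return math.atan2(y, x)

def grad_norm(func, p, step=1e-6):
    """Central difference estimate of the Euclidean norm of the gradient."""
    x, y = p
    gx = (func((x + step, y)) - func((x - step, y))) / (2 * step)
    gy = (func((x, y + step)) - func((x, y - step))) / (2 * step)
    return math.hypot(gx, gy)

a = (-1.0, 0.1)
b = (-1.0, -0.1)

jump = abs(theta(a) - theta(b))   # |theta(a) - theta(b)|
distance = math.dist(a, b)        # ||a - b||
print(grad_norm(theta, a), grad_norm(theta, b), jump, distance)
```

The straight line segment from $a$ to $b$ crosses the removed half line, so it does not lie in $U$ and Theorem 6.3.1 does not apply to it.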

It is clear (though we shall not prove it, and, indeed, cannot yet state it without using concepts which we have not formally defined) that the correct generalisation when $L$ is not a straight line will run as follows. ‘If $L$ is a well behaved path lying entirely within $U$ and $\|Df(x)\| \leq K$ for all $x \in L$ then $\|f(a) - f(b)\| \leq K \times \text{length } L$’.

Chapter 7

Local Taylor theorems

7.1 Some one dimensional Taylor theorems

By definition, a function $f : \mathbb{R} \to \mathbb{R}$ which is continuous at 0 looks like a constant function near 0, in the sense that
\[ f(t) = f(0) + \epsilon(t) \]
where $\epsilon(t) \to 0$ as $t \to 0$. By definition, again, a function $f : \mathbb{R} \to \mathbb{R}$ which is differentiable at 0 looks like a linear function near 0, in the sense that
\[ f(t) = f(0) + f'(0)t + \epsilon(t)|t| \]
where $\epsilon(t) \to 0$ as $t \to 0$. The next exercise establishes the non-trivial theorem that a function $f : \mathbb{R} \to \mathbb{R}$, which is $n$ times differentiable in a neighbourhood of 0 and has $f^{(n)}$ continuous at 0, looks like a polynomial of degree $n$ near 0, in the sense that

f (n) (0) n
