
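\[ \mathbf{g} \circ \mathbf{f}(\mathbf{h}) \approx (\beta\alpha)\mathbf{h} \]

when $\|\mathbf{h}\|$ is small ($\mathbf{h} \in \mathbb{R}^m$). In other words $\mathbf{g} \circ \mathbf{f}$ is differentiable at $\mathbf{0}$.
We have been led to formulate the chain rule.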
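Lemma 6.2.10. (The chain rule.) Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^m$,
and $V$ a neighbourhood of $\mathbf{y}$ in $\mathbb{R}^p$. Suppose that $\mathbf{f} : U \to V$ is differentiable
at $\mathbf{x}$ with derivative $\alpha$, that $\mathbf{g} : V \to \mathbb{R}^q$ is differentiable at $\mathbf{y}$ with derivative
$\beta$ and that $\mathbf{f}(\mathbf{x}) = \mathbf{y}$. Then $\mathbf{g} \circ \mathbf{f}$ is differentiable at $\mathbf{x}$ with derivative $\beta\alpha$.
In more condensed notation

\[ D(\mathbf{g} \circ \mathbf{f})(\mathbf{x}) = D\mathbf{g}(\mathbf{f}(\mathbf{x}))\,D\mathbf{f}(\mathbf{x}), \]

or, equivalently,

\[ D(\mathbf{g} \circ \mathbf{f})(\mathbf{x}) = (D\mathbf{g}) \circ \mathbf{f}(\mathbf{x})\,D\mathbf{f}(\mathbf{x}). \]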

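Proof. We know that

\[ \mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + \alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| \]

and

\[ \mathbf{g}(\mathbf{f}(\mathbf{x}) + \mathbf{k}) = \mathbf{g}(\mathbf{f}(\mathbf{x})) + \beta\mathbf{k} + \boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\| \]

where $\|\boldsymbol{\epsilon}_1(\mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$ and $\|\boldsymbol{\epsilon}_2(\mathbf{k})\| \to 0$ as $\|\mathbf{k}\| \to 0$. It follows that

\begin{align*}
\mathbf{g} \circ \mathbf{f}(\mathbf{x} + \mathbf{h}) &= \mathbf{g}(\mathbf{f}(\mathbf{x} + \mathbf{h})) \\
&= \mathbf{g}(\mathbf{f}(\mathbf{x}) + \alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)
\end{align*}

so, taking $\mathbf{k} = \alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|$, we have

\begin{align*}
\mathbf{g} \circ \mathbf{f}(\mathbf{x} + \mathbf{h}) &= \mathbf{g}(\mathbf{f}(\mathbf{x})) + \beta(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|) + \boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\,\|\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|\| \\
&= \mathbf{g} \circ \mathbf{f}(\mathbf{x}) + \beta\alpha\mathbf{h} + \boldsymbol{\eta}(\mathbf{h})\|\mathbf{h}\|
\end{align*}

with

\[ \boldsymbol{\eta}(\mathbf{h}) = \boldsymbol{\eta}_1(\mathbf{h}) + \boldsymbol{\eta}_2(\mathbf{h}) \]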
132 A COMPANION TO ANALYSIS

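where

\[ \boldsymbol{\eta}_1(\mathbf{h})\|\mathbf{h}\| = \beta\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| \]

and

\[ \boldsymbol{\eta}_2(\mathbf{h})\|\mathbf{h}\| = \boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\,\|\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|\|. \]

All we have to do is to show that $\|\boldsymbol{\eta}_1(\mathbf{h})\|$ and $\|\boldsymbol{\eta}_2(\mathbf{h})\|$, and so $\|\boldsymbol{\eta}(\mathbf{h})\| =
\|\boldsymbol{\eta}_1(\mathbf{h}) + \boldsymbol{\eta}_2(\mathbf{h})\|$, tend to zero as $\|\mathbf{h}\| \to 0$. We observe first that

\[ \|\boldsymbol{\eta}_1(\mathbf{h})\|\mathbf{h}\|\| \le \|\beta\|\,\|\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|\| = \|\beta\|\,\|\boldsymbol{\epsilon}_1(\mathbf{h})\|\,\|\mathbf{h}\| \]

so $\|\boldsymbol{\eta}_1(\mathbf{h})\| \le \|\beta\|\,\|\boldsymbol{\epsilon}_1(\mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$. Next we observe that

\begin{align*}
\|\boldsymbol{\eta}_2(\mathbf{h})\|\mathbf{h}\|\| &= \|\boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\|\,\|\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|\| \\
&\le \|\boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\|\,(\|\alpha\mathbf{h}\| + \|\boldsymbol{\epsilon}_1(\mathbf{h})\|\,\|\mathbf{h}\|) \\
&\le \|\boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\|\,(\|\alpha\| + \|\boldsymbol{\epsilon}_1(\mathbf{h})\|)\,\|\mathbf{h}\|,
\end{align*}

so that

\[ \|\boldsymbol{\eta}_2(\mathbf{h})\| \le \|\boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\|\,(\|\alpha\| + \|\boldsymbol{\epsilon}_1(\mathbf{h})\|) \to 0 \]

as $\|\mathbf{h}\| \to 0$ (since then $\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| \to \mathbf{0}$ as well) and we are done.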
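As a numerical sanity check of Lemma 6.2.10 (not a substitute for the proof), the following sketch compares a finite-difference Jacobian of $\mathbf{g} \circ \mathbf{f}$ with the product of the finite-difference Jacobians of $\mathbf{g}$ and $\mathbf{f}$. The particular maps `f` and `g`, the base point and the helper names `jacobian` and `matmul` are illustrative assumptions, not taken from the text.

```python
import math

def f(x):
    # Hypothetical map f : R^2 -> R^2, chosen only for illustration.
    return [x[0] * x[1], math.sin(x[0]) + x[1] ** 2]

def g(y):
    # Hypothetical map g : R^2 -> R^2, chosen only for illustration.
    return [math.exp(y[0]) + y[1], y[0] * y[1]]

def compose(x):
    return g(f(x))

def jacobian(func, x, step=1e-6):
    # Central differences: entry [i][j] approximates d func_i / d x_j.
    rows = len(func(x))
    J = [[0.0] * len(x) for _ in range(rows)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += step
        xm[j] -= step
        fp, fm = func(xp), func(xm)
        for i in range(rows):
            J[i][j] = (fp[i] - fm[i]) / (2 * step)
    return J

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x = [0.3, -0.7]
lhs = jacobian(compose, x)                        # approximates D(g o f)(x)
rhs = matmul(jacobian(g, f(x)), jacobian(f, x))   # approximates Dg(f(x)) Df(x)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The two matrices agree only up to finite-difference error, so `max_gap` should be small rather than exactly zero.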
Students sometimes say that the proof of the chain rule is difficult but
they really mean that it is tedious. It is simply a matter of showing that
the error terms $\boldsymbol{\eta}_1(\mathbf{h})\|\mathbf{h}\|$ and $\boldsymbol{\eta}_2(\mathbf{h})\|\mathbf{h}\|$, which ought to be small, actually
are. Students also forget the artificiality of the standard proofs of the one
dimensional chain rule (see the discussion of Lemma 5.6.2; any argument
which Hardy got wrong cannot be natural). The multidimensional argument
forces us to address the real nature of the chain rule.
The next result is very simple but I would like to give two different proofs.
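Lemma 6.2.11. Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$. Suppose that $\mathbf{f}, \mathbf{g} :
U \to \mathbb{R}^m$ are differentiable at $\mathbf{x}$. Then $\mathbf{f} + \mathbf{g}$ is differentiable at $\mathbf{x}$ with
$D(\mathbf{f} + \mathbf{g})(\mathbf{x}) = D\mathbf{f}(\mathbf{x}) + D\mathbf{g}(\mathbf{x})$.
Direct proof. By definition

\[ \mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + D\mathbf{f}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| \]

and

\[ \mathbf{g}(\mathbf{x} + \mathbf{h}) = \mathbf{g}(\mathbf{x}) + D\mathbf{g}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_2(\mathbf{h})\|\mathbf{h}\| \]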
Please send corrections however trivial to twk@dpmms.cam.ac.uk

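where $\|\boldsymbol{\epsilon}_1(\mathbf{h})\| \to 0$ and $\|\boldsymbol{\epsilon}_2(\mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$. Thus

\begin{align*}
(\mathbf{f} + \mathbf{g})(\mathbf{x} + \mathbf{h}) &= \mathbf{f}(\mathbf{x} + \mathbf{h}) + \mathbf{g}(\mathbf{x} + \mathbf{h}) \\
&= \mathbf{f}(\mathbf{x}) + D\mathbf{f}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| + \mathbf{g}(\mathbf{x}) + D\mathbf{g}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_2(\mathbf{h})\|\mathbf{h}\| \\
&= (\mathbf{f} + \mathbf{g})(\mathbf{x}) + (D\mathbf{f}(\mathbf{x}) + D\mathbf{g}(\mathbf{x}))\mathbf{h} + \boldsymbol{\epsilon}_3(\mathbf{h})\|\mathbf{h}\|
\end{align*}
with

\[ \boldsymbol{\epsilon}_3(\mathbf{h}) = \boldsymbol{\epsilon}_1(\mathbf{h}) + \boldsymbol{\epsilon}_2(\mathbf{h}). \]

Since
\[ \|\boldsymbol{\epsilon}_3(\mathbf{h})\| \le \|\boldsymbol{\epsilon}_1(\mathbf{h})\| + \|\boldsymbol{\epsilon}_2(\mathbf{h})\| \to 0 + 0 = 0, \]

as $\|\mathbf{h}\| \to 0$, we are done.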
Our second proof depends on a series of observations.
Lemma 6.2.12. A linear map $\alpha : \mathbb{R}^n \to \mathbb{R}^m$ is everywhere differentiable
with derivative $\alpha$.
Proof. Observe that
\[ \alpha(\mathbf{x} + \mathbf{h}) = \alpha\mathbf{x} + \alpha\mathbf{h} + \boldsymbol{\epsilon}(\mathbf{h})\|\mathbf{h}\|, \]
where $\boldsymbol{\epsilon}(\mathbf{h}) = \mathbf{0}$, and apply the definition.
As the reader can see, the result and proof are trivial, but they take some
getting used to. In one dimension the result says that the map given by
$x \mapsto ax$ has derivative $x \mapsto ax$ (or that the tangent to the line $y = ax$ is the
line $y = ax$ itself, or that the derivative of the linear map with $1 \times 1$ matrix
$(a)$ is the linear map with matrix $(a)$).
Exercise 6.2.13. Show that the constant map $\mathbf{f}_{\mathbf{c}} : \mathbb{R}^n \to \mathbb{R}^m$, given by
$\mathbf{f}_{\mathbf{c}}(\mathbf{x}) = \mathbf{c}$ for all $\mathbf{x}$, is everywhere differentiable with derivative the zero linear
map.
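Lemma 6.2.14. Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$ and $V$ a neighbourhood
of $\mathbf{y}$ in $\mathbb{R}^m$. Suppose that $\mathbf{f} : U \to \mathbb{R}^p$ is differentiable at $\mathbf{x}$ and $\mathbf{g} : V \to \mathbb{R}^q$
is differentiable at $\mathbf{y}$. Then $U \times V$ is a neighbourhood of $(\mathbf{x}, \mathbf{y})$ in $\mathbb{R}^{n+m}$ and
the function $(\mathbf{f}, \mathbf{g}) : U \times V \to \mathbb{R}^{p+q}$ given by
\[ (\mathbf{f}, \mathbf{g})(\mathbf{u}, \mathbf{v}) = (\mathbf{f}(\mathbf{u}), \mathbf{g}(\mathbf{v})) \]
is differentiable at $(\mathbf{x}, \mathbf{y})$ with derivative $(D\mathbf{f}(\mathbf{x}), D\mathbf{g}(\mathbf{y}))$ where we write
\[ (D\mathbf{f}(\mathbf{x}), D\mathbf{g}(\mathbf{y}))(\mathbf{h}, \mathbf{k}) = (D\mathbf{f}(\mathbf{x})\mathbf{h}, D\mathbf{g}(\mathbf{y})\mathbf{k}). \]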

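Proof. We leave some details (such as verifying that $U \times V$ is a neighbourhood
of $(\mathbf{x}, \mathbf{y})$) to the reader. The key to the proof is the remark that $\|(\mathbf{h}, \mathbf{k})\| \ge
\|\mathbf{h}\|, \|\mathbf{k}\|$. Observe that, if we write

\[ \mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + D\mathbf{f}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| \]

and

\[ \mathbf{g}(\mathbf{y} + \mathbf{k}) = \mathbf{g}(\mathbf{y}) + D\mathbf{g}(\mathbf{y})\mathbf{k} + \boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\|, \]

we have

\[ (\mathbf{f}, \mathbf{g})((\mathbf{x}, \mathbf{y}) + (\mathbf{h}, \mathbf{k})) = (\mathbf{f}, \mathbf{g})(\mathbf{x}, \mathbf{y}) + (D\mathbf{f}(\mathbf{x}), D\mathbf{g}(\mathbf{y}))(\mathbf{h}, \mathbf{k}) + \boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\|(\mathbf{h}, \mathbf{k})\| \]

where, identifying $\mathbb{R}^p$ and $\mathbb{R}^q$ with the obvious subspaces of $\mathbb{R}^{p+q}$,

\[ \boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\|(\mathbf{h}, \mathbf{k})\| = (\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|, \mathbf{0}) + (\mathbf{0}, \boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\|). \]

Using the last equation, we obtain

\begin{align*}
\|\boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\|\,\|(\mathbf{h}, \mathbf{k})\| &= \|\boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\|(\mathbf{h}, \mathbf{k})\|\| = \|(\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|, \mathbf{0}) + (\mathbf{0}, \boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\|)\| \\
&\le \|\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|\| + \|\boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\|\| \le \|\boldsymbol{\epsilon}_1(\mathbf{h})\|\,\|(\mathbf{h}, \mathbf{k})\| + \|\boldsymbol{\epsilon}_2(\mathbf{k})\|\,\|(\mathbf{h}, \mathbf{k})\|.
\end{align*}

Thus

\[ \|\boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\| \le \|\boldsymbol{\epsilon}_1(\mathbf{h})\| + \|\boldsymbol{\epsilon}_2(\mathbf{k})\| \to 0 + 0 = 0 \]

as $\|(\mathbf{h}, \mathbf{k})\| \to 0$.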

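Exercise 6.2.15. If $\mathbf{h} \in \mathbb{R}^n$ and $\mathbf{k} \in \mathbb{R}^m$, show that

\[ \|(\mathbf{h}, \mathbf{k})\|^2 = \|\mathbf{h}\|^2 + \|\mathbf{k}\|^2 \]

and

\[ \|\mathbf{h}\| + \|\mathbf{k}\| \ge \|(\mathbf{h}, \mathbf{k})\| \ge \|\mathbf{h}\|, \|\mathbf{k}\|. \]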

Exercise 6.2.16. Consider the situation described in Lemma 6.2.14. Write
down the Jacobian matrix of partial derivatives for (f , g) in terms of the
Jacobian matrices for f and g.

We can now give a second proof of Lemma 6.2.11 using the chain rule.

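Second proof of Lemma 6.2.11. Let $\alpha : \mathbb{R}^n \to \mathbb{R}^{2n}$ be the map given by

\[ \alpha(\mathbf{x}) = (\mathbf{x}, \mathbf{x}) \]

and $\beta : \mathbb{R}^{2m} \to \mathbb{R}^m$ be the map given by

\[ \beta(\mathbf{x}, \mathbf{y}) = \mathbf{x} + \mathbf{y}. \]

Then, using the notation of Lemma 6.2.14,

\[ \mathbf{f} + \mathbf{g} = \beta \circ (\mathbf{f}, \mathbf{g}) \circ \alpha. \]

But $\alpha$ and $\beta$ are linear, so using the chain rule (Lemma 6.2.10), we see that
$\mathbf{f} + \mathbf{g}$ is differentiable at $\mathbf{x}$ and

\[ D(\mathbf{f} + \mathbf{g})(\mathbf{x}) = \beta \circ D(\mathbf{f}, \mathbf{g})(\mathbf{x}, \mathbf{x}) \circ \alpha = D\mathbf{f}(\mathbf{x}) + D\mathbf{g}(\mathbf{x}). \]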
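In matrix terms the last display is a product of three matrices, and a small sketch can make the bookkeeping concrete. Here `Df` and `Dg` are arbitrary made-up $2 \times 3$ integer matrices standing in for the Jacobians of $\mathbf{f}$ and $\mathbf{g}$ at $\mathbf{x}$, the block diagonal matrix `D_fg` stands in for $D(\mathbf{f}, \mathbf{g})(\mathbf{x}, \mathbf{x})$ (the form that Exercise 6.2.16 asks the reader to justify), and all helper names are assumptions made for this illustration.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def zeros(rows, cols):
    return [[0] * cols for _ in range(rows)]

def hstack(A, B):
    # Place B to the right of A (same number of rows).
    return [ra + rb for ra, rb in zip(A, B)]

def vstack(A, B):
    # Place B below A (same number of columns).
    return A + B

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

n, m = 3, 2
Df = [[1, 2, 3], [4, 5, 6]]      # stand-in for the Jacobian of f at x (m x n)
Dg = [[7, 0, -1], [2, -3, 5]]    # stand-in for the Jacobian of g at x (m x n)

alpha = vstack(identity(n), identity(n))                          # x -> (x, x)
D_fg = vstack(hstack(Df, zeros(m, n)), hstack(zeros(m, n), Dg))   # block diagonal
beta = hstack(identity(m), identity(m))                           # (x, y) -> x + y

product = matmul(matmul(beta, D_fg), alpha)
expected = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(Df, Dg)]
print(product == expected)
```

With integer entries the comparison is exact, so the product of the three matrices coincides entry by entry with `Df + Dg`.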



If we only used this idea to prove Lemma 6.2.11 it would hardly be worth
it but it is frequently easiest to show that a complicated function is differentiable
by expressing it as the composition of simpler differentiable functions.
(How else would one prove that $x \mapsto \sin(\exp(1 + x^2))$ is differentiable?)
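For the function just mentioned, the chain rule applied to the three simpler maps $x \mapsto 1 + x^2$, $u \mapsto \exp u$ and $v \mapsto \sin v$ gives the derivative $\cos(\exp(1 + x^2)) \exp(1 + x^2) \cdot 2x$. The sketch below compares that formula with a central difference quotient; the sample point, the step size and the function names are arbitrary choices made for illustration.

```python
import math

def h(x):
    # The composite x -> sin(exp(1 + x^2)).
    return math.sin(math.exp(1 + x * x))

def chain_derivative(x):
    # Derivative assembled from the three simpler maps via the chain rule.
    u = 1 + x * x          # inner map, derivative 2x
    v = math.exp(u)        # middle map, derivative exp(u)
    return math.cos(v) * v * 2 * x   # outer map sin has derivative cos

def central_difference(func, x, step=1e-6):
    return (func(x + step) - func(x - step)) / (2 * step)

x0 = 0.4
gap = abs(chain_derivative(x0) - central_difference(h, x0))
print(gap)
```

The agreement is only approximate, since the difference quotient carries its own truncation and rounding error.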

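Exercise 6.2.17. (i) Show that the function $J : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ given by the
scalar product

\[ J(\mathbf{u}, \mathbf{v}) = \mathbf{u} \cdot \mathbf{v} \]

is everywhere differentiable with

\[ DJ(\mathbf{x}, \mathbf{y})(\mathbf{h}, \mathbf{k}) = \mathbf{x} \cdot \mathbf{k} + \mathbf{y} \cdot \mathbf{h}. \]

(ii) Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$. Suppose that $\mathbf{f}, \mathbf{g} : U \to \mathbb{R}^m$
are differentiable at $\mathbf{x}$. Show, using the chain rule, that $\mathbf{f} \cdot \mathbf{g}$ is differentiable
at $\mathbf{x}$ with

\[ D(\mathbf{f} \cdot \mathbf{g})(\mathbf{x})\mathbf{h} = \mathbf{f}(\mathbf{x}) \cdot (D(\mathbf{g})(\mathbf{x})\mathbf{h}) + (D(\mathbf{f})(\mathbf{x})\mathbf{h}) \cdot \mathbf{g}(\mathbf{x}). \]

(iii) Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$. Suppose that $\mathbf{f} : U \to \mathbb{R}^m$
and $\lambda : U \to \mathbb{R}$ are differentiable at $\mathbf{x}$. State and prove an appropriate result
about the function $\lambda\mathbf{f}$ given by

\[ (\lambda\mathbf{f})(\mathbf{u}) = \lambda(\mathbf{u})\mathbf{f}(\mathbf{u}). \]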

(iv) If you have met the vector product$^1$ $\mathbf{u} \wedge \mathbf{v}$ of two vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^3$,
state and prove an appropriate theorem about the vector product of differentiable
functions.
(v) Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$. Suppose that $f : U \to \mathbb{R}$ is
non-zero on $U$ and differentiable at $\mathbf{x}$. Show that $1/f$ is differentiable at $\mathbf{x}$
and find $D(1/f)(\mathbf{x})$.

$^1$ Question: What do you get if you cross a mountaineer with a mosquito? Answer: You
can't. One is a scaler and the other is a vector.


6.3 The mean value inequality in higher dimensions
So far our study of differentiation in higher dimensions has remained on
the level of mere algebra. (The definition of the operator norm used the
supremum and so lay deeper but we could have avoided this at the cost of
using a less natural norm.) The next result is a true theorem of analysis.

Theorem 6.3.1. (The mean value inequality.) Suppose that $U$ is an
open set in $\mathbb{R}^m$ and that $\mathbf{f} : U \to \mathbb{R}^p$ is differentiable. Consider the straight
line segment

\[ L = \{(1 - t)\mathbf{a} + t\mathbf{b} : 0 \le t \le 1\} \]

joining $\mathbf{a}$ and $\mathbf{b}$. If $L \subseteq U$ (i.e. $L$ lies entirely within $U$) and $\|D\mathbf{f}(\mathbf{x})\| \le K$
for all $\mathbf{x} \in L$, then

\[ \|\mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{b})\| \le K\|\mathbf{a} - \mathbf{b}\|. \]

Proof. Before starting the proof, it is helpful to note that, since $U$ is open,
we can find an $\eta > 0$ such that the extended straight line segment

\[ \{(1 - t)\mathbf{a} + t\mathbf{b} : -\eta \le t \le 1 + \eta\} \subseteq U. \]

We shall prove our many dimensional mean value inequality from the
one dimensional version (Theorem 1.7.1, or if the reader prefers, the slightly
sharper Theorem 4.4.1). To this end, observe that, if $\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a}) = \mathbf{0}$, there
is nothing to prove. We may thus assume that $\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a}) \ne \mathbf{0}$ and consider

\[ \mathbf{u} = \frac{\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})}{\|\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})\|}, \]

the unit vector in the direction $\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})$. If we now define $g : (-\eta, 1 + \eta) \to
\mathbb{R}$ by

\[ g(t) = \mathbf{u} \cdot \bigl(\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b}) - \mathbf{f}(\mathbf{a})\bigr), \]

we see, by using the chain rule or direct calculation, that $g$ is continuous and
differentiable on $(-\eta, 1 + \eta)$ with

\[ g'(t) = \mathbf{u} \cdot \bigl(D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})(\mathbf{b} - \mathbf{a})\bigr). \]

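Using the Cauchy-Schwarz inequality (Lemma 4.1.2) and the definition of
the operator norm (Definition 6.2.4), we have

\begin{align*}
|g'(t)| &\le \|\mathbf{u}\|\,\|D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})(\mathbf{b} - \mathbf{a})\| \\
&= \|D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})(\mathbf{b} - \mathbf{a})\| \\
&\le \|D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})\|\,\|\mathbf{b} - \mathbf{a}\| \\
&\le K\|\mathbf{a} - \mathbf{b}\|
\end{align*}

for all $t \in (0, 1)$. Thus, by the one dimensional mean value inequality,

\[ \|\mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{b})\| = |g(1) - g(0)| \le K\|\mathbf{a} - \mathbf{b}\| \]

as required.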
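The inequality, and the scalar function $g$ used in the proof, can be tried out numerically. In the sketch below the map `f` is a made-up example on $\mathbb{R}^2$ whose derivative is the diagonal matrix with entries $\cos x$ and $-\sin y$, so that $\|D\mathbf{f}\| \le 1$ everywhere and we may take $K = 1$; the sampling range, the seed and the helper names are likewise assumptions of this illustration, and sampling of course proves nothing.

```python
import math
import random

def f(p):
    # Hypothetical map R^2 -> R^2 with Df(x, y) = diag(cos x, -sin y),
    # whose operator norm is max(|cos x|, |sin y|) <= 1.
    x, y = p
    return (math.sin(x), math.cos(y))

K = 1.0

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def g(t, a, b):
    # The scalar function from the proof: g(t) = u . (f((1 - t)a + tb) - f(a)).
    diff = sub(f(b), f(a))
    u = tuple(c / norm(diff) for c in diff)
    point = tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))
    return sum(ui * wi for ui, wi in zip(u, sub(f(point), f(a))))

rng = random.Random(0)
worst_ratio = 0.0   # largest observed ||f(a) - f(b)|| / ||a - b||
worst_g_gap = 0.0   # largest observed | (g(1) - g(0)) - ||f(a) - f(b)|| |
for _ in range(1000):
    a = (rng.uniform(-3, 3), rng.uniform(-3, 3))
    b = (rng.uniform(-3, 3), rng.uniform(-3, 3))
    change = norm(sub(f(a), f(b)))
    worst_ratio = max(worst_ratio, change / norm(sub(a, b)))
    worst_g_gap = max(worst_g_gap, abs((g(1.0, a, b) - g(0.0, a, b)) - change))

print(worst_ratio, worst_g_gap)
```

The first printed number should not exceed $K$, and the second should be at rounding-error level, reflecting the identity $\|\mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{b})\| = |g(1) - g(0)|$.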
Exercise 6.3.2. (i) Prove the statement of the first sentence in the proof
just given.
(ii) If $g$ is the function defined in the proof just given, show, giving all
the details, that $g$ is continuous and differentiable on $(-\eta, 1 + \eta)$ with

\[ g'(t) = \mathbf{u} \cdot \bigl(D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})(\mathbf{b} - \mathbf{a})\bigr). \]

You should give two versions of the proof, the first using the chain rule
(Lemma 6.2.10) and the second using direct calculation.
If we have already gone to the trouble of proving the one-dimensional
mean value inequality it seems sensible to make use of it in proving the multidimensional
version. However, we could have proved the multidimensional
theorem directly without making a one-dimensional detour.
Exercise 6.3.3. (i) Reread the proof of Theorem 1.7.1.
(ii) We now start the direct proof of Theorem 6.3.1. As before observe
that we can find an $\eta > 0$ such that

\[ \{(1 - t)\mathbf{a} + t\mathbf{b} : -\eta \le t \le 1 + \eta\} \subseteq U, \]

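but now consider $\mathbf{F} : (-\eta, 1 + \eta) \to \mathbb{R}^p$ given by

\[ \mathbf{F}(t) = \mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b}) - \mathbf{f}(\mathbf{a}). \]

Explain why the theorem will follow if we can show that, given any $\epsilon > 0$, we
have

\[ \|\mathbf{F}(1) - \mathbf{F}(0)\| \le K\|\mathbf{a} - \mathbf{b}\| + \epsilon. \]

(iii) Suppose, if possible, that there exists an $\epsilon > 0$ such that

\[ \|\mathbf{F}(1) - \mathbf{F}(0)\| \ge K\|\mathbf{a} - \mathbf{b}\| + \epsilon. \]

Show by a lion hunting argument that there exist a $c \in [0, 1]$ and $u_n, v_n \in
[0, 1]$ with $u_n < v_n$ such that $u_n, v_n \to c$ and

\[ \|\mathbf{F}(v_n) - \mathbf{F}(u_n)\| \ge (K\|\mathbf{a} - \mathbf{b}\| + \epsilon)(v_n - u_n). \]

(iv) Show from the definition of differentiability that there exists a $\delta > 0$
such that

\[ \|\mathbf{F}(t) - \mathbf{F}(c)\| < (K\|\mathbf{a} - \mathbf{b}\| + \epsilon/2)|t - c| \]

whenever $|t - c| < \delta$ and $t \in [0, 1]$.
(v) Prove Theorem 6.3.1 by reductio ad absurdum.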

One of the principal uses we made of the one dimensional mean value
theorem was to show that a function on an open interval with zero derivative
was necessarily constant. The reader should do both parts of the following
easy exercise and reflect on them.

Exercise 6.3.4. (i) Let $U$ be an open set in $\mathbb{R}^m$ such that given any $\mathbf{a}, \mathbf{b} \in U$
we can find a finite sequence of points $\mathbf{a} = \mathbf{a}_0, \mathbf{a}_1, \ldots, \mathbf{a}_{k-1}, \mathbf{a}_k = \mathbf{b}$ such
that each line segment

\[ \{(1 - t)\mathbf{a}_{j-1} + t\mathbf{a}_j : 0 \le t \le 1\} \subseteq U \]

[$1 \le j \le k$]. Show that, if $\mathbf{f} : U \to \mathbb{R}^p$ is everywhere differentiable on $U$ with
$D\mathbf{f}(\mathbf{x}) = 0$, it follows that $\mathbf{f}$ is constant.
(ii) We work in $\mathbb{R}^2$. Let $U_1$ be the open disc of radius 1 centre $(-2, 0)$
and $U_2$ be the open disc of radius 1 centre $(2, 0)$. Set $U = U_1 \cup U_2$. Define
$f : U \to \mathbb{R}$ by $f(\mathbf{x}) = -1$ for $\mathbf{x} \in U_1$, $f(\mathbf{x}) = 1$ for $\mathbf{x} \in U_2$. Show that $f$ is
everywhere differentiable on $U$ with $D(f)(\mathbf{x}) = 0$ but $f$ is not constant.

The reader may ask if we can obtain an improvement to our mean value
inequality by some sort of equality along the lines of Theorem 4.4.1. The
answer is a clear no.

Exercise 6.3.5. Let $\mathbf{f} : \mathbb{R} \to \mathbb{R}^2$ be given by $\mathbf{f}(t) = (\cos t, \sin t)^T$. Compute
the Jacobian matrix of partial derivatives for $\mathbf{f}$ and show that $\mathbf{f}(0) = \mathbf{f}(2\pi)$
but $D\mathbf{f}(t) \ne 0$ for all $t$.

(Although Exercise K.102 is not a counter example it points out another
problem which occurs when we work in many dimensions.)
It is fairly obvious that we cannot replace the line segment $L$ in Theorem 6.3.1
by other curves without changing the conclusion.

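Exercise 6.3.6. Let

\[ U = \{\mathbf{x} \in \mathbb{R}^2 : \|\mathbf{x}\| > 1\} \setminus \{(x, 0)^T : x \le 0\}. \]

If we take $\theta(\mathbf{x})$ to be the unique solution of

\[ \cos(\theta(\mathbf{x})) = \frac{x}{(x^2 + y^2)^{1/2}}, \quad \sin(\theta(\mathbf{x})) = \frac{y}{(x^2 + y^2)^{1/2}}, \quad -\pi < \theta(\mathbf{x}) < \pi \]

for $\mathbf{x} = (x, y)^T \in U$, show that $\theta : U \to \mathbb{R}$ is everywhere differentiable
with $\|D\theta(\mathbf{x})\| < 1$. (The amount of work involved in proving this depends
quite strongly on how clever you are in exploiting radial symmetry.) Show,
however, that if $\mathbf{a} = (-1, 10^{-1})^T$, $\mathbf{b} = (-1, -10^{-1})^T$, then

\[ |\theta(\mathbf{a}) - \theta(\mathbf{b})| > \|\mathbf{a} - \mathbf{b}\|. \]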
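A quick numerical look at the final inequality (which does not replace the requested proofs) is sketched below. It assumes that, away from the cut along the negative $x$-axis, $\theta$ agrees with the standard library's `math.atan2(y, x)`, and it estimates $\|D\theta\|$ by central differences; the function names and the step size are illustrative choices. Note that the segment joining $\mathbf{a}$ and $\mathbf{b}$ passes through $(-1, 0)^T$, which is not in $U$, so Theorem 6.3.1 does not apply to it.

```python
import math

def theta(p):
    # Angle in (-pi, pi) with cos(theta) = x/r and sin(theta) = y/r,
    # valid for points off the negative x-axis.
    x, y = p
    return math.atan2(y, x)

def gradient_norm(p, step=1e-6):
    # Central-difference estimate of the Euclidean norm of the gradient of theta.
    x, y = p
    dx = (theta((x + step, y)) - theta((x - step, y))) / (2 * step)
    dy = (theta((x, y + step)) - theta((x, y - step))) / (2 * step)
    return math.hypot(dx, dy)

a = (-1.0, 0.1)
b = (-1.0, -0.1)
angle_gap = abs(theta(a) - theta(b))
distance = math.hypot(a[0] - b[0], a[1] - b[1])
print(angle_gap, distance, gradient_norm(a), gradient_norm(b))
```

The angle changes by nearly $2\pi$ while the two points are only $0.2$ apart, even though the estimated gradient norm at each point is below 1.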

It is clear (though we shall not prove it, and, indeed, cannot yet state it
without using concepts which we have not formally defined) that the correct
generalisation when $L$ is not a straight line will run as follows. 'If $L$ is a well
behaved path lying entirely within $U$ and $\|D\mathbf{f}(\mathbf{x})\| \le K$ for all $\mathbf{x} \in L$ then
$\|\mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{b})\| \le K \times \text{length } L$'.
Chapter 7

Local Taylor theorems

7.1 Some one dimensional Taylor theorems
By definition, a function $f : \mathbb{R} \to \mathbb{R}$ which is continuous at 0 looks like a
constant function near 0, in the sense that

\[ f(t) = f(0) + \epsilon(t) \]

where $\epsilon(t) \to 0$ as $t \to 0$. By definition, again, a function $f : \mathbb{R} \to \mathbb{R}$ which
is differentiable at 0 looks like a linear function near 0, in the sense that

\[ f(t) = f(0) + f'(0)t + \epsilon(t)|t| \]

where $\epsilon(t) \to 0$ as $t \to 0$. The next exercise establishes the non-trivial
theorem that a function $f : \mathbb{R} \to \mathbb{R}$, which is $n$ times differentiable in a
neighbourhood of 0 and has $f^{(n)}$ continuous at 0, looks like a polynomial of
degree $n$ near 0, in the sense that

f (n) (0) n
