
It is possible to produce plausible arguments for the symmetry of second
partial derivatives. Here are a couple.
(1) If f is a multinomial, i.e.
$$f(x, y) = \sum_{p=0}^{P} \sum_{q=0}^{Q} a_{p,q} x^p y^q,$$
then f,12 = f,21 . But smooth functions are very close to being polynomial, so we would
expect the result to be true in general.
(2) Although we cannot interchange limits in general, it is plausible that,
if f is well behaved, then
\begin{align*}
f_{,12}(x, y) &= \lim_{h \to 0} \lim_{k \to 0} h^{-1} k^{-1}\bigl(f(x+h, y+k) - f(x+h, y) - f(x, y+k) + f(x, y)\bigr)\\
 &= \lim_{k \to 0} \lim_{h \to 0} h^{-1} k^{-1}\bigl(f(x+h, y+k) - f(x+h, y) - f(x, y+k) + f(x, y)\bigr)\\
 &= f_{,21}(x, y).
\end{align*}
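As a quick numerical sanity check of the double difference quotient above, here is a minimal Python sketch (the particular smooth function is an illustrative choice, not one taken from the text); shrinking h and k in either order settles on the same value.

```python
import math

def f(x, y):
    # an illustrative smooth function (an assumption made for this sketch)
    return math.exp(x * y) + math.sin(x + 2.0 * y)

def q(h, k, x=0.3, y=-0.7):
    # the quotient h^{-1} k^{-1} (f(x+h,y+k) - f(x+h,y) - f(x,y+k) + f(x,y))
    return (f(x + h, y + k) - f(x + h, y) - f(x, y + k) + f(x, y)) / (h * k)

# imitate "let k -> 0 first, then h -> 0" by taking k much smaller than h,
# and the opposite order by taking h much smaller than k
for n in range(2, 6):
    outer, inner = 10.0 ** (-n), 10.0 ** (-n) * 1e-3
    print(q(outer, inner), q(inner, outer))
# both columns approach the same number, the common value of f,12 and f,21 at (0.3, -0.7)
```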

However, these are merely plausible arguments. They do not make clear the
rôle of the continuity of the second derivative (in Example 7.3.18 we shall see
that the result may fail for discontinuous second partial derivatives). More
fundamentally, they are algebraic arguments and, as the use of the mean value
theorem indicates, the result is one of analysis. The same kind of argument
which shows that the local Taylor theorem fails over Q (see Example 7.1.8)
shows that it fails over Q2 and that the symmetry of partial derivatives fails
with it (see [33]).
If we use the D notation, Theorem 7.2.6 states that (under appropriate
conditions)

D1 D2 f = D2 D1 f.

If we write Dij = Di Dj , as is often done, we get

D12 f = D21 f.

What happens if a function has higher partial derivatives? It is not hard
to guess and prove the appropriate theorem.

Exercise 7.2.7. Suppose δ > 0, x ∈ Rm, B(x, δ) ⊆ E ⊆ Rm and that
f : E → R. Show that, if all the partial derivatives f,j , f,jk , f,jkl , . . . up to
the nth order exist in B(x, δ) and are continuous at x, then, writing
\begin{align*}
f(x + h) = f(x) &+ \sum_{j=1}^{m} f_{,j}(x) h_j + \frac{1}{2!} \sum_{j=1}^{m} \sum_{k=1}^{m} f_{,jk}(x) h_j h_k + \frac{1}{3!} \sum_{j=1}^{m} \sum_{k=1}^{m} \sum_{l=1}^{m} f_{,jkl}(x) h_j h_k h_l \\
&+ \cdots + \text{sum up to $n$th powers} + \epsilon(h) \|h\|^n,
\end{align*}
we have ε(h) → 0 as h → 0.
Notice that you do not have to prove results like

f,jkl (x) = f,ljk (x) = f,klj (x) = f,lkj (x) = f,jlk (x) = f,kjl (x)

since they follow directly from Theorem 7.2.6.
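To see the expansion of Exercise 7.2.7 in action, here is a hedged Python sketch for m = 2 and n = 2 with an illustrative function of my own choosing; the quantity ε(h) = (f(x + h) − second order expansion)/‖h‖² visibly tends to 0 as h → 0.

```python
import math

def f(x1, x2):
    # an illustrative smooth function (an assumption for this sketch), expanded about x = (0, 0)
    return math.exp(x1) * math.cos(x2)

f0 = 1.0                                # f(0, 0)
grad = [1.0, 0.0]                       # f,1 (0,0) and f,2 (0,0)
hess = [[1.0, 0.0], [0.0, -1.0]]        # f,jk (0,0)

def taylor2(h):
    # f(0) + sum_j f,j h_j + (1/2!) sum_{j,k} f,jk h_j h_k
    s = f0 + sum(grad[j] * h[j] for j in range(2))
    s += 0.5 * sum(hess[j][k] * h[j] * h[k] for j in range(2) for k in range(2))
    return s

for n in range(1, 6):
    t = 10.0 ** (-n)
    h = (t, 2.0 * t)                    # shrink h along a fixed direction
    eps = (f(*h) - taylor2(h)) / (h[0] ** 2 + h[1] ** 2)
    print(n, eps)                       # epsilon(h) -> 0 as h -> 0
```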
Applying Exercise 7.2.7 to the components fi of a function f , we obtain
our full many dimensional Taylor theorem.
Theorem 7.2.8 (The local Taylor's theorem). Suppose δ > 0, x ∈ Rm ,
B(x, δ) ⊆ E ⊆ Rm and that f : E → Rp . If all the partial derivatives fi,j ,
fi,jk , fi,jkl , . . . exist in B(x, δ) and are continuous at x, then, writing
\begin{align*}
f_i(x + h) = f_i(x) &+ \sum_{j=1}^{m} f_{i,j}(x) h_j + \frac{1}{2!} \sum_{j=1}^{m} \sum_{k=1}^{m} f_{i,jk}(x) h_j h_k \\
&+ \frac{1}{3!} \sum_{j=1}^{m} \sum_{k=1}^{m} \sum_{l=1}^{m} f_{i,jkl}(x) h_j h_k h_l \\
&+ \cdots + \text{sum up to $n$th powers} + \epsilon_i(h) \|h\|^n,
\end{align*}
we have ε_i(h) → 0 as h → 0.
The reader will remark that Theorem 7.2.8 bristles with subscripts, con-
trary to our announced intention of seeking a geometric, coordinate free view.
However, it is very easy to restate the main formula of Theorem 7.2.8 in a
coordinate free way as
$$f(x + h) = f(x) + \alpha_1(h) + \alpha_2(h, h) + \cdots + \alpha_n(h, h, \ldots, h) + \epsilon(h) \|h\|^n,$$
where αk : Rm × Rm × · · · × Rm → Rp is linear in each variable (i.e. a k-
linear function) and symmetric (i.e. interchanging any two variables leaves
the value of αk unchanged).
Anyone who feels that the higher derivatives are best studied using co-
ordinates should reflect that, if f : R3 → R3 is well behaved, then the
'third derivative behaviour' of f at a single point is apparently given by
the 3 × 3 × 3 × 3 = 81 numbers fi,jkl (x). By symmetry (see Theorem 7.2.6)
only 30 of the numbers are distinct but these 30 numbers are independent
(consider polynomials in three variables for which the total degree of each
term is 3). How can we understand the information carried by an array of
30 real numbers?
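The counting in the last paragraph can be restated computationally: for each of the 3 components fi the index triple jkl only matters as a multiset drawn from {1, 2, 3}. Here is a hedged Python sketch of that count (the combinatorial argument itself is what the next exercise asks for).

```python
from itertools import product
from math import comb

m = 3                                   # f : R^3 -> R^3
all_indices = {(i, jkl) for i in range(m) for jkl in product(range(m), repeat=3)}
print(len(all_indices))                 # 3 * 3^3 = 81 numbers f_{i,jkl}(x) before symmetry
# by Theorem 7.2.6 only the multiset {j, k, l} matters, so sort the last three indices
distinct = {(i, tuple(sorted(jkl))) for i, jkl in all_indices}
print(len(distinct))                    # 30 distinct numbers
print(m * comb(m + 3 - 1, 3))           # the same count: 3 * C(5, 3) = 30
```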

Exercise 7.2.9. (i) Verify the statements in the last paragraph. How large
an array is required to give the 'third derivative behaviour' of a well behaved
function f : R4 → R4 at a point? How large an array is required to give the
'fourth derivative behaviour' of a well behaved function f : R3 → R3 at a
point?
(ii) (Ignore this if the notation is not familiar.) Consider a well behaved
function f : R3 → R3 . How large an array is required to give curl f = ∇ × f
and div f = ∇ · f ? How large an array is required to give Df ?
In many circumstances curl f and div f give the physically interesting part
of Df but physicists also use
$$(a \cdot \nabla) f = \Bigl( \sum_{j=1}^{3} a_j f_{1,j},\ \sum_{j=1}^{3} a_j f_{2,j},\ \sum_{j=1}^{3} a_j f_{3,j} \Bigr).$$
How large an array is required to give (a · ∇)f for all a ∈ R3 ?
In subjects like elasticity the description of nature requires the full Jaco-
bian matrix (fi,j ) and the treatment of differentiation used is closer to that
of the pure mathematician.
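One concrete way to see the relation between the quantities in Exercise 7.2.9 (ii) is that (a · ∇)f is just the Jacobian matrix applied to a, so knowing it for every a amounts to knowing the full 3 × 3 array (fi,j ). A hedged numpy sketch (the map f below is an illustrative assumption):

```python
import numpy as np

def f(x):
    # an illustrative well behaved map R^3 -> R^3 (an assumption for this sketch)
    return np.array([x[0] * x[1], np.sin(x[2]), x[0] + x[1] * x[2]])

def jacobian(f, x, h=1e-6):
    # numerical Jacobian (f_{i,j}(x)) by central differences: a 3 x 3 array
    J = np.zeros((3, 3))
    for j in range(3):
        e = np.zeros(3)
        e[j] = h
        J[:, j] = (f(x + e) - f(x - e)) / (2.0 * h)
    return J

x = np.array([0.5, -1.0, 2.0])
a = np.array([1.0, 2.0, 3.0])
J = jacobian(f, x)
print(J @ a)        # (a . grad) f at x: three numbers for each fixed a
print(J.shape)      # but to know it for all a we need the whole Jacobian, 9 numbers
```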

Most readers will be happy to finish this section here². However, some of
them³ will observe that in our coordinate free statement of the local Taylor's
theorem the 'second derivative behaviour' is given by a bilinear map α2 :
Rm × Rm → Rp and we defined derivatives in terms of linear maps.
Let us be more precise. We suppose f is a well behaved function on an
open set U ⊆ Rm taking values in Rp . If we write L(E, F ) for the space
of linear maps from a finite dimensional vector space E to a vector space F
then, for each fixed x ∈ U , we have Df (x) ∈ L(Rm , Rp ). Thus, allowing x to
vary freely, we see that we have a function

Df : U → L(Rm , Rp ).
² The rest of this section is marked with a ♥.
³ Boas notes that 'There is a test for identifying some of the future professional math-
ematicians at an early age. These are students who instantly comprehend a sentence
beginning "Let X be an ordered quintuple (a, T, π, σ, B) where . . . ". They are even more
promising if they add, "I never really understood it before." ' ([8] page 231.)

We now observe that L(Rm , Rp ) is a finite dimensional vector space over R
of dimension mp, in other words, L(Rm , Rp ) can be identified with Rmp . We
know how to define the derivative of a well behaved function g : U → Rmp
at x as a function

Dg(x) ∈ L(Rm , Rmp )

so we know how to define the derivative of Df at x as a function

D(Df )(x) ∈ L(Rm , L(Rm , Rp )).

We have thus shown how to define the second derivative D2 f (x) = D(Df )(x).
But D2 f (x) lies in L(Rm , L(Rm , Rp )) and α2 lies in the space E(Rm , Rm ; Rp )
of bilinear maps from Rm × Rm to Rp . How, the reader may ask, can we
identify L(Rm , L(Rm , Rp )) with E(Rm , Rm ; Rp )? Fortunately this question
answers itself with hardly any outside intervention.
Exercise 7.2.10. Let E, F and G be finite dimensional vector spaces over
R. We write E(E, F ; G) for the space of bilinear maps α : E × F → G.
Define
$$(\Theta(\alpha)(u))(v) = \alpha(u, v)$$
for all α ∈ E(E, F ; G), u ∈ E and v ∈ F .
(i) Show that Θ(α)(u) ∈ L(F, G).
(ii) Show that, if v is fixed,
$$\bigl(\Theta(\alpha)(\lambda_1 u_1 + \lambda_2 u_2)\bigr)(v) = \bigl(\lambda_1 \Theta(\alpha)(u_1) + \lambda_2 \Theta(\alpha)(u_2)\bigr)(v)$$
and deduce that
$$\Theta(\alpha)(\lambda_1 u_1 + \lambda_2 u_2) = \lambda_1 \Theta(\alpha)(u_1) + \lambda_2 \Theta(\alpha)(u_2)$$
for all λ1 , λ2 ∈ R and u1 , u2 ∈ E. Conclude that Θ(α) ∈ L(E, L(F, G)).
(iii) By arguments similar in spirit to those of (ii), show that Θ : E(E, F ; G) →
L(E, L(F, G)) is linear.
(iv) Show that if (Θ(α)(u))(v) = 0 for all u ∈ E, v ∈ F , then α = 0.
Deduce that Θ is injective.
(v) By computing the dimensions of E(E, F ; G) and L(E, L(F, G)), show
that Θ is an isomorphism.
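In coordinates, the map Θ of Exercise 7.2.10 is just 'feed in the first argument and keep the resulting linear map'. A hedged numpy sketch with E = F = Rm and G = Rp (the dimensions and the array are illustrative assumptions): a bilinear map is stored as a p × m × m array, and Θ turns it into a map sending u to a p × m matrix.

```python
import numpy as np

m, p = 3, 2
rng = np.random.default_rng(0)
A = rng.standard_normal((p, m, m))      # a bilinear map alpha : R^m x R^m -> R^p

def alpha(u, v):
    # alpha(u, v)_i = sum_{j,k} A[i, j, k] u_j v_k
    return np.einsum('ijk,j,k->i', A, u, v)

def theta_alpha(u):
    # Theta(alpha)(u) is the linear map v -> alpha(u, v), i.e. a p x m matrix
    return np.einsum('ijk,j->ik', A, u)

u, v = rng.standard_normal(m), rng.standard_normal(m)
print(np.allclose(theta_alpha(u) @ v, alpha(u, v)))   # (Theta(alpha)(u))(v) = alpha(u, v): True
print(A.size, p * m * m)                # the dimension count behind part (v)
```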
Since our definition of Θ does not depend on a choice of basis, we say that
Θ gives a natural isomorphism of E(E, F ; G) and L(E, L(F, G)). If we use
this isomorphism to identify E(E, F ; G) and L(E, L(F, G)) then D2 f (x) ∈
E(Rm , Rm ; Rp ). If we treat the higher derivatives in the same manner, the
central formula of the local Taylor theorem takes the satisfying form
$$f(x + h) = f(x) + Df(x)(h) + \frac{1}{2!} D^2 f(x)(h, h) + \cdots + \frac{1}{n!} D^n f(x)(h, h, \ldots, h) + \epsilon(h) \|h\|^n.$$
For more details, consult sections 11 and 13 of chapter VIII of Dieudonné's
Foundations of Modern Analysis [13] where the higher derivatives are dealt
with in a coordinate free way. Like Hardy's book [23], Dieudonné's is a
masterpiece but in a very different tradition⁴.
⁴ See the quotation from Boas in the previous footnote.


7.3 Critical points
In this section we mix informal and formal argument, deliberately using
words like 'well behaved' without defining them. Our object is to use the
local Taylor formula to produce results about maxima, minima and related
objects.
Let U be an open subset of Rm containing 0. We are interested in the
behaviour of a well behaved function f : U → R near 0.
Since f is well behaved, the first order local Taylor theorem (which re-
duces to the definition of differentiation) gives
$$f(h) = f(0) + \alpha h + \epsilon(h) \|h\|$$
where ε(h) → 0 as h → 0 and α = Df (0) is a linear map from Rm to R.
By a very simple result of linear algebra, we can choose a set of orthogonal
coordinates so that α(x1 , x2 , . . . , xm ) = ax1 with a ≥ 0.

Exercise 7.3.1. If α : Rm → R is linear show that, with respect to any
particular chosen orthogonal coordinates,
$$\alpha(x_1 , x_2 , \ldots , x_m ) = a_1 x_1 + a_2 x_2 + \cdots + a_m x_m$$
for some aj ∈ R. Deduce that there is a vector a such that αx = a · x for all
x ∈ Rm . Conclude that we can choose a set of orthogonal coordinates so that
α(x1 , x2 , . . . , xm ) = ax1 with a ≥ 0.
In applied mathematics we write a = ∇f . A longer, but very instructive,
proof of the result of this exercise is given in Exercise K.31.
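Concretely, the orthogonal coordinates of Exercise 7.3.1 are obtained by taking the first basis vector along the vector a, and then the scalar coefficient is ‖a‖ = ‖∇f ‖. A hedged numpy sketch of one such choice (the gradient vector is an illustrative assumption):

```python
import numpy as np

grad = np.array([1.0, -2.0, 2.0])       # an illustrative gradient vector a = Df(0), with norm 3

# build an orthonormal basis whose first vector points along grad,
# by applying Gram-Schmidt (via QR) to a matrix whose first column is grad
M = np.column_stack([grad, np.eye(3)[:, :2]])   # full rank here since grad[2] != 0
Q, _ = np.linalg.qr(M)
Q[:, 0] *= np.sign(Q[:, 0] @ grad)              # fix the sign so the coefficient below is >= 0

a = grad @ Q[:, 0]                       # the coefficient in alpha(x) = a * x'_1
print(a, np.linalg.norm(grad))           # both are 3 (up to rounding)

# in the new coordinates x' = Q^T x, the linear map alpha(x) = grad . x becomes a * x'_1
x = np.array([0.3, 0.7, -0.1])
print(np.isclose(grad @ x, a * (Q.T @ x)[0]))   # True
```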

In the coordinate system just chosen
$$f(h_1 , h_2 , \ldots , h_m ) = f(0) + a h_1 + \epsilon(h) \|h\|$$




Figure 7.1: Contour lines when the derivative is not zero.

where (h) ’ 0 as h ’ 0. Thus, speaking informally, if a = 0 the ˜contour
lines™ f (h) = c close to 0 will look like parallel ˜hyperplanes™ perpendicular to
the x1 axis. Figure 7.1 illustrates the case m = 2. In particular, our contour
lines look like those describing a side of a hill but not its peak.
Using our informal insight we can prove a formal lemma.

Lemma 7.3.2. Let U be an open subset of Rm containing x. Suppose that
f : U → R is differentiable at x. If f (x) ≥ f (y) for all y ∈ U then
Df (x) = 0 (more precisely, Df (x)h = 0 for all h ∈ Rm ).

Proof. There is no loss in generality in supposing x = 0. Suppose that
Df (0) ≠ 0. Then we can find an orthogonal coordinate system and a strictly
positive real number a such that Df (0)(h1 , h2 , . . . , hm ) = ah1 . Thus, from
the definition of the derivative,
$$f(h_1 , h_2 , \ldots , h_m ) = f(0) + a h_1 + \epsilon(h) \|h\|$$
where ε(h) → 0 as h → 0.
Choose η > 0 such that, whenever ‖h‖ < η, we have h ∈ U and |ε(h)| <
a/2. Now choose any real h with 0 < h < η. If we set h = (h, 0, 0, . . . , 0), we
have
$$f(h) = f(0) + a h + \epsilon(h) h > f(0) + a h - a h/2 = f(0) + a h/2 > f(0).$$



The distinctions made in the following definition are probably familiar to
the reader.

Definition 7.3.3. Let E be a subset of Rm containing x and let f be a
function from E to R.

(i) We say that f has a global maximum at x if f (x) ≥ f (y) for all
y ∈ E.
(ii) We say that f has a strict global maximum at x if f (x) > f (y) for
all y ∈ E with x ≠ y.
(iii) We say that f has a local maximum (respectively a strict local maxi-
mum) at x if there exists an η > 0 such that the restriction of f to E ∩ B(x, η)
has a global maximum (respectively a strict global maximum) at x.
(iv) If we can find an η > 0 such that E ⊇ B(x, η) and f is differentiable
at x with Df (x) = 0, we say that x is a critical or stationary point⁵ of f .
It is usual to refer to the point x where f takes a (global or local) maxi-
mum as a (global or local) maximum and this convention rarely causes con-
fusion. When mathematicians omit the words local or global in referring to
maximum they usually mean the local version (but this convention, which I
shall follow, is not universal).
Here are some easy exercises involving these ideas.
Exercise 7.3.4. (i) Let U be an open subset of Rm containing x. Suppose
that f : U → R is differentiable on U and that Df is continuous at x. Show
that, if f has a local maximum at x, then Df (x) = 0.
(ii) Suppose that f : Rm → R is differentiable everywhere and E is a
closed subset of Rm containing x. Show that, even if x is a global maximum
of the restriction of f to E, it need not be true that Df (x) = 0. [Hint: We
have already met this fact when we thought about Rolle's theorem.] Explain
informally why the proof of Lemma 7.3.2 fails in this case.
(iii) State the definitions corresponding to Definition 7.3.3 that we need
to deal with minima.
(iv) Let E be any subset of Rm containing y and let f be a function from
E to R. If y is both a global maximum and a global minimum for f show that
f is constant. What can you say if we replace the word 'global' by 'local' ?
We saw above how f behaved locally near 0 if Df (0) ≠ 0. What can we
say if Df (0) = 0? In this case, the second order Taylor expansion gives
$$f(h) = f(0) + \beta(h, h) + \epsilon(h) \|h\|^2$$
where
$$\beta(h, h) = \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} f_{,ij}(0) h_i h_j$$
⁵ In other words a stationary point is one where the ground is flat. Since flat ground
drains badly, the stationary points we meet in hill walking tend to be boggy. Thus we
encounter boggy ground at the top of hills and when crossing passes as well as at lowest
points (at least in the UK, other countries may be drier or have better draining soils).




Figure 7.2: Contour lines when the derivative is zero but the second derivative
is non-singular

and (h) ’ 0 as h ’ 0. We write β = 1 D2 f and call the matrix
2
K = (f,ij (0)) the Hessian matrix. As we noted in the previous section,
the symmetry of the second partial derivatives (Theorem 7.2.6) tells us that
the Hessian matrix is a symmetric matrix and the associated bilinear map
D2 f is symmetric. It follows from a well known result in linear algebra (see
e.g. Exercise K.30) that Rn has an orthonormal basis of eigenvectors of K.
Choosing coordinate axes along those vectors, we obtain
m
2
»i h 2
D f (h, h) = i
i=1

where the »i are the eigenvalues associated with the eigenvectors.
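A hedged numpy sketch of this diagonalization (the symmetric Hessian below is an illustrative assumption): eigh returns the eigenvalues λi together with an orthonormal basis of eigenvectors, and in the coordinates they define the quadratic form is a pure sum of λi hi².

```python
import numpy as np

K = np.array([[3.0, 1.0, 0.0],          # an illustrative symmetric Hessian (f,ij (0))
              [1.0, 2.0, -1.0],
              [0.0, -1.0, 1.0]])

lam, P = np.linalg.eigh(K)              # eigenvalues lam_i and orthonormal eigenvectors (columns of P)

h = np.array([0.4, -0.2, 0.7])
hp = P.T @ h                            # coordinates of h along the eigenvectors
print(h @ K @ h, np.sum(lam * hp ** 2)) # D^2 f(h, h) computed both ways; the numbers agree
print(lam)                              # the signs of the lam_i decide minimum / maximum / saddle
```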
In the coordinate system just chosen
$$f(h_1 , h_2 , \ldots , h_m ) = f(0) + \frac{1}{2} \sum_{i=1}^{m} \lambda_i h_i^2 + \epsilon(h) \|h\|^2$$
where ε(h) → 0 as h → 0. Thus, speaking informally, if all the λi are
non-zero, the 'contour lines' f (h) = c close to 0 will look like 'quadratic hy-
persurfaces' (that is m dimensional versions of conics). Figure 7.2 illustrates
the two possible contour patterns when m = 2. The first type of pattern is
that of a summit (if the contour lines are for increasing heights as we ap-
proach 0) or a bottom (lowest point)⁶ (if the contour lines are for decreasing
heights as we approach 0). The second is that of a pass (often called a sad-
dle). Notice that, for merchants wishing to get from one valley to another,
the pass is the highest point in their journey but, for mountaineers wishing
to get from one mountain to another, the pass is the lowest point.
⁶ The English language is rich in synonyms for highest points (summits, peaks, crowns,
. . . ) but has few for lowest points. This may be because the English climate ensures that
most lowest points are under water.

When looking at Figure 7.2 it is important to realise that the difference
in heights of successive contour lines is not constant. In effect we have drawn
contour lines at heights f (0), f (0) + η, f (0) + 2²η, f (0) + 3²η, . . . , f (0) + n²η.

Exercise 7.3.5. (i) Redraw Figure 7.2 with contour lines at heights f (0),
f (0) + η, f (0) + 2η, f (0) + 3η, . . . , f (0) + nη.
(ii) What (roughly speaking) can you say about the difference in heights
of successive contour lines in Figure 7.1?

Using our informal insight we can prove a formal lemma.

Lemma 7.3.6. Let U be an open subset of Rm containing x. Suppose that f :
U → R has second order partial derivatives on U and these partial derivatives
are continuous at x. If Df (x) = 0 and D2 f (x) is non-singular then
(i) f has a minimum at x if and only if D2 f (x) is positive definite.
(ii) f has a maximum at x if and only if D2 f (x) is negative definite.

The conditions of the second sentence of the hypothesis ensure that we
have a local second order Taylor expansion. In most applications f will be
much better behaved than this. We say that D2 f (x) is positive definite if all
the associated eigenvalues (that is all the eigenvalues of the Hessian matrix)
are strictly positive and that D2 f (x) is negative definite if all the associated
eigenvalues are strictly negative.

Exercise 7.3.7. Prove Lemma 7.3.6 following the style of the proof of Lemma 7.3.2.

It is a non-trivial task to tell whether a given Hessian is positive or neg-
ative definite.

Exercise 7.3.8. Let f (x, y) = x² + 6xy + y² . Show that Df (0, 0) = 0, that
all the entries in the Hessian matrix K at (0, 0) are positive and that K
is non-singular but that D2 f (0, 0) is neither positive definite nor negative
definite. (So (0, 0) is a saddle point.)
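A hedged numpy check of the phenomenon in Exercise 7.3.8 (it does not replace the verification asked for): the Hessian of x² + 6xy + y² at the origin has every entry positive, yet its eigenvalues have opposite signs, so the associated form is indefinite.

```python
import numpy as np

# Hessian of f(x, y) = x^2 + 6xy + y^2 at (0, 0): f,11 = f,22 = 2 and f,12 = f,21 = 6
K = np.array([[2.0, 6.0],
              [6.0, 2.0]])

print(np.linalg.eigvalsh(K))   # [-4.  8.]: one negative and one positive eigenvalue,
                               # so D^2 f(0, 0) is neither positive nor negative definite
```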

Exercise K.105 gives one method of resolving the problem.
Because it is non-trivial to use the Hessian to determine whether a sin-
gular point, that is a point x where Df (x) = 0, is a maximum, a minimum
or neither, mathematicians frequently seek short cuts.

Exercise 7.3.9. Suppose that f : Rm → R is continuous, that f (x) → 0 as
‖x‖ → ∞ and that f (x) > 0 for all x ∈ Rm .
(i) Explain why there exists an R > 0 such that f (x) < f (0) for all
‖x‖ ≥ R.

(ii) Explain why there exists an x0 with x0 ¤ R and f (x0 ) ≥ f (x) for
all x ¤ R.
(iii) Explain why f (x0 ) ≥ f (x) for all x ∈ Rm .
(iv) If f is everywhere di¬erentiable and has exactly one singular point
y0 show that f attains a global maximum at y0 .
(v) In statistics we frequently wish to maximise functions of the form
