
4 Arnol'd calls it the Newton-Leibniz-Gauss-Green-Ostrogradskii-Stokes-Poincaré theorem but most mathematicians call it the generalised Stokes' theorem or just Stokes' theorem.

188 A COMPANION TO ANALYSIS

Exercise 8.3.14. (i) Prove Theorem 8.3.13 by considering
\[
U(t) = \int_{g(c)}^{g(t)} f(s)\,ds - \int_c^t f(g(x))g'(x)\,dx.
\]

(ii) Derive Theorem 8.3.11 from Theorem 8.3.13 by choosing f appropriately.

(iii) Strengthen Theorem 8.3.13 along the lines of Exercise 8.3.12.

(iv) (An alternative proof.) If f is as in Theorem 8.3.13, explain why we can find an F : (α, β) → R with F′ = f. Obtain Theorem 8.3.13 by applying the chain rule, using F′(g(x))g′(x) = f(g(x))g′(x).

Because the proof of Theorem 8.3.13 is so simple, and because the main use of the result in elementary calculus is to evaluate integrals, there is a tendency to underestimate its importance. However, it is important for later developments that the reader has an intuitive grasp of this result.

Exercise 8.3.15. (i) Suppose that f : R → R is the constant function f(t) = K and that g : R → R is the linear function g(t) = λt + µ. Show by direct calculation that
\[
\int_{g(c)}^{g(d)} f(s)\,ds = \int_c^d f(g(x))g'(x)\,dx,
\]
and describe the geometric content of this result in words.

(ii) Suppose now that f : R → R and g : R → R are well behaved functions. By splitting [c, d] into small intervals on which f is 'almost constant' and g is 'almost linear', give a heuristic argument for the truth of Theorem 8.3.13. To see how this heuristic argument can be converted into a rigorous one, consult Exercise K.118.
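The substitution formula is also easy to test numerically. The sketch below is mine, not the book's: it uses a composite Simpson's rule, with the arbitrary illustrative choices f(s) = s², g(x) = x³ + x and [c, d] = [0, 1].

```python
import math

def simpson(F, a, b, n=1000):
    # composite Simpson's rule with n (even) subintervals
    h = (b - a) / n
    s = F(a) + F(b) + sum((4 if k % 2 else 2) * F(a + k * h) for k in range(1, n))
    return s * h / 3

f = lambda s: s * s            # illustrative integrand
g = lambda x: x ** 3 + x       # illustrative substitution, increasing on [0, 1]
gp = lambda x: 3 * x * x + 1   # g'

c, d = 0.0, 1.0
lhs = simpson(f, g(c), g(d))                    # integral of f over [g(c), g(d)]
rhs = simpson(lambda x: f(g(x)) * gp(x), c, d)  # the substituted integral
print(abs(lhs - rhs) < 1e-9)  # True
```

Both sides come out as 8/3, the exact value of the left-hand integral.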

Exercise 8.3.16. There is one peculiarity in our statement of Theorem 8.3.13 which is worth noting. We do not demand that g be bijective. Suppose that f : R → R is continuous and g(t) = sin t. Show that, by choosing different intervals (c, d), we obtain
\[
\int_0^{\sin\alpha} f(s)\,ds = \int_0^{\alpha} f(\sin x)\cos x\,dx
= \int_0^{\alpha+2\pi} f(\sin x)\cos x\,dx
= \int_0^{\pi-\alpha} f(\sin x)\cos x\,dx.
\]
Explain what is going on.

The extra flexibility given by allowing g not to be bijective is one we are usually happy to sacrifice in the interests of generalising Theorem 8.3.13.
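One can watch this non-bijectivity at work numerically. In the sketch below (my own check, with the arbitrary choices f(s) = s² and α = 0.7) all four integrals of Exercise 8.3.16 agree, even though sin traverses the three intervals in quite different ways.

```python
import math

def simpson(F, a, b, n=2000):
    # composite Simpson's rule
    h = (b - a) / n
    s = F(a) + F(b) + sum((4 if k % 2 else 2) * F(a + k * h) for k in range(1, n))
    return s * h / 3

f = lambda s: s * s     # any continuous f will do; s² is an arbitrary choice
alpha = 0.7
sub = lambda x: f(math.sin(x)) * math.cos(x)

lhs = simpson(f, 0.0, math.sin(alpha))   # integral of f over [0, sin α]
i1 = simpson(sub, 0.0, alpha)
i2 = simpson(sub, 0.0, alpha + 2 * math.pi)
i3 = simpson(sub, 0.0, math.pi - alpha)
print(max(abs(lhs - i) for i in (i1, i2, i3)) < 1e-9)  # True
```

All four values equal sin³α/3, which is what the fundamental theorem predicts.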


Please send corrections however trivial to twk@dpmms.cam.ac.uk

Exercise 8.3.17. The following exercise is traditional.

(i) Show that integration by substitution, using x = 1/t, gives
\[
\int_a^b \frac{dx}{1+x^2} = \int_{1/b}^{1/a} \frac{dt}{1+t^2}
\]
when b > a > 0.

(ii) If we set a = −1, b = 1 in the formula of (i), we obtain
\[
\int_{-1}^{1} \frac{dx}{1+x^2} \overset{?}{=} -\int_{-1}^{1} \frac{dt}{1+t^2}.
\]
Explain this apparent failure of the method of integration by substitution.

(iii) Write the result of (i) in terms of tan⁻¹ and prove it using standard trigonometric identities.
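Since the antiderivative of 1/(1+x²) is arctan x, both parts can be checked in a few lines. The sketch below is mine (the values a = 0.5, b = 3 in part (i) are arbitrary); it also shows the sign clash that part (ii) asks you to explain.

```python
import math

# Part (i): both sides equal arctan b − arctan a = arctan(1/a) − arctan(1/b) for b > a > 0.
a, b = 0.5, 3.0
lhs = math.atan(b) - math.atan(a)          # integral of dx/(1+x²) over [a, b]
rhs = math.atan(1 / a) - math.atan(1 / b)  # integral of dt/(1+t²) over [1/b, 1/a]
print(abs(lhs - rhs) < 1e-12)              # True

# Part (ii): with a = −1, b = 1 the substitution x = 1/t passes through t = 0,
# where it is undefined, and the 'formula' flips the sign of a positive integral.
true_value = math.atan(1.0) - math.atan(-1.0)  # the integral over [−1, 1] is π/2 > 0
naive_value = -true_value                      # what blind use of the formula suggests
print(true_value > 0 and naive_value < 0)      # True: the signs cannot both be right
```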

In sections 5.4 and 5.6 we gave a treatment of the exponential and logarithmic functions based on differentiation. The reader may wish to look at Exercise K.126 in which we use integration instead.

Another result which can be proved in much the same manner as Theorems 8.3.11 and 8.3.13 is the lemma which justifies integration by parts. (Recall the notation [h(x)]_a^b = h(b) − h(a).)

Lemma 8.3.18. Suppose that f : (α, β) → R has continuous derivative and g : (α, β) → R is continuous. Let G : (α, β) → R be an indefinite integral of g. Then, if [a, b] ⊆ (α, β), we have
\[
\int_a^b f(x)g(x)\,dx = [f(x)G(x)]_a^b - \int_a^b f'(x)G(x)\,dx.
\]
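A quick numerical sanity check of Lemma 8.3.18; the functions f(x) = x², g(x) = cos x (so G(x) = sin x) and the Simpson integrator are my own illustrative choices, not the book's.

```python
import math

def simpson(F, a, b, n=1000):
    # composite Simpson's rule
    h = (b - a) / n
    s = F(a) + F(b) + sum((4 if k % 2 else 2) * F(a + k * h) for k in range(1, n))
    return s * h / 3

f = lambda x: x * x        # f has continuous derivative f'
fp = lambda x: 2 * x
g = lambda x: math.cos(x)  # g continuous
G = lambda x: math.sin(x)  # an indefinite integral of g

lo, hi = 0.0, 1.0
lhs = simpson(lambda x: f(x) * g(x), lo, hi)
rhs = (f(hi) * G(hi) - f(lo) * G(lo)) - simpson(lambda x: fp(x) * G(x), lo, hi)
print(abs(lhs - rhs) < 1e-9)  # True
```

Both sides agree with the exact value 2 cos 1 − sin 1 ≈ 0.2391.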

Exercise 8.3.19. (i) Obtain Lemma 8.3.18 by differentiating an appropriate U in the style of the proofs of Theorems 8.3.11 and 8.3.13. Quote carefully the results that you use.

(ii) Obtain Lemma 8.3.18 by integrating both sides of the equality (uv)′ = u′v + uv′ and choosing appropriate u and v. Quote carefully the results that you use.

(iii) Strengthen Lemma 8.3.18 along the lines of Exercise 8.3.12.

Integration by parts gives a global Taylor theorem with a form that is easily remembered and proved for examination.

Theorem 8.3.20. (A global Taylor's theorem with integral remainder.) If f : (u, v) → R is n times continuously differentiable and 0 ∈ (u, v), then
\[
f(t) = \sum_{j=0}^{n-1} \frac{f^{(j)}(0)}{j!}\, t^j + R_n(f, t)
\]
where
\[
R_n(f, t) = \frac{1}{(n-1)!} \int_0^t (t - x)^{n-1} f^{(n)}(x)\,dx.
\]
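Theorem 8.3.20 is easy to test on a specific function. The sketch below is my own check (the choices f = exp, n = 4, t = 1 are arbitrary): it evaluates the integral remainder numerically and confirms that the Taylor polynomial and the remainder reassemble f(t).

```python
import math

def simpson(F, a, b, n=1000):
    # composite Simpson's rule
    h = (b - a) / n
    s = F(a) + F(b) + sum((4 if k % 2 else 2) * F(a + k * h) for k in range(1, n))
    return s * h / 3

n_, t = 4, 1.0   # f = exp, so f^(j)(0) = 1 for every j
taylor_part = sum(t ** j / math.factorial(j) for j in range(n_))
remainder = simpson(lambda x: (t - x) ** (n_ - 1) * math.exp(x),
                    0.0, t) / math.factorial(n_ - 1)
print(abs(math.exp(t) - (taylor_part + remainder)) < 1e-9)  # True
```

Here the polynomial part is 8/3 and the remainder is e − 8/3 ≈ 0.0516, exactly as the theorem demands.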

Exercise 8.3.21. By integration by parts, show that
\[
R_n(f, t) = \frac{f^{(n-1)}(0)}{(n-1)!}\, t^{n-1} + R_{n-1}(f, t).
\]
Use repeated integration by parts to obtain Theorem 8.3.20.

Exercise 8.3.22. Reread Example 7.1.5. If F is as in that example, identify R_{n−1}(F, t).

Exercise 8.3.23. If f : (−a, a) → R is n times continuously differentiable with |f⁽ⁿ⁾(t)| ≤ M for all t ∈ (−a, a), show that
\[
\left| f(t) - \sum_{j=0}^{n-1} \frac{f^{(j)}(0)}{j!}\, t^j \right| \le \frac{M|t|^n}{n!}.
\]
Explain why this result is slightly weaker than that of Exercise 7.1.1 (v).
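The bound in Exercise 8.3.23 can be watched in action for f = sin, where every derivative is one of ±sin, ±cos and so M = 1 works. This check is mine (n = 5, t = 1 are arbitrary choices).

```python
import math

# f = sin: every derivative is bounded by M = 1 on any interval (−a, a).
n_, M, t = 5, 1.0, 1.0
derivs_at_0 = [0.0, 1.0, 0.0, -1.0]   # sin, cos, −sin, −cos evaluated at 0, period 4
taylor_part = sum(derivs_at_0[j % 4] * t ** j / math.factorial(j) for j in range(n_))
error = abs(math.sin(t) - taylor_part)
bound = M * abs(t) ** n_ / math.factorial(n_)
print(error <= bound)  # True: roughly 0.00814 against a bound of 1/120 ≈ 0.00833
```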

There are several variants of Theorem 8.3.20 with different expressions for R_n(f, t) (see, for example, Exercise K.49 (vi)). However, although the theory of the Taylor expansion is very important (see, for example, Exercise K.125 and Exercise K.266), these global theorems are not much used in relation to specific functions outside the examination hall. We discuss two of the reasons why at the end of Section 11.5. In Exercises 11.5.20 and 11.5.22 I suggest that it is usually easier to obtain Taylor series by power series solutions rather than by using theorems like Theorem 8.3.20. In Exercise 11.5.23 I suggest that power series are often not very suitable for numerical computation.

8.4 First steps in the calculus of variations ♥

The most famous early problem in the calculus of variations is that of the brachistochrone. It asks for the equation y = f(x) of the wire down which a frictionless particle with initial velocity v will slide from one point (a, α) to another (b, β) (so f(a) = α, f(b) = β, a ≠ b and α > β) in the shortest time. It turns out that the time taken by the particle is
\[
J(f) = \frac{1}{(2g)^{1/2}} \int_a^b \left( \frac{1 + f'(x)^2}{\kappa - f(x)} \right)^{1/2} dx
\]
where κ = v²/(2g) + α and g is the acceleration due to gravity.
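To get a feel for J as a function of functions, one can evaluate it for a trial wire. The sketch below is mine, not the book's: it takes the straight line from (0, 1) to (1, 0), with v = 1 (a nonzero initial speed keeps the integrand finite at the starting point) and g = 9.81. The true brachistochrone, a cycloid arc, gives a smaller value.

```python
import math

def simpson(F, a, b, n=2000):
    # composite Simpson's rule
    h = (b - a) / n
    s = F(a) + F(b) + sum((4 if k % 2 else 2) * F(a + k * h) for k in range(1, n))
    return s * h / 3

g_acc = 9.81                  # acceleration due to gravity
v = 1.0                       # initial speed, chosen nonzero so κ − f(a) > 0
alpha, beta = 1.0, 0.0        # endpoints (0, 1) and (1, 0)
kappa = v ** 2 / (2 * g_acc) + alpha

f = lambda x: 1.0 - x         # trial wire: the straight line between the endpoints
fp = lambda x: -1.0

def integrand(x):
    return math.sqrt((1 + fp(x) ** 2) / (kappa - f(x)))

J = simpson(integrand, 0.0, 1.0) / math.sqrt(2 * g_acc)
print(J)   # travel time in seconds along the straight wire
```

For this particular wire the integral can also be done in closed form, which gives a useful cross-check on the numerics.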


Exercise 8.4.1. If you know sufficient mechanics, verify this. (Your argument will presumably involve arc length, which has not yet been mentioned in this book.)

This is a problem of minimising which is very different from those dealt with in elementary calculus. Those problems ask us to choose a point x₀ from a one-dimensional space which minimises some function g(x). In section 7.3 we considered problems in which we sought to choose a point x₀ from an n-dimensional space which minimises some function g(x). Here we seek to choose a function f₀ from an infinite dimensional space to minimise a function J(f) of functions f.

Exercise 8.4.2. In the previous sentence we used the words 'infinite dimensional' somewhat loosely. However we can make precise statements along the same lines.

(i) Show that the collection P of polynomials P with P(0) = P(1) = 0 forms a vector space over R with the obvious operations. Show that P is infinite dimensional (in other words, has no finite spanning set).

(ii) Show that the collection E of infinitely differentiable functions f : [0, 1] → R with f(0) = f(1) forms a vector space over R with the obvious operations. Show that E is infinite dimensional.

John Bernoulli published the brachistochrone problem as a challenge in 1696. Newton, Leibniz, L'Hôpital, John Bernoulli and James Bernoulli all found solutions within a year⁵. However, it is one thing to solve a particular problem and quite another to find a method of attack for the general class of problems to which it belongs. Such a method was developed by Euler and Lagrange. We shall see that it does not resolve all difficulties but it represents a marvelous leap of imagination.

We begin by proving that, under certain circumstances, we can interchange the order of integration and differentiation. (We will extend the result in Theorem 11.4.21.)

Theorem 8.4.3. (Differentiation under the integral.) Let (a′, b′) × (c′, d′) ⊇ [a, b] × [c, d]. Suppose that g : (a′, b′) × (c′, d′) → R is continuous and that the partial derivative g,2 exists and is continuous. Then, writing G(y) = ∫_a^b g(x, y) dx, we have G differentiable on (c, d) with
\[
G'(y) = \int_a^b g_{,2}(x, y)\,dx.
\]

5 They were giants in those days. Newton had retired from mathematics and submitted his solution anonymously. 'But' John Bernoulli said 'one recognises the lion by his paw.'


This result is more frequently written as
\[
\frac{d}{dy} \int_a^b g(x, y)\,dx = \int_a^b \frac{\partial g}{\partial y}(x, y)\,dx,
\]
and interpreted as 'the d clambers through the integral and curls up'. If we use the D notation we get
\[
G'(y) = \int_a^b D_2 g(x, y)\,dx.
\]
It may, in the end, be more helpful to note that ∫_a^b g(x, y) dx is a function of the single variable y, but g(x, y) is a function of the two variables x and y.
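Before the proof, a concrete instance may help. The sketch below is my own check of Theorem 8.4.3 (the choice g(x, y) = sin(xy) is arbitrary): it compares a central difference quotient for G′(y) with the integral of g,2 obtained by differentiating under the integral sign.

```python
import math

def simpson(F, a, b, n=1000):
    # composite Simpson's rule
    h = (b - a) / n
    s = F(a) + F(b) + sum((4 if k % 2 else 2) * F(a + k * h) for k in range(1, n))
    return s * h / 3

g = lambda x, y: math.sin(x * y)
g2 = lambda x, y: x * math.cos(x * y)  # partial derivative of g in its second slot

G = lambda y: simpson(lambda x: g(x, y), 0.0, 1.0)  # G(y): integral of g(., y) over [0, 1]

y0, step = 0.7, 1e-5
difference_quotient = (G(y0 + step) - G(y0 - step)) / (2 * step)  # ≈ G'(y0)
under_the_integral = simpson(lambda x: g2(x, y0), 0.0, 1.0)
print(abs(difference_quotient - under_the_integral) < 1e-8)  # True
```

Here G(y) = (1 − cos y)/y in closed form, so the answer can be checked exactly as well.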

Proof. We use a proof technique which is often useful in this kind of situation (we have already used a simple version in Theorem 8.3.6, when we proved the fundamental theorem of the calculus).

We first put everything under one integral sign. Suppose y, y + h ∈ (c, d) and h ≠ 0. Then
\[
\left| \frac{G(y+h) - G(y)}{h} - \int_a^b g_{,2}(x, y)\,dx \right|
= \frac{1}{|h|} \left| G(y+h) - G(y) - \int_a^b h g_{,2}(x, y)\,dx \right|
= \frac{1}{|h|} \left| \int_a^b \bigl( g(x, y+h) - g(x, y) - h g_{,2}(x, y) \bigr)\,dx \right|.
\]

In order to estimate the last integral we use the simple result (Exercise 8.2.13 (iv))
\[
|\text{integral}| \le \text{length} \times \sup,
\]
which gives us
\[
\frac{1}{|h|} \left| \int_a^b \bigl( g(x, y+h) - g(x, y) - h g_{,2}(x, y) \bigr)\,dx \right|
\le \frac{b-a}{|h|} \sup_{x \in [a,b]} |g(x, y+h) - g(x, y) - h g_{,2}(x, y)|.
\]

We expect |g(x, y+h) − g(x, y) − hg,2(x, y)| to be small when h is small because the definition of the partial derivative tells us that g(x, y+h) − g(x, y) ≈ hg,2(x, y). In such circumstances, the mean value theorem is frequently useful. In this case, setting f(t) = g(x, y+t) − g(x, y) − tg,2(x, y), the mean value theorem tells us that
\[
|f(h)| = |f(h) - f(0)| \le |h| \sup_{0 \le \theta \le 1} |f'(\theta h)|
\]
and so
\[
|g(x, y+h) - g(x, y) - h g_{,2}(x, y)| \le |h| \sup_{0 \le \theta \le 1} |g_{,2}(x, y + \theta h) - g_{,2}(x, y)|.
\]

There is one further point to notice. Since we are taking a supremum over all x ∈ [a, b], we shall need to know, not merely that we can make |g,2(x, y + θh) − g,2(x, y)| small at a particular x by taking h sufficiently small, but that we can make |g,2(x, y + θh) − g,2(x, y)| uniformly small for all x. However, we know that g,2 is continuous on [a, b] × [c, d] and that a function which is continuous on a closed bounded set is uniformly continuous, and this will enable us to complete the proof.

Let ε > 0. By Theorem 4.5.5, g,2 is uniformly continuous on [a, b] × [c, d] and so we can find a δ(ε) > 0 such that
\[
|g_{,2}(x, y) - g_{,2}(u, v)| \le \varepsilon/(b - a)
\]
whenever (x − u)² + (y − v)² < δ(ε)² and (x, y), (u, v) ∈ [a, b] × [c, d]. It follows that, if y, y + h ∈ (c, d) and |h| < δ(ε), then
\[
\sup_{0 \le \theta \le 1} |g_{,2}(x, y + \theta h) - g_{,2}(x, y)| \le \varepsilon/(b - a)
\]
for all x ∈ [a, b]. Putting all our results together, we have shown that
\[
\left| \frac{G(y+h) - G(y)}{h} - \int_a^b g_{,2}(x, y)\,dx \right| \le \varepsilon
\]
whenever y, y + h ∈ (c, d) and 0 < |h| < δ(ε), and the result follows.

Exercise 8.4.4. Because I have tried to show where the proof comes from, the proof above is not written in a very economical way. Rewrite it more economically.

A favourite examiner's variation on the theme of Theorem 8.4.3 is given in Exercise K.132.

Exercise 8.4.5. In what follows we will use a slightly different version of Theorem 8.4.3.

Suppose g : [a, b] × [c, d] → R is continuous and that the partial derivative g,2 exists and is continuous. Then, writing G(y) = ∫_a^b g(x, y) dx, we have G differentiable on [c, d] with
\[
G'(y) = \int_a^b g_{,2}(x, y)\,dx.
\]
Explain what this means in terms of left and right derivatives and prove it.


The method of Euler and Lagrange applies to the following class of problems. Suppose that F : R³ → R has continuous second partial derivatives. We consider the set A of functions f : [a, b] → R which are differentiable with continuous derivative and are such that f(a) = α and f(b) = β. We write
\[
J(f) = \int_a^b F(t, f(t), f'(t))\,dt
\]
and seek to minimise J, that is, to find an f₀ ∈ A such that
\[
J(f_0) \le J(f)
\]
whenever f ∈ A.

In section 7.3, when we asked if a particular point x₀ from an n-dimensional space minimised g : Rⁿ → R, we examined the behaviour of g close to x₀. In other words, we looked at g(x₀ + ηu) when u was an arbitrary vector and η was small. The idea of Euler and Lagrange is to look at
\[
G_h(\eta) = J(f_0 + \eta h)
\]
where h : [a, b] → R is differentiable with continuous derivative and is such that h(a) = 0 and h(b) = 0 (we shall call the set of such functions E). We observe that G_h is a function from R to R and that G_h has a minimum at 0 if J is minimised by f₀. This observation, combined with some very clever, but elementary, calculus gives the celebrated Euler-Lagrange equation.
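The idea of probing J along a one-parameter family f₀ + ηh can be seen numerically. The sketch below is mine: for the toy functional J(f) = ∫₀¹ f′(t)² dt with f(0) = 0, f(1) = 1 (my choice, not the book's), the straight line f₀(t) = t should minimise J, and indeed G_h(η) has its minimum at η = 0 for the perturbation h(t) = sin πt.

```python
import math

def simpson(F, a, b, n=1000):
    # composite Simpson's rule
    h = (b - a) / n
    s = F(a) + F(b) + sum((4 if k % 2 else 2) * F(a + k * h) for k in range(1, n))
    return s * h / 3

# Toy functional J(f) = integral of f'(t)² over [0, 1] with f(0) = 0, f(1) = 1.
f0p = lambda t: 1.0                  # f0(t) = t, the candidate minimiser
h = lambda t: math.sin(math.pi * t)  # a perturbation with h(0) = h(1) = 0
hp = lambda t: math.pi * math.cos(math.pi * t)

def G(eta):
    # G_h(η) = J(f0 + ηh)
    return simpson(lambda t: (f0p(t) + eta * hp(t)) ** 2, 0.0, 1.0)

etas = [-0.2, -0.1, 0.0, 0.1, 0.2]
vals = [G(e) for e in etas]
print(vals.index(min(vals)))   # 2: the minimum sits at η = 0
```

A short calculation shows G_h(η) = 1 + η²π²/2 here, so G_h′(0) = 0, exactly the condition the next theorem exploits.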

Theorem 8.4.6. Suppose that F : R³ → R has continuous second partial derivatives. Consider the set A of functions f : [a, b] → R which are differentiable with continuous derivative and are such that f(a) = α and f(b) = β. We write
\[
J(f) = \int_a^b F(t, f(t), f'(t))\,dt.
\]
If f ∈ A is such that
\[
J(f) \le J(g)
\]
whenever g ∈ A, then
\[
F_{,2}(t, f(t), f'(t)) = \frac{d}{dt} F_{,3}(t, f(t), f'(t)).
\]


Proof. We use the notation of the paragraph preceding the statement of the theorem. If h ∈ E (that is to say, h : [a, b] → R is differentiable with continuous derivative and is such that h(a) = 0 and h(b) = 0) then the chain rule tells us that the function g_h : R² → R given by
\[
g_h(\eta, t) = F(t, f(t) + \eta h(t), f'(t) + \eta h'(t))
\]
has continuous partial derivative
\[
g_{h,1}(\eta, t) = h(t) F_{,2}(t, f(t) + \eta h(t), f'(t) + \eta h'(t)) + h'(t) F_{,3}(t, f(t) + \eta h(t), f'(t) + \eta h'(t)).
\]
Thus, by Theorem 8.4.3, we may differentiate under the integral to show that G_h is differentiable everywhere with
\[
G_h'(\eta) = \int_a^b \bigl( h(t) F_{,2}(t, f(t) + \eta h(t), f'(t) + \eta h'(t)) + h'(t) F_{,3}(t, f(t) + \eta h(t), f'(t) + \eta h'(t)) \bigr)\,dt.
\]

If f minimises J, then 0 minimises G_h and so G_h′(0) = 0. We deduce that
\[
0 = \int_a^b \bigl( h(t) F_{,2}(t, f(t), f'(t)) + h'(t) F_{,3}(t, f(t), f'(t)) \bigr)\,dt
= \int_a^b h(t) F_{,2}(t, f(t), f'(t))\,dt + \int_a^b h'(t) F_{,3}(t, f(t), f'(t))\,dt.
\]

Using integration by parts and the fact that h(a) = h(b) = 0, we obtain
\[
\int_a^b h'(t) F_{,3}(t, f(t), f'(t))\,dt
= [h(t) F_{,3}(t, f(t), f'(t))]_a^b - \int_a^b h(t) \frac{d}{dt} F_{,3}(t, f(t), f'(t))\,dt
= - \int_a^b h(t) \frac{d}{dt} F_{,3}(t, f(t), f'(t))\,dt.
\]

Combining the results of the last two sentences, we see that
\[
0 = \int_a^b h(t) \left( F_{,2}(t, f(t), f'(t)) - \frac{d}{dt} F_{,3}(t, f(t), f'(t)) \right) dt.
\]
Since this result must hold for all h ∈ E, we see that
\[
F_{,2}(t, f(t), f'(t)) - \frac{d}{dt} F_{,3}(t, f(t), f'(t)) = 0
\]
for all t ∈ [a, b] (for details see Lemma 8.4.7 below), and this is the result we set out to prove.
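To see the theorem in action on a concrete functional, take F(t, y, p) = p²/2 − y (my choice, not the book's). Then F,2 = −1 and F,3 = p, so the Euler-Lagrange equation reads f″(t) = −1, and with f(0) = f(1) = 0 the candidate minimiser is f₀(t) = (t − t²)/2. The numerical sketch below confirms that every tested perturbation raises J.

```python
import math

def simpson(F, a, b, n=1000):
    # composite Simpson's rule
    h = (b - a) / n
    s = F(a) + F(b) + sum((4 if k % 2 else 2) * F(a + k * h) for k in range(1, n))
    return s * h / 3

# F(t, y, p) = p²/2 − y, so F,2 = −1, F,3 = p and Euler-Lagrange gives f''(t) = −1.
# With f(0) = f(1) = 0 the candidate minimiser is f0(t) = (t − t²)/2.
f0 = lambda t: (t - t * t) / 2
f0p = lambda t: 0.5 - t
h = lambda t: math.sin(math.pi * t)   # an admissible perturbation: h(0) = h(1) = 0
hp = lambda t: math.pi * math.cos(math.pi * t)

def J(eta):
    # J(f0 + ηh) for the functional: integral of f'(t)²/2 − f(t) over [0, 1]
    return simpson(lambda t: (f0p(t) + eta * hp(t)) ** 2 / 2
                             - (f0(t) + eta * h(t)), 0.0, 1.0)

print(all(J(eta) > J(0.0) for eta in (-0.5, -0.1, 0.1, 0.5)))  # True
```

Since the linear term in η vanishes (exactly the content of G_h′(0) = 0), J(f₀ + ηh) − J(f₀) = η²π²/4 here, so f₀ beats every perturbation in this family.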