Single and Multivariable Calculus


Textbook (Calculus)

Mathematics for Machine Learning

• https://mml-book.github.io/

• https://github.com/vbartle/MML-Companion

Table of Contents:

5 Vector Calculus, p. 139
5.1 Differentiation of Univariate Functions, p. 141
5.2 Partial Differentiation and Gradients, p. 146
5.3 Gradients of Vector-Valued Functions, p. 149
5.4 Gradients of Matrices, p. 155
5.5 Useful Identities for Computing Gradients, p. 158
5.6 Backpropagation and Automatic Differentiation, p. 159
5.7 Higher-Order Derivatives, p. 164
5.8 Linearization and Multivariate Taylor Series, p. 165
5.9 Further Reading


Outline

• Function
• 5.1 Differentiation of Univariate Functions
• 5.1.2 Differentiation Rules
• Linear Approximations and Differentials
• 5.1.1 Taylor Series
• Newton’s Method
• What Derivatives Tell Us about the Shape of a Graph


A function f is a quantity that relates two quantities to each other. In this book, these quantities are typically inputs $x \in \mathbb{R}^D$ and targets (function values) f(x), which we assume are real-valued if not stated otherwise. Here $\mathbb{R}^D$ is the domain of f, and the function values f(x) are the image/codomain of f.

Function

We often write

$$f: \mathbb{R}^D \to \mathbb{R} \qquad (5.1a)$$
$$x \mapsto f(x) \qquad (5.1b)$$

to specify a function, where (5.1a) specifies that f is a mapping from $\mathbb{R}^D$ to $\mathbb{R}$ and (5.1b) specifies the explicit assignment of an input x to a function value f(x). A function f assigns every input x exactly one function value f(x).


Example 5.1

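Recall the dot product as a special case of an inner product. In the previous notation, the function $f(x) = x^\top x$, $x \in \mathbb{R}^2$, would be specified as

$$f: \mathbb{R}^2 \to \mathbb{R}, \qquad x \mapsto x_1^2 + x_2^2.$$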

In this chapter, we will discuss how to compute gradients of functions, which is often essential to facilitate learning in machine learning models, since the gradient points in the direction of steepest ascent. Therefore, vector calculus is one of the fundamental mathematical tools we need in machine learning.


5.1 Differentiation of Univariate Functions


Differentiation

Definition 5.1 (Difference Quotient). The difference quotient

$$\frac{\delta y}{\delta x} := \frac{f(x + \delta x) - f(x)}{\delta x}$$

computes the slope of the secant line through two points on the graph of f. In Figure 5.3, these are the points with x-coordinates $x_0$ and $x_0 + \delta x$.

In the limit for $\delta x \to 0$, we obtain the tangent of f at x, if f is differentiable. The tangent is then the derivative of f at x.

Definition 5.2 (Derivative). More formally, for h > 0 the derivative of f at x is defined as the limit

$$\frac{df}{dx} := \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$
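A minimal Python sketch of the two definitions above, not taken from the textbook: the names `difference_quotient` and `square` are illustrative. It evaluates the secant slope of f(x) = x² at x = 1 for shrinking h, and the values should approach the derivative f′(1) = 2.

```python
def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


# As h shrinks, the secant slope approaches the tangent slope f'(1) = 2.
for h in [1e-1, 1e-2, 1e-3, 1e-4, 1e-6]:
    print(f"h = {h:.0e}: secant slope = {difference_quotient(square, 1.0, h):.8f}")
```

In floating point, h cannot be made arbitrarily small: for very small h the subtraction f(x + h) − f(x) loses significant digits, so numerical differentiation trades truncation error against round-off error.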


The Tangent Problem (1 of 3)

The word tangent is derived from the Latin word tangens, which means “touching.”

For a circle we could simply follow Euclid and say that a tangent is a line ℓ that intersects the circle once and only once, as in Figure 1(a).

Figure 1(a)

For more complicated curves this definition is inadequate. Figure 1(b) shows a line ℓ that appears to be a tangent to the curve C at point P, but it intersects C twice.

Figure 1(b)

We can think of a tangent to a curve as a line that touches the curve and follows the same direction as the curve at the point of contact. How can this idea be made precise?


Example 1

Find an equation of the tangent line to the parabola y = x² at the point P(1, 1).

Solution:

We will be able to find an equation of the tangent line ℓ as soon as we know its slope m.

The difficulty is that we know only one point, P, on ℓ, whereas we need two points to compute the slope.

But observe that we can compute an approximation to m by choosing a nearby point Q(x, x²) on the parabola (as in Figure 2) and computing the slope m_PQ of the secant line PQ. (A secant line, from the Latin word secans, meaning cutting, is a line that cuts (intersects) a curve more than once.)


Example 1 – Solution (1 of 2)

We choose x ≠ 1 so that Q ≠ P. Then

$$m_{PQ} = \frac{x^2 - 1}{x - 1}$$

For instance, for the point Q(1.5, 2.25) we have

$$m_{PQ} = \frac{2.25 - 1}{1.5 - 1} = \frac{1.25}{0.5} = 2.5$$

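x > 1:

| x | m_PQ |
|---|---|
| 2 | 3 |
| 1.5 | 2.5 |
| 1.1 | 2.1 |
| 1.01 | 2.01 |
| 1.001 | 2.001 |

x < 1:

| x | m_PQ |
|---|---|
| 0 | 1 |
| 0.5 | 1.5 |
| 0.9 | 1.9 |
| 0.99 | 1.99 |
| 0.999 | 1.999 |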

The tables in the margin show the values of m_PQ for several values of x close to 1.

The closer Q is to P, the closer x is to 1 and, it appears from the tables, the closer m_PQ is to 2.
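The margin tables can be regenerated with a few lines of Python. This is an illustrative sketch; the helper name `secant_slope` is hypothetical and assumes P = (1, 1) on y = x².

```python
def secant_slope(x):
    """Slope m_PQ of the secant through P(1, 1) and Q(x, x**2) on y = x**2 (x != 1)."""
    return (x**2 - 1) / (x - 1)


# Points approaching x = 1 from the right and from the left, as in the tables.
for x in [2, 1.5, 1.1, 1.01, 1.001, 0, 0.5, 0.9, 0.99, 0.999]:
    print(f"x = {x:<6} m_PQ = {secant_slope(x):.4f}")
```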


Example 1 – Solution (2 of 2)

This suggests that the slope of the tangent line ℓ should be m = 2.

We say that the slope of the tangent line is the limit of the slopes of the secant lines, and we express this symbolically by writing

$$m = \lim_{Q \to P} m_{PQ} \qquad \text{and} \qquad m = \lim_{x \to 1} \frac{x^2 - 1}{x - 1} = 2$$

Assuming that the slope of the tangent line is indeed 2, we use the point-slope form of the equation of a line [y − y₁ = m(x − x₁)] to write the equation of the tangent line through (1, 1) as

$$y - 1 = 2(x - 1) \qquad \text{or} \qquad y = 2x - 1$$
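The limit can also be evaluated algebraically instead of being read off the tables. Inside the limit x ≠ 1, so the common factor x − 1 may be cancelled:

$$\lim_{x \to 1} \frac{x^2 - 1}{x - 1} = \lim_{x \to 1} \frac{(x - 1)(x + 1)}{x - 1} = \lim_{x \to 1} (x + 1) = 2$$

This agrees with the value m = 2 suggested by the tables.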


Figure 2



The Tangent Problem (2 of 3)

Figure 3 illustrates the limiting process that occurs in Example 1.

Figure 3: Q approaches P from the right


The Tangent Problem (3 of 3)

Figure 3: Q approaches P from the left

As Q approaches P along the parabola, the corresponding secant lines rotate about P and approach the tangent line ℓ.


5.1.2 Differentiation Rules


Constant Functions

Let’s start with the simplest of all functions, the constant function f(x) = c.

The graph of this function is the horizontal line y = c, which has slope 0, so we must have f′(x) = 0. (See Figure 1.)

Figure 1: The graph of f(x) = c is the line y = c, so f′(x) = 0.

A formal proof, from the definition of a derivative, is also easy:

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{c - c}{h} = \lim_{h \to 0} 0 = 0$$

In Leibniz notation, we write this rule as follows.

Derivative of a Constant Function

$$\frac{d}{dx}(c) = 0$$


Power Functions (1 of 3)

We next look at the functions f(x) = xⁿ, where n is a positive integer.

If n = 1, the graph of f(x) = x is the line y = x, which has slope 1. (See Figure 2.)

Figure 2: The graph of f(x) = x is the line y = x, so f′(x) = 1.

So

$$\frac{d}{dx}(x) = 1 \qquad (1)$$

(You can also verify Equation 1 from the definition of a derivative.) We have already investigated the cases n = 2 and n = 3. We found that

$$\frac{d}{dx}(x^2) = 2x \qquad \frac{d}{dx}(x^3) = 3x^2 \qquad (2)$$


Power Functions (2 of 3)

For n = 4 we find the derivative of f(x) = x⁴ as follows:

$$\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{(x + h)^4 - x^4}{h} \\
&= \lim_{h \to 0} \frac{x^4 + 4x^3h + 6x^2h^2 + 4xh^3 + h^4 - x^4}{h} \\
&= \lim_{h \to 0} \frac{4x^3h + 6x^2h^2 + 4xh^3 + h^4}{h} \\
&= \lim_{h \to 0} \left(4x^3 + 6x^2h + 4xh^2 + h^3\right) = 4x^3
\end{aligned}$$


Power Functions (3 of 3)

Thus

$$\frac{d}{dx}(x^4) = 4x^3 \qquad (3)$$

Comparing the equations in (1), (2), and (3), we see a pattern emerging.

It seems to be a reasonable guess that, when n is a positive integer,

$$\frac{d}{dx}(x^n) = nx^{n-1}.$$

This turns out to be true.

The Power Rule If n is a positive integer, then

$$\frac{d}{dx}(x^n) = nx^{n-1}$$

The Power Rule (General Version) If n is any real number, then

$$\frac{d}{dx}(x^n) = nx^{n-1}$$

The Power Rule enables us to find tangent lines without having to resort to the definition of a derivative. It also enables us to find normal lines.

The normal line to a curve C at a point P is the line through P that is perpendicular to the tangent line at P.
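The general Power Rule can be spot-checked numerically. The sketch below is an illustration under stated assumptions: it uses the symmetric (central) difference quotient, a numerical technique not introduced in the text, and evaluates at x = 1.7 > 0 so that non-integer exponents are defined. The helper name `central_difference` is hypothetical.

```python
def central_difference(f, x, h=1e-5):
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h), approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 1.7
for n in [1, 2, 3, 4, 6, -1, 0.5, 2.5]:
    numeric = central_difference(lambda t: t**n, x)
    power_rule = n * x ** (n - 1)  # prediction of the Power Rule
    print(f"n = {n:>4}: numeric = {numeric:.6f}, n*x^(n-1) = {power_rule:.6f}")
```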


Example 1

(a) If f(x) = x⁶, then f′(x) = 6x⁵.

(b) If y = x¹⁰⁰⁰, then y′ = 1000x⁹⁹⁹.

(c) If y = t⁴, then dy/dt = 4t³.

(d) $\frac{d}{dr}(r^3) = 3r^2$


New Derivatives from Old (1 of 2)

When new functions are formed from old functions by addition, subtraction, or multiplication by a constant, their derivatives can be calculated in terms of

derivatives of the old functions.

In particular, the following formula says that the derivative of a constant times a function is the constant times the derivative of the function.

The Constant Multiple Rule If c is a constant and f is a differentiable function, then

$$\frac{d}{dx}\left[c f(x)\right] = c \frac{d}{dx} f(x)$$

(Compare the corresponding limit law: $\lim_{x \to a} c f(x) = c \lim_{x \to a} f(x)$.)


Example 4

(a)

$$\frac{d}{dx}(3x^4) = 3 \frac{d}{dx}(x^4) = 3(4x^3) = 12x^3$$

(b)

$$\frac{d}{dx}(-x) = \frac{d}{dx}\left[(-1)x\right] = (-1)\frac{d}{dx}(x) = -1(1) = -1$$


New Derivatives from Old (2 of 2)

The next rule tells us that the derivative of a sum (or difference) of functions is the sum (or difference) of the derivatives.

The Sum and Difference Rules If f and g are both differentiable, then

$$\frac{d}{dx}\left[f(x) + g(x)\right] = \frac{d}{dx} f(x) + \frac{d}{dx} g(x)$$

$$\frac{d}{dx}\left[f(x) - g(x)\right] = \frac{d}{dx} f(x) - \frac{d}{dx} g(x)$$

The Sum Rule can be extended to the sum of any number of functions. For instance, using this theorem twice, we get

$$(f + g + h)' = \left[(f + g) + h\right]' = (f + g)' + h' = f' + g' + h'$$

The Constant Multiple Rule, the Sum Rule, and the Difference Rule can be combined with the Power Rule to differentiate any polynomial.

(Compare the corresponding limit laws: $\lim_{x \to a}\left[f(x) + g(x)\right] = \lim_{x \to a} f(x) + \lim_{x \to a} g(x)$ and $\lim_{x \to a}\left[f(x) - g(x)\right] = \lim_{x \to a} f(x) - \lim_{x \to a} g(x)$.)

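Because the Constant Multiple, Sum and Power Rules act term by term, differentiating a polynomial reduces to a simple operation on its coefficient list. A minimal Python sketch, assuming (as an illustrative convention) that a polynomial is stored as ascending coefficients [c0, c1, ..., cn]; the function names are hypothetical.

```python
def differentiate(coeffs):
    """Differentiate p(x) = c0 + c1*x + ... + cn*x**n given as ascending coefficients.

    Each term c_k * x**k becomes k * c_k * x**(k - 1) (Power Rule and Constant
    Multiple Rule); the Sum Rule lets us treat the terms independently, and the
    constant term c0 drops out (derivative of a constant is 0).
    """
    return [k * c for k, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Evaluate c0 + c1*x + ... + cn*x**n."""
    return sum(c * x**k for k, c in enumerate(coeffs))


# p(x) = 5 - x + 3x^4, combining the two parts of Example 4 with a constant term.
p = [5, -1, 0, 0, 3]
dp = differentiate(p)
print(dp)               # [-1, 0, 0, 12], i.e. p'(x) = -1 + 12x^3
print(evaluate(dp, 2))  # value of p'(2)
```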

Differentiation Rules

Derivative of a Constant Function

$$\frac{d}{dx}(c) = 0$$

The Power Rule (General Version) If n is any real number, then

$$\frac{d}{dx}(x^n) = nx^{n-1}$$

The Constant Multiple Rule If c is a constant and f is a differentiable function, then

$$\frac{d}{dx}\left[c f(x)\right] = c \frac{d}{dx} f(x)$$

The Sum Rules

$$\frac{d}{dx}\left[f(x) + g(x)\right] = \frac{d}{dx} f(x) + \frac{d}{dx} g(x)$$

The Product Rule If f and g are both differentiable, then

$$\frac{d}{dx}\left[f(x) g(x)\right] = f(x) \frac{d}{dx}\left[g(x)\right] + g(x) \frac{d}{dx}\left[f(x)\right]$$

The Quotient Rule If f and g are both differentiable, then, wherever g(x) ≠ 0,

$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{g(x) \frac{d}{dx}\left[f(x)\right] - f(x) \frac{d}{dx}\left[g(x)\right]}{\left[g(x)\right]^2}$$
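The Product and Quotient Rules can be sanity-checked numerically at a single point. This sketch assumes the illustrative choice f(x) = x², g(x) = sin x at x = 1.3 (where g(x) ≠ 0) and uses the symmetric difference quotient as the numerical reference; all names are hypothetical.

```python
import math


def central_difference(F, x, h=1e-5):
    """Symmetric difference quotient approximating F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


# Illustrative choice: f(x) = x^2 with f'(x) = 2x, g(x) = sin(x) with g'(x) = cos(x).
f, df = (lambda x: x**2), (lambda x: 2 * x)
g, dg = math.sin, math.cos

x = 1.3
product_numeric = central_difference(lambda t: f(t) * g(t), x)
product_rule = f(x) * dg(x) + g(x) * df(x)
quotient_numeric = central_difference(lambda t: f(t) / g(t), x)
quotient_rule = (g(x) * df(x) - f(x) * dg(x)) / g(x) ** 2

print(product_numeric, product_rule)
print(quotient_numeric, quotient_rule)
```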


Example 5.5


Derivative of the Exponential Function

$$\frac{d}{dx}(e^x) = e^x$$

Derivative of the Logarithmic Functions

$$\frac{d}{dx}(\ln x) = \frac{1}{x} \qquad (2)$$

$$\frac{d}{dx}(\log_b x) = \frac{1}{x \ln b} \qquad (1)$$

$$\frac{d}{dx}(b^x) = b^x \ln b \qquad (5)$$


Exponential Functions


Exponential Functions (1 of 4)

Let’s try to compute the derivative of the exponential function f(x) = bˣ using the definition of a derivative:

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{b^{x+h} - b^x}{h} = \lim_{h \to 0} \frac{b^x b^h - b^x}{h} = \lim_{h \to 0} \frac{b^x (b^h - 1)}{h}$$

The factor bˣ doesn’t depend on h, so we can take it in front of the limit:

$$f'(x) = b^x \lim_{h \to 0} \frac{b^h - 1}{h}$$

Notice that the limit is the value of the derivative of f at 0 (since $b^0 = 1$, it is exactly the definition of f′(0)), that is,

$$\lim_{h \to 0} \frac{b^h - 1}{h} = f'(0)$$

Therefore we have shown that if the exponential function f(x) = bˣ is differentiable at 0, then it is differentiable everywhere and

$$f'(x) = f'(0)\, b^x \qquad (4)$$

This equation says that the rate of change of any exponential function is proportional to the function itself. (The slope is proportional to the height.)


Exponential Functions (2 of 4)

Numerical evidence for the existence of f′(0) is given in the table of values of (bʰ − 1)/h for the cases b = 2 and b = 3. (Values are stated correct to four decimal places.) It appears that the limits exist and

for b = 2,

$$f'(0) = \lim_{h \to 0} \frac{2^h - 1}{h} \approx 0.693$$

for b = 3,

$$f'(0) = \lim_{h \to 0} \frac{3^h - 1}{h} \approx 1.099$$

Thus, from Equation 4, we have

$$\frac{d}{dx}(2^x) \approx (0.693)\, 2^x \qquad \frac{d}{dx}(3^x) \approx (1.099)\, 3^x \qquad (5)$$
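A table of this kind can be generated directly. A minimal Python sketch; the helper name `exp_difference_quotient` and the particular h values are illustrative choices. For comparison it also prints ln 2 and ln 3, the exact limits implied by the formula d/dx bˣ = bˣ ln b.

```python
import math


def exp_difference_quotient(b, h):
    """(b**h - 1) / h, whose limit as h -> 0 is f'(0) for f(x) = b**x."""
    return (b**h - 1) / h


for h in [0.1, 0.01, 0.001, 0.0001, -0.0001, -0.001, -0.01, -0.1]:
    q2 = exp_difference_quotient(2, h)
    q3 = exp_difference_quotient(3, h)
    print(f"h = {h:>7}: (2^h - 1)/h = {q2:.4f}   (3^h - 1)/h = {q3:.4f}")

# Exact limits for comparison.
print(math.log(2), math.log(3))
```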


In view of the estimates of f′(0) for b = 2 and b = 3, it seems reasonable that there is a number b between 2 and 3 for which f′(0) = 1.

It is traditional to denote this value by the letter e. Thus we have the following definition.

Definition of the Number e  e is the number such that

$$\lim_{h \to 0} \frac{e^h - 1}{h} = 1$$


Geometrically, this means that of all the possible exponential functions y = bˣ, the function f(x) = eˣ is the one whose tangent line at (0, 1) has a slope f′(0) that is exactly 1. (See Figures 6 and 7.)
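The definition of e can be turned into a small numerical search: find the base b in [2, 3] for which the difference quotient (bʰ − 1)/h equals 1. This sketch uses bisection (a root-finding technique not covered in the text) and a fixed small step h = 1e-8 as an assumption; the function names are hypothetical. Its result should agree with e to roughly seven digits.

```python
def slope_at_zero(b, h=1e-8):
    """Difference-quotient estimate of f'(0) for f(x) = b**x."""
    return (b**h - 1) / h


def find_e(lo=2.0, hi=3.0, iterations=60):
    """Bisection for the base whose exponential has slope 1 at x = 0.

    slope_at_zero(2) < 1 < slope_at_zero(3), and the slope grows with b,
    so the sought base stays bracketed by [lo, hi].
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if slope_at_zero(mid) < 1:
            lo = mid  # slope too small: the base must be larger
        else:
            hi = mid  # slope at least 1: the base is mid or smaller
    return (lo + hi) / 2


print(find_e())
```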


Exponential Functions (4 of 4)

If we put b = e and, therefore, f′(0) = 1 in Equation 4, it becomes the following important differentiation formula.

Derivative of the Natural Exponential Function

$$\frac{d}{dx}(e^x) = e^x$$

Thus the exponential function f(x) = eˣ has the property that it is its own derivative. The geometrical significance of this fact is that the slope of a tangent line to the curve y = eˣ is equal to the y-coordinate of the point.

Figure 7


Linear Approximations and Differentials


Linearization and Approximation

It might be easy to calculate a value f(a) of a function, but difficult (or even impossible) to compute nearby values of f.

So we settle for the easily computed values of the linear function L whose graph is the tangent line of f at (a, f(a)).

Figure 1

In other words, we use the tangent line at (a, f(a)) as an approximation to the curve y = f(x) when x is near a. An equation of this tangent line is

$$y = f(a) + f'(a)(x - a)$$

The linear function whose graph is this tangent line, that is,

$$L(x) = f(a) + f'(a)(x - a) \qquad (1)$$

is called the linearization of f at a.


Example 1

Find the linearization of the function f(x) = √(x + 3) at a = 1 and use it to approximate the numbers √3.98 and √4.05. Are these approximations overestimates or underestimates?

Solution:

The derivative of f(x) = (x + 3)^{1/2} is

$$f'(x) = \tfrac{1}{2}(x + 3)^{-1/2} = \frac{1}{2\sqrt{x + 3}}$$

and so we have f(1) = 2 and f′(1) = ¼.

Putting these values into Equation 1, we see that the linearization is

$$L(x) = f(1) + f'(1)(x - 1) = 2 + \tfrac{1}{4}(x - 1) = \frac{7}{4} + \frac{x}{4}$$

The corresponding linear approximation (2) is

$$\sqrt{x + 3} \approx \frac{7}{4} + \frac{x}{4} \qquad \text{(when } x \text{ is near 1)}$$


Example 1 – Solution

The linear approximation is illustrated in Figure 2.

Figure 2

We see that, indeed, the tangent line approximation is a good approximation to the given function when x is near 1.

We also see that our approximations are overestimates because the tangent line lies above the curve.

In particular, we have

$$\sqrt{3.98} \approx \frac{7}{4} + \frac{0.98}{4} = 1.995$$

and

$$\sqrt{4.05} \approx \frac{7}{4} + \frac{1.05}{4} = 2.0125$$


Linearization and Approximation (2 of 5)

In the following table we compare the estimates from the linear approximation in Example 1 with the true values.

Figure 2

Notice from this table, and also from Figure 2, that the tangent line approximation gives good estimates when x is close to 1 but the accuracy of the approximation deteriorates when x is farther away from 1.
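A comparison of this kind can be computed directly. This sketch compares L(x) = 7/4 + x/4 with √(x + 3) at a few sample points chosen for illustration (not the points of the original table); the function names are hypothetical.

```python
import math


def f(x):
    return math.sqrt(x + 3)


def linearization(x):
    """L(x) = 7/4 + x/4, the tangent-line approximation of sqrt(x + 3) at a = 1."""
    return 7 / 4 + x / 4


# The error L(x) - f(x) is never negative (tangent line above the curve)
# and grows as x moves away from a = 1.
for x in [0.9, 0.98, 1.0, 1.05, 1.1, 2.0, 3.0]:
    approx, exact = linearization(x), f(x)
    print(f"x = {x:<5} L(x) = {approx:.6f}  sqrt(x+3) = {exact:.6f}  error = {approx - exact:.2e}")
```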

The next example shows that by using a graphing calculator or computer we can determine an interval throughout which a linear approximation provides a specified accuracy.


Differentials

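The geometric meaning of differentials is shown in Figure 5. Let dx = Δx. Then the actual change in y along the curve is

$$\Delta y = f(x + \Delta x) - f(x)$$

whereas the differential

$$dy = f'(x)\, dx \qquad (3)$$

is the corresponding change along the tangent line.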
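To make the difference between Δy and dy concrete, this sketch uses the illustrative choice f(x) = x³ at x = 2 with dx = Δx = 0.1 (values not from the text). Here Δy = 2.1³ − 2³ = 1.261, while dy = f′(2)·0.1 = 12 · 0.1 = 1.2.

```python
def f(x):
    return x**3


def f_prime(x):
    return 3 * x**2


x, dx = 2.0, 0.1
delta_y = f(x + dx) - f(x)  # actual change along the curve
dy = f_prime(x) * dx        # change along the tangent line (the differential)
print(delta_y, dy, delta_y - dy)
```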


5.1.1 Taylor Series


Taylor Series

The Taylor series is a representation of a function f as an infinite sum of terms.

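These terms are determined using derivatives of f evaluated at $x_0$.

Definition 5.3 (Taylor Polynomial). The Taylor polynomial of degree n of $f: \mathbb{R} \to \mathbb{R}$ at $x_0$ is defined as

$$T_n(x) := \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k,$$

where $f^{(k)}(x_0)$ is the kth derivative of f at $x_0$ (which we assume exists) and $\frac{f^{(k)}(x_0)}{k!}$ are the coefficients of the polynomial.

Definition 5.4 (Taylor Series). For a smooth function $f \in \mathcal{C}^\infty$, $f: \mathbb{R} \to \mathbb{R}$, the Taylor series of f at $x_0$ is defined as

$$T_\infty(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k.$$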


Taylor Series


Remark. In general, a Taylor polynomial of degree n is an approximation of a function, which does not need to be a polynomial. The Taylor polynomial is similar to f in a neighborhood around $x_0$. However, a Taylor polynomial of degree n is an exact representation of a polynomial f of degree k ≤ n, since all derivatives $f^{(i)}$, i > k, vanish.
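The exactness claim in the remark can be checked with exact rational arithmetic. A minimal sketch, assuming polynomials are stored as ascending coefficient lists (an illustrative convention); it builds $T_n$ for f(x) = x⁴ at $x_0$ = 1, chosen for illustration, and compares $T_n(3)$ with f(3) = 81. For n ≥ 4 the two agree exactly; for n < 4 they do not.

```python
from fractions import Fraction
from math import factorial


def derivative(coeffs):
    """Differentiate a polynomial given by ascending coefficients [c0, c1, ..., cn]."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Evaluate c0 + c1*x + ... + cn*x**n."""
    return sum(c * x**k for k, c in enumerate(coeffs))


def taylor_polynomial(coeffs, x0, n):
    """Return a function computing T_n(x) = sum_{k=0}^{n} f^(k)(x0) / k! * (x - x0)**k."""
    taylor_coeffs = []
    d = list(coeffs)  # d holds the coefficients of f^(k) as k advances
    for k in range(n + 1):
        taylor_coeffs.append(Fraction(evaluate(d, x0), factorial(k)))
        d = derivative(d)  # higher derivatives of a polynomial eventually vanish
    return lambda x: sum(a * (x - x0)**k for k, a in enumerate(taylor_coeffs))


f = [0, 0, 0, 0, 1]  # f(x) = x**4
for n in range(7):
    T_n = taylor_polynomial(f, x0=1, n=n)
    print(n, T_n(3), evaluate(f, 3))
```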

Example 5.3


When Is a Function Represented by Its Taylor Series?

Example 5.4 (Taylor Series)

Consider the function in Figure 5.4 given by f(x) = sin(x) + cos(x).

Figure 5.4 Taylor polynomials.

The original function f(x) = sin(x) + cos(x) (black, solid) is approximated by Taylor polynomials around x₀ = 0.
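A minimal Python sketch of these Taylor polynomials at x₀ = 0; the function name is hypothetical. It relies on the fact that the derivatives of sin(x) + cos(x) at 0 repeat with period 4: f(0) = 1, f′(0) = cos 0 − sin 0 = 1, f″(0) = −sin 0 − cos 0 = −1, f‴(0) = −cos 0 + sin 0 = −1, and then the cycle restarts.

```python
import math


def taylor_sin_plus_cos(x, n):
    """Degree-n Taylor polynomial T_n(x) of f(x) = sin(x) + cos(x) at x0 = 0.

    The values f^(k)(0) cycle through 1, 1, -1, -1 for k = 0, 1, 2, 3, ...
    """
    cycle = [1, 1, -1, -1]
    return sum(cycle[k % 4] * x**k / math.factorial(k) for k in range(n + 1))


x = 2.0
exact = math.sin(x) + math.cos(x)
for n in [0, 1, 2, 5, 10, 20]:
    print(f"n = {n:2d}: T_n({x}) = {taylor_sin_plus_cos(x, n):.10f}   f({x}) = {exact:.10f}")
```

Higher-degree polynomials track f more closely and over a wider range of x.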


When Is a Function Represented by Its Taylor Series?

For f(x) = eˣ, the Taylor polynomials of degree 1, 2, and 3 at 0 are

$$T_1(x) = 1 + x \qquad T_2(x) = 1 + x + \frac{x^2}{2!} \qquad T_3(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!}$$

The graphs of the exponential function and these three Taylor polynomials are drawn in Figure 1.

Figure 1

As n increases, T_n(x) appears to approach eˣ in Figure 1. This suggests that eˣ is equal to the sum of its Taylor series.


When Is a Function Represented by Its Taylor Series?

8 Theorem If f(x) = T_n(x) + R_n(x), where T_n is the nth-degree Taylor polynomial of f at a, and if

$$\lim_{n \to \infty} R_n(x) = 0$$

for |x − a| < R, then f is equal to the sum of its Taylor series on the interval |x − a| < R.

In trying to show that $\lim_{n \to \infty} R_n(x) = 0$ for a specific function f, we usually use the following theorem.

9 Taylor's Inequality If $|f^{(n+1)}(x)| \le M$ for $|x - a| \le d$, then the remainder R_n(x) of the Taylor series satisfies the inequality

$$|R_n(x)| \le \frac{M}{(n+1)!} |x - a|^{n+1} \qquad \text{for } |x - a| \le d$$

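An important question is whether a Taylor series converges to the original function; for the complete statements of these theorems, please refer to a calculus textbook.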
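As a numerical illustration of Taylor's Inequality, take f(x) = eˣ with a = 0 and d = 1. Every derivative of f is eˣ, and on |x| ≤ 1 we have eˣ ≤ e < 3, so M = 3 is a valid (deliberately rough) bound. The sketch below, with hypothetical names, compares the actual remainder at x = 1 with the bound M/(n + 1)!.

```python
import math


def taylor_exp(x, n):
    """T_n(x) = sum_{k=0}^{n} x**k / k!, the degree-n Taylor polynomial of e**x at a = 0."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))


x, M = 1.0, 3.0  # on |x| <= d = 1, |f^(n+1)(x)| = e**x <= e < 3 = M
for n in range(1, 11):
    remainder = math.exp(x) - taylor_exp(x, n)
    bound = M / math.factorial(n + 1) * abs(x) ** (n + 1)
    print(f"n = {n:2d}  |R_n(1)| = {abs(remainder):.2e}  bound = {bound:.2e}")
```

Because the bound tends to 0 as n grows, Theorem 8 applies at x = 1, consistent with the suggestion that eˣ equals the sum of its Taylor series.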


Newton’s Method


Newton’s Method

The geometry behind Newton’s method is shown in Figure 2, where the solution that we are trying to find is labeled r.

Figure 2

We start with a first approximation x₁, which is obtained by guessing, or from a rough sketch of the graph of f, or from a computer-generated graph of f.

The idea behind Newton’s method is that the tangent line is close to the curve and so its x-intercept, x₂, is close to the x-intercept of the curve (namely, the root r that we are seeking). Because the tangent is a line, we can easily find its x-intercept.

Consider the tangent line L to the curve y = f(x) at the point (x₁, f(x₁)) and look at the x-intercept of L, labeled x₂.


Newton’s Method

Since the x-intercept of L is x₂, we know that the point (x₂, 0) is on the line, and so

$$0 - f(x_1) = f'(x_1)(x_2 - x_1)$$

If f′(x₁) ≠ 0, we can solve this equation for x₂:

$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}$$

We use x₂ as a second approximation to r.

Next we repeat this procedure with x₁ replaced by the second approximation x₂, using the tangent line at (x₂, f(x₂)).

This gives a third approximation:

$$x_3 = x_2 - \frac{f(x_2)}{f'(x_2)}$$


Newton’s Method

If we keep repeating this process, we obtain a sequence of approximations x₁, x₂, x₃, x₄, . . . as shown in Figure 3.

Figure 3

In general, if the nth approximation is x_n and f′(x_n) ≠ 0, then the next approximation is given by

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \qquad (2)$$

If the numbers x_n become closer and closer to r as n becomes large, then we say that the sequence converges to r and we write $\lim_{n \to \infty} x_n = r$.
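Equation 2 translates directly into a loop. This is a minimal sketch; the function name `newton`, the stopping rule (stop when two successive approximations differ by less than `tol`) and the iteration cap `max_iter` are illustrative assumptions, not part of the text. The usage example, the positive root of x² − 2 = 0, is likewise chosen for illustration.

```python
def newton(f, f_prime, x1, tol=1e-12, max_iter=50):
    """Newton's method: iterate x_{n+1} = x_n - f(x_n) / f'(x_n) starting from x1.

    Stops when two successive approximations differ by less than tol.
    Raises ZeroDivisionError if f'(x_n) == 0 (the tangent line has no x-intercept)
    and RuntimeError if the iterates have not settled after max_iter steps.
    """
    x = x1
    for _ in range(max_iter):
        slope = f_prime(x)
        if slope == 0:
            raise ZeroDivisionError(f"f'(x) = 0 at x = {x}; the tangent line has no x-intercept")
        x_next = x - f(x) / slope  # x-intercept of the tangent line at (x, f(x))
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("Newton's method did not converge within max_iter steps")


# Illustration: the positive root of x^2 - 2 = 0, i.e. sqrt(2), starting from x1 = 1.
root = newton(lambda x: x**2 - 2, lambda x: 2 * x, x1=1.0)
print(root)
```

Convergence is not guaranteed in general: a poor starting point or a near-zero derivative can make the iterates wander, which is why the sketch caps the number of steps.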


Example 1

Starting with x₁ = 2, find the third approximation x₃ to the solution of the equation x³ − 2x − 5 = 0.

Solution:

We apply Newton’s method with f(x) = x³ − 2x − 5 and f′(x) = 3x² − 2.

Newton himself used this equation to illustrate his method and he chose x₁ = 2 after some experimentation because f(1) = −6, f(2) = −1, and f(3) = 16.

Equation 2 becomes

$$x_{n+1} = x_n - \frac{x_n^3 - 2x_n - 5}{3x_n^2 - 2}$$

With n = 1 we have

$$x_2 = x_1 - \frac{x_1^3 - 2x_1 - 5}{3x_1^2 - 2} = 2 - \frac{2^3 - 2(2) - 5}{3(2)^2 - 2} = 2 + \frac{1}{10} = 2.1$$


Example 1 – Solution

Then with n = 2 we obtain

$$x_3 = x_2 - \frac{x_2^3 - 2x_2 - 5}{3x_2^2 - 2} = 2.1 - \frac{(2.1)^3 - 2(2.1) - 5}{3(2.1)^2 - 2} \approx 2.0946$$

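The two Newton steps of this example can be reproduced in a few lines. A minimal sketch; the variable names are illustrative. Here (2.1)³ − 2(2.1) − 5 = 0.061 and 3(2.1)² − 2 = 11.23, so x₃ = 2.1 − 0.061/11.23.

```python
def f(x):
    return x**3 - 2 * x - 5


def f_prime(x):
    return 3 * x**2 - 2


iterates = [2.0]  # x1
for _ in range(2):
    x = iterates[-1]
    iterates.append(x - f(x) / f_prime(x))  # Equation 2

x1, x2, x3 = iterates
print(f"x2 = {x2:.6f}, x3 = {x3:.6f}")
```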


What Derivatives Tell Us about the Shape of a Graph


What Does f′ Say About f?

To see how the derivative of f can tell us where a function is increasing or decreasing, look at Figure 1.

Figure 1

Between A and B and between C and D, the tangent lines have positive slope and so f′(x) > 0. Between B and C the tangent lines have negative slope and so f′(x) < 0.

Thus it appears that f increases when f′(x) is positive and decreases when f′(x) is negative.

To prove that this is always the case, we use the Mean Value Theorem.



Increasing/Decreasing Test

(a) If f′(x) > 0 on an interval, then f is increasing on that interval.

(b) If f′(x) < 0 on an interval, then f is decreasing on that interval.

The Mean Value Theorem Let f be a function that satisfies the following hypotheses:

1. f is continuous on the closed interval [a, b].

2. f is differentiable on the open interval (a, b).

Then there is a number c in (a, b) such that

$$f'(c) = \frac{f(b) - f(a)}{b - a} \qquad (1)$$

or, equivalently,

$$f(b) - f(a) = f'(c)(b - a) \qquad (2)$$
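The number c promised by the Mean Value Theorem can be located numerically in simple cases. This sketch assumes the illustrative choice f(x) = x³ on [0, 2], where the secant slope is (8 − 0)/2 = 4 and f′(c) = 3c² = 4 gives c = 2/√3. It uses bisection on [1, 2], where f′(x) − 4 changes sign (f′(1) = 3, f′(2) = 12) and f′ is increasing; the bracket was chosen by inspection.

```python
import math


def f(x):
    return x**3


def f_prime(x):
    return 3 * x**2


a, b = 0.0, 2.0
secant_slope = (f(b) - f(a)) / (b - a)  # average rate of change on [a, b]

# Bisection for c with f'(c) = secant_slope; f' is increasing on [1, 2].
lo, hi = 1.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if f_prime(mid) < secant_slope:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2

print(c, 2 / math.sqrt(3))
```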


Example 1

Find where the function f(x) = 3x⁴ − 4x³ − 12x² + 5 is increasing and where it is decreasing.

Solution:

We start by differentiating f:

$$f'(x) = 12x^3 - 12x^2 - 24x = 12x(x - 2)(x + 1)$$

To use the I/D Test we have to know where f′(x) > 0 and where f′(x) < 0.

To solve these inequalities we first find where f′(x) = 0, namely at x = 0, 2, and −1.


Example 1 – Solution

These are the critical numbers of f, and they divide the domain into four intervals (see the number line in Figure 2).

Figure 2

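Within each interval, f′(x) must be always positive or always negative.

We can determine which is the case for each interval from the signs of the three factors of f′(x), namely, 12x, x − 2, and x + 1, as shown in the chart.

| Interval | 12x | x − 2 | x + 1 | f′(x) | f |
|---|---|---|---|---|---|
| x < −1 | − | − | − | − | decreasing on (−∞, −1) |
| −1 < x < 0 | − | − | + | + | increasing on (−1, 0) |
| 0 < x < 2 | + | − | + | − | decreasing on (0, 2) |
| x > 2 | + | + | + | + | increasing on (2, ∞) |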


Example 1 – Solution (2 of 3)

A plus sign indicates that the given expression is positive, and a minus sign indicates that it is negative. The last column of the chart gives the conclusion based on the I/D Test.

For instance, f′(x) < 0 for 0 < x < 2, so f is decreasing on (0, 2). (It would also be true to say that f is decreasing on the closed interval [0, 2].)


Example 1 – Solution (3 of 3)

The graph of f shown in Figure 3 confirms the information in the chart.
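The sign chart can be reproduced by evaluating f′ at one test point inside each interval. This works because f′ is a polynomial, hence continuous, so it can change sign only at its zeros −1, 0 and 2. The test points and names below are illustrative choices.

```python
def f_prime(x):
    """f'(x) = 12x(x - 2)(x + 1) for f(x) = 3x^4 - 4x^3 - 12x^2 + 5."""
    return 12 * x * (x - 2) * (x + 1)


# One test point inside each interval cut out by the critical numbers -1, 0, 2.
intervals = [("(-inf, -1)", -2.0), ("(-1, 0)", -0.5), ("(0, 2)", 1.0), ("(2, inf)", 3.0)]

conclusions = {}
for name, t in intervals:
    value = f_prime(t)
    conclusions[name] = "increasing" if value > 0 else "decreasing"  # I/D Test
    print(f"{name:>11}: f'({t}) = {value:+.1f} -> f is {conclusions[name]}")
```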