Single and Multivariable Calculus

Academic year: 2023


(1)

Single and Multivariable Calculus

(2)

Text Book (Calculus)

Mathematics for Machine Learning

• https://mml-book.github.io/

• https://github.com/vbartle/MML-Companion

Table of Contents:

5 Vector Calculus

5.1 Differentiation of Univariate Functions
5.2 Partial Differentiation and Gradients
5.3 Gradients of Vector-Valued Functions
5.4 Gradients of Matrices
5.5 Useful Identities for Computing Gradients
5.6 Backpropagation and Automatic Differentiation
5.7 Higher-Order Derivatives
5.8 Linearization and Multivariate Taylor Series
5.9 Further Reading

(3)

Table of Contents

Part I: Mathematical Foundations

1. Introduction and Motivation
2. Linear Algebra
3. Analytic Geometry
4. Matrix Decompositions
5. Vector Calculus
6. Probability and Distribution
7. Continuous Optimization

Part II: Central Machine Learning Problems

8. When Models Meet Data
9. Linear Regression

(4)

Many algorithms in machine learning optimize an objective function with respect to a set of desired model parameters that control how well a model explains the data:

Finding good parameters can be phrased as an optimization problem. Examples include: (i) linear regression; (ii) neural-network auto-encoders for dimensionality reduction and data compression; and (iii) Gaussian mixture models for modeling data distributions.

(a) Regression problem: Find parameters, such that the curve explains the observations (crosses) well.

(b) Density estimation with a Gaussian mixture model: Find means and covariances, such that the data (dots) can be explained well.

(5)

Outline

• Function

• 5.1 Differentiation of Univariate Functions

• 5.1.2 Differentiation Rules

• Linear Approximations and Differentials

• 5.1.1 Taylor Series

• Newton’s Method

• What Derivatives Tell Us about the Shape of a Graph

(6)

Function

A function f is a quantity that relates two quantities to each other. In this book, these quantities are typically inputs $x \in \mathbb{R}^D$ and targets (function values) f(x), which we assume are real-valued if not stated otherwise. Here $\mathbb{R}^D$ is the domain of f, and the function values f(x) are the image/codomain of f.

We often write

$$f : \mathbb{R}^D \to \mathbb{R} \tag{5.1a}$$

$$x \mapsto f(x) \tag{5.1b}$$

to specify a function, where (5.1a) specifies that f is a mapping from $\mathbb{R}^D$ to $\mathbb{R}$ and (5.1b) specifies the explicit assignment of an input x to a function value f(x). A function f assigns every input x exactly one function value f(x).

(7)

Example 5.1

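Recall the dot product as a special case of an inner product. In the previous notation, the function $f(x) = x^\top x$, $x \in \mathbb{R}^2$, would be specified as

$$f : \mathbb{R}^2 \to \mathbb{R}$$

$$x \mapsto x_1^2 + x_2^2$$

In this chapter, we will discuss how to compute gradients of functions, which is often essential to facilitate learning in machine learning models, since the gradient points in the direction of steepest ascent. Therefore, vector calculus is one of the fundamental mathematical tools we need in machine learning.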

(8)

5.1 Differentiation of Univariate Functions

(9)

Differentiation

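Definition 5.1 (Difference Quotient). The difference quotient

$$\frac{\delta y}{\delta x} := \frac{f(x + \delta x) - f(x)}{\delta x}$$

computes the slope of the secant line through two points on the graph of f. In Figure 5.3, these are the points with x-coordinates $x_0$ and $x_0 + \delta x$.

In the limit for $\delta x \to 0$, we obtain the tangent of f at x, if f is differentiable. The slope of the tangent is then the derivative of f at x.

Definition 5.2 (Derivative). More formally, for h > 0 the derivative of f at x is defined as the limit

$$\frac{df}{dx} := \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$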

(10)

The Tangent Problem (1 of 3)

The word tangent is derived from the Latin word tangens, which means “touching.”

For a circle we could simply follow Euclid and say that a tangent is a line that intersects the circle once and only once, as in Figure 1(a).

For more complicated curves this definition is inadequate. Figure 1(b) shows a line that appears to be a tangent to the curve C at point P, but it intersects C twice.

We can think of a tangent to a curve as a line that touches the curve and follows the same direction as the curve at the point of contact. How can this idea be made precise?

(11)

Example 1

Find an equation of the tangent line to the parabola $y = x^2$ at the point P(1, 1).

Solution:

We will be able to find an equation of the tangent line as soon as we know its slope m.

The difficulty is that we know only one point, P, on the tangent line, whereas we need two points to compute the slope.

But observe that we can compute an approximation to m by choosing a nearby point $Q(x, x^2)$ on the parabola (as in Figure 2) and computing the slope $m_{PQ}$ of the secant line PQ. (A secant line, from the Latin word secans, meaning cutting, is a line that cuts a curve more than once.)

(12)

Example 1 – Solution (1 of 2)

We choose x ≠ 1 so that Q ≠ P. Then

$$m_{PQ} = \frac{x^2 - 1}{x - 1}$$

For instance, for the point Q(1.5, 2.25) we have

$$m_{PQ} = \frac{2.25 - 1}{1.5 - 1} = \frac{1.25}{0.5} = 2.5$$

x        m_PQ
2        3
1.5      2.5
1.1      2.1
1.01     2.01
1.001    2.001

x        m_PQ
0        1
0.5      1.5
0.9      1.9
0.99     1.99
0.999    1.999

The tables in the margin show the values of mPQ for several values of x close to 1.

The closer Q is to P, the closer x is to 1 and, it appears from the tables, the closer m is to 2.
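The tables can be reproduced numerically. The following is a minimal sketch, assuming plain Python; the helper name `secant_slope` is hypothetical and chosen only for this illustration.

```python
def secant_slope(x: float) -> float:
    """Slope m_PQ of the secant line through P(1, 1) and Q(x, x^2) on y = x^2."""
    if x == 1:
        # Q must differ from P, otherwise the quotient is 0/0.
        raise ValueError("x must differ from 1")
    return (x**2 - 1) / (x - 1)


# x-values approaching 1 from the right, then from the left.
for x in (2, 1.5, 1.1, 1.01, 1.001, 0, 0.5, 0.9, 0.99, 0.999):
    print(f"x = {x:<6} m_PQ = {secant_slope(x):.4f}")
```

The printed slopes approach 2 from both sides, which is the numerical evidence behind the guess m = 2.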

(13)

Example 1 – Solution (2 of 2)

This suggests that the slope of the tangent line should be m = 2.

We say that the slope of the tangent line is the limit of the slopes of the secant lines, and we express this symbolically by writing

$$\lim_{Q \to P} m_{PQ} = m \quad \text{and} \quad \lim_{x \to 1} \frac{x^2 - 1}{x - 1} = 2$$

Assuming that the slope of the tangent line is indeed 2, we use the point-slope form of the equation of a line [$y - y_1 = m(x - x_1)$] to write the equation of the tangent line through (1, 1) as

$$y - 1 = 2(x - 1) \quad \text{or} \quad y = 2x - 1$$

(14)

Example 1

Find an equation of the tangent line to the parabola $y = x^2$ at the point P(1, 1).

Solution:

Figure 2

$$m_{PQ} = \frac{x^2 - 1}{x - 1}$$

$$\lim_{Q \to P} m_{PQ} = m \quad \text{and} \quad \lim_{x \to 1} \frac{x^2 - 1}{x - 1} = 2$$

(15)

The Tangent Problem (2 of 3)

Figure 3 illustrates the limiting process that occurs in Example 1.

Figure 3

Q approaches P from the right

(16)

The Tangent Problem (3 of 3)

Figure 3

Q approaches P from the left

As Q approaches P along the parabola, the corresponding secant lines rotate about P and approach the tangent line.

(17)

5.1.2 Differentiation Rules

(18)

Constant Functions

Let’s start with the simplest of all functions, the constant function f(x) = c.

The graph of this function is the horizontal line y = c, which has slope 0, so we must have $f'(x) = 0$. (See Figure 1.)

Figure 1: The graph of f(x) = c is the line y = c, so $f'(x) = 0$.

A formal proof, from the definition of a derivative, is also easy:

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{c - c}{h} = \lim_{h \to 0} 0 = 0$$

In Leibniz notation, we write this rule as follows.

Derivative of a Constant Function

$$\frac{d}{dx}(c) = 0$$

(19)

Power Functions (1 of 3)

We next look at the functions $f(x) = x^n$, where n is a positive integer.

If n = 1, the graph of f(x) = x is the line y = x, which has slope 1. (See Figure 2.)

Figure 2: The graph of f(x) = x is the line y = x, so $f'(x) = 1$.

So

$$\frac{d}{dx}(x) = 1 \tag{1}$$

(You can also verify Equation 1 from the definition of a derivative.) We have already investigated the cases n = 2 and n = 3. We found that

$$\frac{d}{dx}(x^2) = 2x \qquad \frac{d}{dx}(x^3) = 3x^2 \tag{2}$$

(20)
(21)

Power Functions (2 of 3)

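For n = 4 we find the derivative of $f(x) = x^4$ as follows:

$$\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{(x + h)^4 - x^4}{h} \\
&= \lim_{h \to 0} \frac{x^4 + 4x^3 h + 6x^2 h^2 + 4x h^3 + h^4 - x^4}{h} \\
&= \lim_{h \to 0} \frac{4x^3 h + 6x^2 h^2 + 4x h^3 + h^4}{h} \\
&= \lim_{h \to 0} \left(4x^3 + 6x^2 h + 4x h^2 + h^3\right) = 4x^3
\end{aligned}$$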

(22)

Power Functions (3 of 3)

$$\frac{d}{dx}(x^4) = 4x^3 \tag{3}$$

Comparing the equations in (1), (2), and (3), we see a pattern emerging.

It seems to be a reasonable guess that, when n is a positive integer, $(d/dx)(x^n) = n x^{n-1}$.

This turns out to be true.

The Power Rule If n is a positive integer, then

$$\frac{d}{dx}(x^n) = n x^{n-1}$$

The Power Rule (General Version) If n is any real number, then

$$\frac{d}{dx}(x^n) = n x^{n-1}$$

The Power Rule enables us to find tangent lines without having to resort to the definition of a derivative. It also enables us to find normal lines.

The normal line to a curve C at a point P is the line through P that is perpendicular to the tangent line at P.

(23)

Example 1

(a) If $f(x) = x^6$, then $f'(x) = 6x^5$.

(b) If $y = x^{1000}$, then $y' = 1000 x^{999}$.

(c) If $y = t^4$, then $\frac{dy}{dt} = 4t^3$.

(d) $\frac{d}{dr}(r^3) = 3r^2$

(24)

New Derivatives from Old (1 of 2)

When new functions are formed from old functions by addition, subtraction, or multiplication by a constant, their derivatives can be calculated in terms of

derivatives of the old functions.

In particular, the following formula says that the derivative of a constant times a function is the constant times the derivative of the function.

The Constant Multiple Rule If c is a constant and f is a differentiable function, then

$$\frac{d}{dx}[c f(x)] = c \frac{d}{dx} f(x)$$

Compare the corresponding limit law:

$$\lim_{x \to a} [c f(x)] = c \lim_{x \to a} f(x)$$

(25)

Example 4

(a)

$$\frac{d}{dx}(3x^4) = 3 \frac{d}{dx}(x^4) = 3(4x^3) = 12x^3$$

(b)

$$\frac{d}{dx}(-x) = \frac{d}{dx}[(-1)x] = (-1) \frac{d}{dx}(x) = -1(1) = -1$$

(26)

New Derivatives from Old (2 of 2)

The next rule tells us that the derivative of a sum (or difference) of functions is the sum (or difference) of the derivatives.

The Sum and Difference Rules If f and g are both differentiable, then

$$\frac{d}{dx}[f(x) + g(x)] = \frac{d}{dx} f(x) + \frac{d}{dx} g(x)$$

$$\frac{d}{dx}[f(x) - g(x)] = \frac{d}{dx} f(x) - \frac{d}{dx} g(x)$$

The Sum Rule can be extended to the sum of any number of functions. For instance, using this theorem twice, we get

$$(f + g + h)' = [(f + g) + h]' = (f + g)' + h' = f' + g' + h'$$

The Constant Multiple Rule, the Sum Rule, and the Difference Rule can be combined with the Power Rule to differentiate any polynomial.
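As an illustration, the three rules together with the Power Rule reduce polynomial differentiation to a term-by-term operation on coefficients. This is a minimal sketch in plain Python; the function name `poly_derivative`, the coefficient-list representation, and the sample polynomial $2x^3 - 5x + 7$ are assumptions made for this example only.

```python
def poly_derivative(coeffs):
    """Differentiate a polynomial given as coeffs[k] = coefficient of x**k.

    Each term c * x**k becomes k * c * x**(k - 1) (Power Rule and Constant
    Multiple Rule); the Sum Rule lets us treat the terms one at a time.
    """
    return [k * c for k, c in enumerate(coeffs)][1:]


# Hypothetical sample: p(x) = 2x^3 - 5x + 7, stored lowest degree first.
p = [7, -5, 0, 2]
dp = poly_derivative(p)
print(dp)  # coefficients of p'(x) = 6x^2 - 5, lowest degree first
```

Storing coefficients lowest degree first keeps the index equal to the exponent, so the Power Rule becomes a single multiplication per term.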

       

$$\lim_{x \to a} [f(x) + g(x)] = \lim_{x \to a} f(x) + \lim_{x \to a} g(x)$$

$$\lim_{x \to a} [f(x) - g(x)] = \lim_{x \to a} f(x) - \lim_{x \to a} g(x)$$

(27)

Differentiation Rules

Derivative of a Constant Function

$$\frac{d}{dx}(c) = 0$$

The Power Rule (General Version) If n is any real number, then

$$\frac{d}{dx}(x^n) = n x^{n-1}$$

The Constant Multiple Rule If c is a constant and f is a differentiable function, then

$$\frac{d}{dx}[c f(x)] = c \frac{d}{dx} f(x)$$

The Sum Rule If f and g are both differentiable, then

$$\frac{d}{dx}[f(x) + g(x)] = \frac{d}{dx} f(x) + \frac{d}{dx} g(x)$$

The Product Rule If f and g are both differentiable, then

$$\frac{d}{dx}[f(x) g(x)] = f(x) \frac{d}{dx} g(x) + g(x) \frac{d}{dx} f(x)$$

The Quotient Rule If f and g are both differentiable, then

$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{g(x) \frac{d}{dx} f(x) - f(x) \frac{d}{dx} g(x)}{[g(x)]^2}$$
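A quick numerical sanity check of the Product and Quotient Rules can be sketched as follows. The choice of $f(x) = x^3$ and $g(x) = e^x$, the evaluation point, and the helper `central_diff` (a symmetric difference quotient) are assumptions for illustration, not part of the slides.

```python
import math


def central_diff(fn, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


def f(x):
    return x**3


def df(x):
    return 3 * x**2  # Power Rule


def g(x):
    return math.exp(x)


def dg(x):
    return math.exp(x)  # e^x is its own derivative


x = 0.7  # arbitrary evaluation point

# Right-hand sides of the two rules.
product_rule = f(x) * dg(x) + g(x) * df(x)
quotient_rule = (g(x) * df(x) - f(x) * dg(x)) / g(x) ** 2

# Direct numerical derivatives of the product and the quotient.
product_numeric = central_diff(lambda t: f(t) * g(t), x)
quotient_numeric = central_diff(lambda t: f(t) / g(t), x)

print(product_rule, product_numeric)
print(quotient_rule, quotient_numeric)
```

Each printed pair should agree to several decimal places; the small discrepancy comes from the finite step h.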

(28)
(29)

Example 5.5

(30)

Derivative of the Exponential Function

$$\frac{d}{dx}(e^x) = e^x$$

$$\frac{d}{dx}(b^x) = b^x \ln b$$

Derivative of the Logarithmic Functions

$$\frac{d}{dx}(\ln x) = \frac{1}{x}$$

$$\frac{d}{dx}(\log_b x) = \frac{1}{x \ln b}$$

(31)

Exponential Functions

(32)

Exponential Functions (1 of 4)

Let’s try to compute the derivative of the exponential function $f(x) = b^x$ using the definition of a derivative:

$$\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{b^{x+h} - b^x}{h} \\
&= \lim_{h \to 0} \frac{b^x b^h - b^x}{h} = \lim_{h \to 0} \frac{b^x (b^h - 1)}{h}
\end{aligned}$$

The factor $b^x$ doesn’t depend on h, so we can take it in front of the limit:

$$f'(x) = b^x \lim_{h \to 0} \frac{b^h - 1}{h}$$

Notice that the limit is the value of the derivative of f at 0, that is,

$$\lim_{h \to 0} \frac{b^h - 1}{h} = f'(0)$$

Therefore we have shown that if the exponential function $f(x) = b^x$ is differentiable at 0, then it is differentiable everywhere and

$$f'(x) = f'(0) \, b^x \tag{4}$$

This equation says that the rate of change of any exponential function is proportional to the function itself.

(33)

Exponential Functions (2 of 4)

Numerical evidence for the existence of $f'(0)$ is given in the table shown below for the cases b = 2 and b = 3. (Values are stated correct to four decimal places.) It appears that the limits exist and

for b = 2,

$$f'(0) = \lim_{h \to 0} \frac{2^h - 1}{h} \approx 0.693$$

for b = 3,

$$f'(0) = \lim_{h \to 0} \frac{3^h - 1}{h} \approx 1.099$$

Thus, from Equation 4, $f'(x) = f'(0) \, b^x$, we have

$$\frac{d}{dx}(2^x) \approx (0.693) \, 2^x \qquad \frac{d}{dx}(3^x) \approx (1.099) \, 3^x \tag{5}$$
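The numerical evidence for $f'(0)$ can be regenerated with a few lines of Python. This is a sketch under the assumption that evaluating the quotient $(b^h - 1)/h$ at shrinking h is an acceptable stand-in for the limit; the helper name `exp_quotient` is hypothetical.

```python
import math


def exp_quotient(b: float, h: float) -> float:
    """Difference quotient (b^h - 1) / h, which approximates f'(0) for f(x) = b^x."""
    return (b**h - 1) / h


for h in (0.1, 0.01, 0.001, 0.0001):
    print(f"h = {h:<7} b = 2: {exp_quotient(2, h):.4f}   b = 3: {exp_quotient(3, h):.4f}")

# For comparison: the formula d/dx(b^x) = b^x ln b gives f'(0) = ln b.
print(f"ln 2 = {math.log(2):.4f}, ln 3 = {math.log(3):.4f}")
```

As h shrinks, the two columns settle near ln 2 and ln 3, consistent with the rounded values 0.693 and 1.099 quoted above.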

(34)

Exponential Functions (3 of 4)

In view of the estimates of $f'(0)$ for b = 2 and b = 3, it seems reasonable that there is a number b between 2 and 3 for which $f'(0) = 1$.

It is traditional to denote this value by the letter e. Thus we have the following definition.

Definition of the Number e: e is the number such that

$$\lim_{h \to 0} \frac{e^h - 1}{h} = 1$$

Geometrically, this means that of all the possible exponential functions $y = b^x$, the function $f(x) = e^x$ is the one whose tangent line at (0, 1) has a slope $f'(0)$ that is exactly 1. (See Figures 6 and 7.)

(35)

Exponential Functions (4 of 4)

If we put b = e and, therefore, $f'(0) = 1$ in Equation 4, $f'(x) = f'(0) \, b^x$, it becomes the following important differentiation formula.

Derivative of the Natural Exponential Function

$$\frac{d}{dx}(e^x) = e^x$$

Thus the exponential function $f(x) = e^x$ has the property that it is its own derivative. The geometrical significance of this fact is that the slope of a tangent line to the curve $y = e^x$ is equal to the y-coordinate of the point.

Figure 7

(36)

Linear Approximations and Differentials

(37)

Linearization and Approximation

It might be easy to calculate a value f(a) of a function, but difficult (or even impossible) to compute nearby values of f.

So we settle for the easily computed values of the linear function L whose graph is the tangent line of f at (a, f(a)).

Figure 1

In other words, we use the tangent line at (a, f(a)) as an approximation to the curve y = f(x) when x is near a. An equation of this tangent line is

$$y = f(a) + f'(a)(x - a)$$

The linear function whose graph is this tangent line, that is,

$$L(x) = f(a) + f'(a)(x - a) \tag{1}$$

is called the linearization of f at a.

(38)

Example 1

Find the linearization of the function $f(x) = \sqrt{x + 3}$ at a = 1 and use it to approximate the numbers $\sqrt{3.98}$ and $\sqrt{4.05}$. Are these approximations overestimates or underestimates?

Solution:

The derivative of $f(x) = (x + 3)^{1/2}$ is

$$f'(x) = \tfrac{1}{2}(x + 3)^{-1/2} = \frac{1}{2\sqrt{x + 3}}$$

and so we have f(1) = 2 and $f'(1) = \tfrac{1}{4}$.

Putting these values into Equation 1, $L(x) = f(a) + f'(a)(x - a)$, we see that the linearization is

$$L(x) = f(1) + f'(1)(x - 1) = 2 + \tfrac{1}{4}(x - 1) = \frac{7}{4} + \frac{x}{4}$$

The corresponding linear approximation (2) is

$$\sqrt{x + 3} \approx \frac{7}{4} + \frac{x}{4} \quad \text{(when x is near 1)}$$

(39)

Example 1 – Solution

In particular, we have

$$\sqrt{3.98} \approx \frac{7}{4} + \frac{0.98}{4} = 1.995 \quad \text{and} \quad \sqrt{4.05} \approx \frac{7}{4} + \frac{1.05}{4} = 2.0125$$

The linear approximation is illustrated in Figure 2.

Figure 2

We see that, indeed, the tangent line approximation is a good approximation to the given function when x is near 1.

We also see that our approximations are overestimates because the tangent line lies above the curve.
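The comparison between the linearization and the true values can be sketched numerically. The sample x-values below and the function names `f` and `L` are assumptions chosen for illustration; only the formulas $f(x) = \sqrt{x + 3}$ and $L(x) = 7/4 + x/4$ come from the example.

```python
import math


def f(x: float) -> float:
    """The function from the example, f(x) = sqrt(x + 3)."""
    return math.sqrt(x + 3)


def L(x: float) -> float:
    """Linearization of f at a = 1, L(x) = 7/4 + x/4."""
    return 7 / 4 + x / 4


# x = 0.98 and x = 1.05 correspond to sqrt(3.98) and sqrt(4.05).
for x in (0.9, 0.98, 1.0, 1.05, 1.1, 2.0, 3.0):
    error = L(x) - f(x)
    print(f"x = {x:<5} L(x) = {L(x):.5f}  f(x) = {f(x):.5f}  L - f = {error:.5f}")
```

The difference L − f is never negative (the tangent line lies above the curve) and grows as x moves away from 1.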

(40)

Linearization and Approximation (2 of 5)

In the following table we compare the estimates from the linear approximation in Example 1 with the true values.

Figure 2

Notice from this table, and also from Figure 2, that the tangent line

approximation gives good estimates

when x is close to 1 but the accuracy of the approximation deteriorates when x is farther away from 1.

The next example shows that by using a graphing calculator or computer we can determine an interval throughout which a linear approximation provides a specified accuracy.

(41)

Differentials

The geometric meaning of differentials is shown in Figure 5.

$$\Delta y = f(x + \Delta x) - f(x)$$

$$dy = f'(x) \, dx \tag{3}$$

Let $dx = \Delta x$.

(42)

5.1.1 Taylor Series

(43)

Taylor Series

The Taylor series is a representation of a function f as an infinite sum of terms.

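These terms are determined using derivatives of f evaluated at $x_0$.

Definition 5.3 (Taylor Polynomial). The Taylor polynomial of degree n of $f : \mathbb{R} \to \mathbb{R}$ at $x_0$ is defined as

$$T_n(x) := \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k$$

where $f^{(k)}(x_0)$ is the kth derivative of f at $x_0$ (which we assume exists) and $\frac{f^{(k)}(x_0)}{k!}$ are the coefficients of the polynomial.

Definition 5.4 (Taylor Series). For a smooth function $f \in C^\infty$, $f : \mathbb{R} \to \mathbb{R}$, the Taylor series of f at $x_0$ is defined as

$$T_\infty(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k$$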

(44)

Taylor Series

Taylor Polynomial Taylor Series

Remark. In general, a Taylor polynomial of degree n is an approximation of a function, which does not need to be a polynomial. The Taylor polynomial is similar to f in a neighborhood around $x_0$. However, a Taylor polynomial of degree n is an exact representation of a polynomial f of degree $k \le n$ since all derivatives $f^{(i)}$, i > k, vanish.

Example 5.3

(45)

When Is a Function Represented by Its Taylor Series?

Example 5.4 (Taylor Series)

Consider the function in Figure 5.4 given by f(x) = sin(x) + cos(x)

Figure 5.4 Taylor polynomials.

The original function f(x) = sin(x) + cos(x) (black, solid) is approximated by Taylor polynomials around $x_0 = 0$.
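The Taylor polynomials of this function can be evaluated directly. The sketch below assumes the expansion point $x_0 = 0$ and uses the fact that the derivatives of sin(x) + cos(x) at 0 repeat with period four (1, 1, −1, −1); the function name `taylor_sin_plus_cos` is hypothetical.

```python
import math


def taylor_sin_plus_cos(x: float, n: int) -> float:
    """Degree-n Taylor polynomial T_n(x) of f(x) = sin(x) + cos(x) at x0 = 0."""
    # f = sin + cos, f' = cos - sin, f'' = -sin - cos, f''' = -cos + sin, then repeat.
    # Evaluated at 0 this gives the cycle 1, 1, -1, -1.
    cycle = (1, 1, -1, -1)
    return sum(cycle[k % 4] * x**k / math.factorial(k) for k in range(n + 1))


x = 1.0
exact = math.sin(x) + math.cos(x)
for n in (0, 1, 5, 10):
    approx = taylor_sin_plus_cos(x, n)
    print(f"n = {n:<2} T_n({x}) = {approx:.6f}  error = {abs(approx - exact):.2e}")
```

Raising the degree n shrinks the error at a fixed x, which matches the qualitative behavior of the curves in the figure.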

(46)

When Is a Function Represented by Its Taylor Series?

The graphs of the exponential function $f(x) = e^x$ and these three Taylor polynomials are drawn in Figure 1.

$$T_1(x) = 1 + x$$

$$T_2(x) = 1 + x + \frac{x^2}{2!}$$

$$T_3(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!}$$

Figure 1

As n increases, $T_n(x)$ appears to approach $e^x$ in Figure 1. This suggests that $e^x$ is equal to the sum of its Taylor series.

(47)

When Is a Function Represented by Its Taylor Series?

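8 Theorem If $f(x) = T_n(x) + R_n(x)$, where $T_n$ is the nth-degree Taylor polynomial of f at a, and if

$$\lim_{n \to \infty} R_n(x) = 0$$

for $|x - a| < R$, then f is equal to the sum of its Taylor series on the interval $|x - a| < R$.

In trying to show that $\lim_{n \to \infty} R_n(x) = 0$ for a specific function f, we usually use the following theorem.

9 Taylor's Inequality If $|f^{(n+1)}(x)| \le M$ for $|x - a| \le d$, then the remainder $R_n(x)$ of the Taylor series satisfies the inequality

$$|R_n(x)| \le \frac{M}{(n+1)!} |x - a|^{n+1} \quad \text{for } |x - a| \le d$$

An important question is whether the Taylor series converges to the original function. For the complete statements of these theorems, refer to a calculus textbook.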

(48)

Newton’s Method

(49)

Newton’s Method

The geometry behind Newton’s method is shown in

Figure 2, where the solution that we are trying to find is labeled r in the figure.

Figure 2

We start with a first approximation x1, which is obtained by guessing, or from a rough sketch of the graph of f, or from a computer-generated graph of f.

The idea behind Newton’s method is that the tangent line is close to the curve and so its x-intercept, x2, is close to the x-intercept of the curve (namely, the root r that we are seeking). Because the tangent is a line, we can easily find its x-intercept.

Consider the tangent line L to the curve y = f(x) at the

point (x1, f(x1)) and look at the x-intercept of L, labeled x2.

(50)

Newton’s Method

Since the x-intercept of L is $x_2$, we know that the point $(x_2, 0)$ is on the line, and so

$$0 - f(x_1) = f'(x_1)(x_2 - x_1)$$

If $f'(x_1) \ne 0$, we can solve this equation for $x_2$:

$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}$$

We use $x_2$ as a second approximation to r.

Next we repeat this procedure with $x_1$ replaced by the second approximation $x_2$, using the tangent line at $(x_2, f(x_2))$.

This gives a third approximation:

$$x_3 = x_2 - \frac{f(x_2)}{f'(x_2)}$$

(51)

Newton’s Method

If we keep repeating this process, we obtain a sequence of approximations x1, x2, x3, x4, . . . as shown in Figure 3.

Figure 3

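In general, if the nth approximation is $x_n$ and $f'(x_n) \ne 0$, then the next approximation is given by

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \tag{2}$$

If the numbers $x_n$ become closer and closer to r as n becomes large, then we say that the sequence converges to r and we write $\lim_{n \to \infty} x_n = r$.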

(52)

Example 1

Starting with $x_1 = 2$, find the third approximation $x_3$ to the solution of the equation $x^3 - 2x - 5 = 0$.

Solution:

We apply Newton’s method with $f(x) = x^3 - 2x - 5$ and $f'(x) = 3x^2 - 2$.

Newton himself used this equation to illustrate his method and he chose $x_1 = 2$ after some experimentation because f(1) = −6, f(2) = −1, and f(3) = 16.

Equation 2, $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$, becomes

$$x_{n+1} = x_n - \frac{x_n^3 - 2x_n - 5}{3x_n^2 - 2}$$

With n = 1 we have

$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)} = x_1 - \frac{x_1^3 - 2x_1 - 5}{3x_1^2 - 2} = 2 - \frac{2^3 - 2(2) - 5}{3(2)^2 - 2} = 2.1$$

(53)

Example 1 – Solution

Then with n = 2 we obtain

$$x_3 = x_2 - \frac{x_2^3 - 2x_2 - 5}{3x_2^2 - 2} = 2.1 - \frac{(2.1)^3 - 2(2.1) - 5}{3(2.1)^2 - 2} \approx 2.0946$$
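The iteration can be sketched in a few lines of Python. The function name `newton` and its signature are assumptions made for this illustration; the update rule is Equation 2, and f, f', and the starting value $x_1 = 2$ are taken from the example.

```python
def newton(f, df, x1, steps):
    """Return [x1, x2, ..., x_{steps+1}] from x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        slope = df(x)
        if slope == 0:
            # The tangent line is horizontal and has no x-intercept.
            raise ZeroDivisionError("f'(x_n) = 0, Newton step undefined")
        xs.append(x - f(x) / slope)
    return xs


def f(x):
    return x**3 - 2 * x - 5


def df(x):
    return 3 * x**2 - 2


x1, x2, x3 = newton(f, df, 2.0, 2)
print(f"x2 = {x2:.4f}, x3 = {x3:.4f}")
```

Two steps from $x_1 = 2$ give $x_2 = 2.1$ and $x_3 \approx 2.0946$; raising `steps` continues the sequence.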

(54)

What Derivatives Tell Us about the

Shape of a Graph

(55)

What Does f' Say About f?

To see how the derivative of f can tell us where a function is increasing or decreasing, look at Figure 1.

Figure 1

Between A and B and between C and D, the tangent lines have positive slope and so $f'(x) > 0$.

Between B and C the tangent lines have negative slope and so $f'(x) < 0$.

Thus it appears that f increases when $f'(x)$ is positive and decreases when $f'(x)$ is negative.

To prove that this is always the case, we use the Mean Value Theorem.

$$f(b) - f(a) = f'(c)(b - a) \tag{2}$$

(56)

Increasing/Decreasing Test

(a) If $f'(x) > 0$ on an interval, then f is increasing on that interval.

(b) If $f'(x) < 0$ on an interval, then f is decreasing on that interval.

The Mean Value Theorem Let f be a function that satisfies the following hypotheses:

1. f is continuous on the closed interval [a, b].

2. f is differentiable on the open interval (a, b).

Then there is a number c in (a, b) such that

$$f'(c) = \frac{f(b) - f(a)}{b - a} \tag{1}$$

or, equivalently,

$$f(b) - f(a) = f'(c)(b - a) \tag{2}$$

(57)

Example 1

Find where the function $f(x) = 3x^4 - 4x^3 - 12x^2 + 5$ is increasing and where it is decreasing.

Solution:

We start by differentiating f:

$$f'(x) = 12x^3 - 12x^2 - 24x = 12x(x - 2)(x + 1)$$

To use the I/D Test we have to know where $f'(x) > 0$ and where $f'(x) < 0$.

To solve these inequalities we first find where $f'(x) = 0$, namely at x = 0, 2, and −1.

(58)

Example 1 – Solution

These are the critical numbers of f, and they divide the domain into four intervals (see the number line in Figure 2).

Figure 2

Within each interval, $f'(x)$ must be always positive or always negative.

We can determine which is the case for each interval from the signs of the three factors of $f'(x) = 12x(x - 2)(x + 1)$, namely, 12x, x − 2, and x + 1, as shown in the chart.

(59)

Example 1 – Solution (2 of 3)

A plus sign indicates that the given expression is positive, and a minus sign indicates that it is negative. The last column of the chart gives the conclusion based on the I/D Test.

For instance, $f'(x) < 0$ for 0 < x < 2, so f is decreasing on (0, 2). (It would also be true to say that f is decreasing on the closed interval [0, 2].)

(60)

Example 1 – Solution (3 of 3)

The graph of f shown in Figure 3 confirms the information in the chart.
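The conclusions of the chart can be re-derived by evaluating $f'$ at one test point inside each interval. This is a minimal sketch; the test points −2, −0.5, 1, and 3 are arbitrary choices, one per interval, and the name `df` is hypothetical.

```python
def df(x: float) -> float:
    """f'(x) = 12x(x - 2)(x + 1) for f(x) = 3x^4 - 4x^3 - 12x^2 + 5."""
    return 12 * x * (x - 2) * (x + 1)


# The critical numbers -1, 0, 2 split the real line into four intervals.
intervals = ["x < -1", "-1 < x < 0", "0 < x < 2", "x > 2"]
test_points = [-2, -0.5, 1, 3]  # one arbitrary point inside each interval

for label, t in zip(intervals, test_points):
    # The sign of f' cannot change inside an interval, so one sample decides it.
    behaviour = "increasing" if df(t) > 0 else "decreasing"
    print(f"{label:<12} f'({t}) = {df(t):>7}  ->  f is {behaviour}")
```

One sample per interval suffices because $f'$ is continuous and is zero only at the critical numbers.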
