CIS 541 Number Systems
Numbers
Decimal
The integer number 48356 can be represented as:
48356 = 6×10^0 + 5×10^1 + 3×10^2 + 8×10^3 + 4×10^4
In general an integer can be represented as:
a_n a_{n−1} … a_1 a_0 = a_0×10^0 + a_1×10^1 + … + a_{n−1}×10^{n−1} + a_n×10^n
and a fractional part as:
.b_1 b_2 b_3 … = b_1×10^{-1} + b_2×10^{-2} + b_3×10^{-3} + …
So a real number is:
a_n a_{n−1} … a_1 a_0 . b_1 b_2 b_3 … = Σ_{k=0}^{n} a_k 10^k + Σ_{k=1}^{∞} b_k 10^{-k}
Base β
a_n a_{n−1} … a_1 a_0 . b_1 b_2 b_3 … = Σ_{k=0}^{n} a_k β^k + Σ_{k=1}^{∞} b_k β^{-k}
β = 2,8,10,16 are common bases.
Nested form
N = (a_n a_{n−1} … a_0)_β = Σ_{k=0}^{n} a_k β^k
= a_0 + β(a_1 + β(a_2 + … + β(a_{n−1} + β a_n) … ))
Conversion between bases: Integer Part
Method 1: Convert to nested form, then replace each digit with its representation in the new base, carrying out the arithmetic in the new base.
(3781)_10
= 1 + 10(8 + 10(7 + 10(3)))
= (1)_2 + (1010)_2 ((1000)_2 + (1010)_2 ((0111)_2 + (1010)_2 (0011)_2))
= (111011000101)_2
Method 2: Observe that when
N = a_0 + β(a_1 + β(a_2 + … + β(a_{n−1} + β a_n) … ))
is divided by β, the remainder is a_0 and the quotient is
a_1 + β(a_2 + … + β(a_{n−1} + β a_n) … )
So successive divisions, collecting the remainders, reconstruct the digits of N, and we can use this for conversion between bases.
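Method 2's successive divisions translate directly into a short program (a sketch; the function name and digit alphabet here are illustrative, not from the notes):

```python
def to_base(n, beta):
    """Convert a non-negative integer n to base beta by successive
    division, collecting the remainders (least significant digit first)."""
    if n == 0:
        return "0"
    digits = "0123456789ABCDEF"  # enough symbols for beta <= 16
    out = []
    while n > 0:
        n, r = divmod(n, beta)   # quotient and remainder a_0, a_1, ...
        out.append(digits[r])
    return "".join(reversed(out))

print(to_base(3781, 2))   # 111011000101, matching the example above
print(to_base(2576, 8))   # 5020
```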
Conversion between bases: Integer Part Example
3781 ÷ 2 = 1890  r 1
1890 ÷ 2 =  945  r 0
 945 ÷ 2 =  472  r 1
 472 ÷ 2 =  236  r 0
 236 ÷ 2 =  118  r 0
 118 ÷ 2 =   59  r 0
  59 ÷ 2 =   29  r 1
  29 ÷ 2 =   14  r 1
  14 ÷ 2 =    7  r 0
   7 ÷ 2 =    3  r 1
   3 ÷ 2 =    1  r 1
   1 ÷ 2 =    0  r 1
Reading the remainders from bottom to top gives (111011000101)_2.
N = 1×2^0 + 0×2^1 + 1×2^2 + 0×2^3 + 0×2^4 + 0×2^5 + 1×2^6 + 1×2^7 + 0×2^8 + 1×2^9 + 1×2^10 + 1×2^11
= 1 + 2(0 + 2(1 + 2(0 + 2(0 + 2(0 + 2(1 + 2(1 + 2(0 + 2(1 + 2(1 + 2(1)))))))))))
= 3781
Conversion between bases: Fractional Part
x = Σ_{k=1}^{∞} c_k β^{-k} = (0.c_1 c_2 c_3 …)_β
note that
βx = (c_1.c_2 c_3 …)_β
thus the digit c_1 is the integer part of βx, written I(βx), and the fractional part is F(βx)
d_0 = x
c_1 = I(βd_0),  d_1 = F(βd_0)
c_2 = I(βd_1),  d_2 = F(βd_1)
...
while doing the arithmetic in decimal
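The recurrence c_k = I(βd_{k−1}), d_k = F(βd_{k−1}) can be sketched in code (the function name is mine; note that doing the arithmetic in binary floating point is exact here only when the input is exactly representable, as .35546875 is):

```python
def frac_to_base(x, beta, ndigits):
    """Digits c_1..c_n of the fraction 0 <= x < 1 in base beta,
    via c_k = I(beta*d), d = F(beta*d)."""
    cs = []
    d = x
    for _ in range(ndigits):
        d *= beta
        c = int(d)     # integer part I(beta*d)
        d -= c         # fractional part F(beta*d)
        cs.append(c)
    return cs

print(frac_to_base(0.8, 2, 8))          # [1, 1, 0, 0, 1, 1, 0, 0]
print(frac_to_base(0.35546875, 8, 3))   # [2, 6, 6], the octal example below
```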
Conversion between bases: Fractional Part Example
(.8)_10 to binary:            (.304)_10 to binary:
.8 × 2 = 1.6  → 1             .304 × 2 = 0.608 → 0
.6 × 2 = 1.2  → 1             .608 × 2 = 1.216 → 1
.2 × 2 = 0.4  → 0             .216 × 2 = 0.432 → 0
.4 × 2 = 0.8  → 0             .432 × 2 = 0.864 → 0
.8 × 2 = 1.6  → 1             .864 × 2 = 1.728 → 1
.6 × 2 = 1.2  → 1             .728 × 2 = 1.456 → 1
.2 × 2 = 0.4  → 0             .456 × 2 = 0.912 → 0
.4 × 2 = 0.8  → 0             .912 × 2 = 1.824 → 1
...                           ...
so (.8)_10 = (.11001100…)_2 and (.304)_10 = (.01001101…)_2.
Conversion between bases: Example
(2576.35546875)10 in octal would be:
Integer part (divide by 8):
2576 ÷ 8 = 322  r 0
 322 ÷ 8 =  40  r 2
  40 ÷ 8 =   5  r 0
   5 ÷ 8 =   0  r 5
so (2576)_10 = (5020)_8.
Fractional part (multiply by 8):
.35546875 × 8 = 2.84375000 → 2
.84375000 × 8 = 6.75000000 → 6
.75000000 × 8 = 6.00000000 → 6
so (.35546875)_10 = (.266)_8.
= (5020.266)8 = (101 000 010 000.010 110 110)2
So converting to octal first is faster for humans.
The best way to do conversions is 10 ↔ 8 ↔ 2 ↔ 16.
When converting (N)_α to base β, the nested method is preferred when α < β and division when α > β.
Error
approximate value = true value + signed error.
or
signed error = true value − approximate value
So the signed error can be positive, negative, or zero. Generally we are only interested in the absolute value of the error:
error = |signed error|
So the error is positive or zero.
When might we care about the sign of the error?
How about when trying to fit a part inside of another part? When the error is a direction and we want to make a course correction?
CIS 541 Roundoff errors
Relative Error
Relative error is the error relative to the answer.
relative error = |true value − approximate value| / |true value|
So for a number x that has a machine representation x̃, its relative error is
|x − x̃| / |x|
Relative error is always the difference between an approximation and the real answer divided by the real answer.
Why would we care about relative error? An error of a foot in measuring the size of a desk matters, but an error of a foot in measuring the distance to the sun probably doesn’t.
What if the true value is zero? The relative error is undefined.
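As a small sketch (the function name and the desk/sun magnitudes are mine, for illustration), with the zero case treated as undefined:

```python
def relative_error(true_value, approx):
    """|true - approx| / |true|; undefined when the true value is zero."""
    if true_value == 0:
        raise ValueError("relative error is undefined for a true value of zero")
    return abs(true_value - approx) / abs(true_value)

# One foot of error on a 3-foot desk vs. on the ~4.9e11-foot distance to the sun:
print(relative_error(3.0, 2.0))              # large, about 0.33
print(relative_error(4.9e11, 4.9e11 - 1.0))  # tiny
```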
Condition and Stability
Condition
Say we have a problem with input (data) x and output (answer) y = F(x). The problem is said to be well-conditioned if "small" changes in x lead to "small" changes in y. What if the changes in y are "large"? That's right: it is ill-conditioned.
Stability
Stability is concerned with the sensitivity of an algorithm for solving the problem. An algorithm is said to be stable if "small" changes in the input (x) lead to "small" changes in the output (y).
And if the changes in the output are large? Then the algorithm is said to be unstable.
Roundoff Errors
Let’s look at a pair of machine numbers on a number line.
x− x+
Consider the number x to be between x− and x+, where x− is the first machine number less than x and x+ is the first machine number greater than x. Rounding is choosing which machine number represents x. In this case, a correctly rounded number x can only have x− or x+ as its machine representation. The question becomes: which one is the correct one?
Different methods choose different numbers as correct.
Rounding: Round to nearest
The first method of rounding is round to nearest, which is what most people consider rounding. In this class we will also consider rounding to be round to nearest (unless otherwise stated).
As an example, consider 3 decimal digits of accuracy and a pair of numbers on a number line, say .752 and .753, which are .001 apart.
.752 .753
A number x, between .752 and .753 can be represented by one of these numbers (.752 or .753) as a closest approximation. In round to nearest or rounding we would choose the machine number closest to x as its approximation.
Rounding: Round to nearest
So if the error of an approximation is not greater than 1/2 × 10^-3, or .0005, it can be accepted as correct.
So an answer is correct if
error ≤ 1/2 × 10^-k
where k is the number of digits. Consider .001 to be ε, so
roundoff error ≤ 1/2 ε
in round to nearest (rounding).
Roundoff Errors
A number is represented by a machine with a machine number. The machine epsilon or unit roundoff error is the error when rounding to one. We can find this by determining the first number that can be represented that is greater than one. For instance if we represent numbers with 23 bits, the machine epsilon
ε = 2^-23. So for single precision ε_single = 2^-23, and for double precision, with a 52-bit mantissa, ε_double = 2^-52.
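Python floats are IEEE doubles, so the machine epsilon can be recovered by searching for the first representable number greater than one (a sketch; the loop and function name are mine):

```python
def machine_epsilon():
    """Halve eps until 1 + eps/2 rounds back to 1; what remains
    is the unit roundoff gap, 2^-52 for an IEEE double."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps

print(machine_epsilon() == 2.0 ** -52)   # True
```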
Roundoff Versus Chopping
x′   x   x″

The number x lies between two machine numbers x′ and x″. We can choose to round x when storing, which gives x″ in this case, or we can chop (truncate) it to x′. The method used affects the error in representation. For rounding the error is:
error ≤ 1/2 ε
For chopping the error is:
error ≤ ε
So which is better?
Precision
X = .256834 × 10^5
The digit 2 is the most significant digit while 4 is the least significant digit.
Accuracy is how close to a target you get, say 2” from the bullseye, and it is measured with one value.
Precision is how good your estimate of the accuracy is, say you are accurate to 2”±.2”. It also refers to how close a group of estimates are. So if you hit the bullseye it is an accurate shot. If you keep hitting it over and over it is precise and accurate. If you keep missing it but missing it in the same manner it is precise but inaccurate.
In a computer the accuracy is how close to the correct answer your machine representation is. The precision is how many digits of accuracy your machine representation has (number of bits in mantissa).
When accurate and precise, it is exact.
Loss of Significance.
x =.3721448693 y =.3720214371
What is the relative error of x −y in a computer with 5 decimal digits of accuracy?
x̃ = .37214
ỹ = .37202
x̃ − ỹ = .00012 = .12000 × 10^-3
x − y = .0001234322
The relative error is:
|(x−y) − (x̃−ỹ)| / |x−y| = .0000034322/.0001234322 ≈ 3 × 10^-2
but the relative errors of x̃ and ỹ are ≈ 1.3 × 10^-5.
When x̃ − ỹ is stored as .12000 × 10^-3, three spurious zeros are added, which represent a loss of significance.
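The 5-digit machine above can be mimicked with Python's standard decimal module (a sketch; the function name is mine, and the unary + operator is what rounds to the context precision):

```python
from decimal import Decimal, getcontext

def cancellation_demo():
    """Subtract two nearby numbers on a simulated 5-decimal-digit machine."""
    getcontext().prec = 5
    x = Decimal(".3721448693")
    y = Decimal(".3720214371")
    xt = +x   # rounds to 5 digits: 0.37214
    yt = +y   # rounds to 5 digits: 0.37202
    return xt - yt

print(cancellation_demo())   # 0.00012, with only two significant digits left
```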
Loss of precision Theorem
Let x and y be normalized floating-point numbers with x > y > 0. If 2^-p ≤ 1 − y/x ≤ 2^-q for some positive integers p and q, then at most p and at least q significant binary bits are lost in the subtraction x − y.
Basically the closer the two numbers the greater the loss of significance.
So how could we avoid loss of significance?
Avoiding loss of significance
Use double precision. Doesn’t always work.
Modify the calculations to remove subtractions of numbers close together (their difference is small).
f(x) = √(x^2 + 1) − 1
As x approaches 0, √(x^2 + 1) approaches 1, so reorder to remove the subtraction:
f(x) = (√(x^2 + 1) − 1) × (√(x^2 + 1) + 1)/(√(x^2 + 1) + 1) = x^2/(√(x^2 + 1) + 1)
f(x) = x − sin(x)
Approximate sin(x) with its Taylor series:
sin(x) = x − x^3/3! + x^5/5! − x^7/7! + …
f(x) = x − sin(x) = x − (x − x^3/3! + x^5/5! − x^7/7! + …)
= x^3/3! − x^5/5! + x^7/7! − …
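The effect of rewriting √(x^2+1) − 1 is easy to see numerically (a sketch; the function names are mine):

```python
import math

def f_naive(x):
    """sqrt(x^2 + 1) - 1: subtracts two nearly equal numbers near x = 0."""
    return math.sqrt(x * x + 1.0) - 1.0

def f_stable(x):
    """Algebraically identical form with the subtraction removed."""
    return x * x / (math.sqrt(x * x + 1.0) + 1.0)

x = 1e-8
print(f_naive(x))    # 0.0: every significant digit is lost
print(f_stable(x))   # about 5e-17, close to the true value x^2/2
```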
Fixed Point
How could we store the integer 345 in a decimal system with 8 digits?
How about 00000345?
How could we store the real number 345.643?
We’d have to know where the decimal point was.
Let’s say we chose 4 digits after the decimal point.
0345.6430
This is fixed point, since the decimal point does not move but remains fixed.
Floating Point
Instead, what if we used floating point? We can then have a fixed number of digits to represent the number (we call this the mantissa) and use the remaining digits for the exponent. So now, with our same example, let's store 345.643 with 6 digits for the mantissa and 2 digits for the exponent, or:
mmmmmmee
So let’s first normalize the number, that is put the first non-zero term after the decimal point or
345.643 = .345643 × 10^3
So we can store it as:
34564303
Why do we normalize? Because a normalized number uses the most precision available (no wasted leading zeros). Also, the representation is unique.
Fixed vs. Floating Point
Fixed Point                          Floating Point
faster                               larger range of numbers for the same number of digits
simpler (i.e. cheaper)               distance between machine nos. not always the same
great when only integers, or
numbers all the same magnitude
Example Floating Point Systems
Floating point system: 3 decimal digits, with
2 digits for the mantissa and 1 digit for the exponent.
Let’s look at the numbers that can be represented.
.00×10^0  = .00000000000
.10×10^-9 = .00000000010
.11×10^-9 = .00000000011
...
.98×10^-9 = .00000000098
.99×10^-9 = .00000000099
.10×10^-8 = .00000000100
.11×10^-8 = .00000000110
.12×10^-8 = .00000000120
Notice how when the exponent changes, the distance between successive numbers goes up by an order of magnitude (in the base). And notice how the distance between zero and the first number is large, compared to the first number and the number just larger than it.
This is the hole at zero.
Example Floating Point Systems
Floating point system 1: 8 decimal digits, with
5 digits for the mantissa and 3 digits for the exponent.
Floating point system 2: 8 hexadecimal digits, with
6 digits for the mantissa and 2 digits for the exponent.
What about x < 0? We need a negative sign for the mantissa!
What about x < 1? We need a negative sign for the exponent!
CIS 541 IEEE Floating Point
Example Floating Point Systems
Questions about the example systems.
Note: Assume a normalized floating point system, unless specified otherwise.
                              System 1 (5m 3e, dec)     System 2 (6m 2e, hex)
smallest number               −.99999 × 10^999          −.FFFFFF × 16^(FF)_16
smallest positive no.         .10000 × 10^-999          .100000 × 16^-(FF)_16
largest no.                   .99999 × 10^999           .FFFFFF × 16^(FF)_16
largest rel. round-off err    (1/2 × 10^-5)/.10000      (1/2 × 16^-6)/.100000
largest rel. round-off err
for number stored as .13
or .13_16                     (1/2 × 10^-5)/.129995     (1/2 × 16^-6)/.12FFFF8_16
IEEE single-precision floating point standard
contains ±0, ±∞, and normal and subnormal single-precision floating-point numbers, as well as NaN (Not a Number) values.
±q × 2^m
sign of q: 1 bit;  integer |m|: 8 bits;  number q: 23 bits
(−1)^s × 2^(c−127) × (1.f)_2
s = 0 means +, s = 1 means −
c is the exponent as an excess-127 code, so the stored exponent goes from −127 to 128.
f is the mantissa in 1-plus form; since the first bit is always 1, it doesn't need to be stored. So:
0 < c < (11111111)_2 = 255
0 and 255 are special, so the actual exponent is
−126 ≤ c − 127 ≤ 127
The mantissa satisfies
1 ≤ (1.f)_2 ≤ (1.11111111111111111111111)_2 = 2 − 2^-23
The largest number is
(2 − 2^-23) × 2^127 ≈ 2^128 ≈ 3.4 × 10^38
The smallest positive normalized number is 2^-126 ≈ 1.2 × 10^-38.
For example, the number −52.234375 is represented as:
(52)_10 = (64.)_8 = (110 100.)_2
(.234375)_10 = (.17)_8 = (.001 111)_2
(52.234375)_10 = (110100.001111)_2 = (1.10100001111)_2 × 2^5
c − 127 = 5, so c = 132
(132)_10 = (204)_8 = (10 000 100)_2
|1|1000 0100|101 0000 1111 0000 0000 0000|   (sign | exponent | mantissa)
= (C250F000)_16
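The worked bit pattern can be checked with Python's struct module, which exposes the raw IEEE single-precision encoding (a sketch; the helper name is mine):

```python
import struct

def float32_hex(x):
    """Pack x as a big-endian IEEE single and return its bit pattern in hex."""
    return struct.pack(">f", x).hex().upper()

print(float32_hex(-52.234375))   # C250F000, matching the derivation above
print(float32_hex(1.0))          # 3F800000: sign 0, exponent 127, mantissa 0
```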
The machine epsilon is
ε_single = 2^-23 ≈ 1.19 × 10^-7
which is 6 decimal digits of precision. For double precision,
ε_double = 2^-52 ≈ 2.22 × 10^-16
which is 15 digits of precision.
Integers use 31 bits (plus sign) for a range of
(−(2^31 − 1), 2^31 − 1) = (−2147483647, 2147483647)
which is about 9 digits of precision.
Exponent                     Numerical Representation
(00000000)_2 = 0_10          ±0 if mantissa = 0, subnormal otherwise
(00000001)_2 = 1_10          2^-126
(00000010)_2 = 2_10          2^-125
...
(11111110)_2 = (254)_10      2^127
(11111111)_2 = (255)_10      ±∞ if mantissa = 0, NaN otherwise
Can a number be stored that is less than 1.0 × 2^-126?
Yes: a subnormal number.
If the exponent field is zero, the number is zero when the mantissa is zero and subnormal when the mantissa is nonzero.
For instance, on a machine that allows subnormal numbers,
(00000001)_16 = (0.00000000000000000000001)_2 × 2^-126 = 2^-149
≈ 1.4 × 10^-45
CIS 541 Arithmetics
IEEE floating point Other details
Round(x) is the machine representation of the number x after it has been rounded. To understand rounding, consider x+ as the first machine number > x and x− as the first machine number < x, so x+ ≥ x ≥ x−. Rounding in IEEE fp can be done with 4 methods.
• Round to nearest: round(x) is either x− or x+, whichever is nearer to x. If a tie, choose the one with the least significant bit equal to 0.
• Round towards zero: round(x) is either x− or x+, whichever is between 0 and x.
• Round towards −∞/round down: round(x) = x−.
• Round towards +∞/round up: round(x) = x+.
Floating-point arithmetic
Floating-point: (sign) 0.d_1 d_2 … d_k × β^e. In normalized form d_1 ≠ 0.
In decimal normalized floating-point the mantissa r is in the interval [1/10, 1):
x = ±r × 10^n,  1/10 ≤ r < 1
In binary normalized floating-point:
x = ±q × 2^m,  1/2 ≤ q < 1
What kind of problems could we run into with floating point arithmetic? It is a general solution, so special cases can fail: limited range, limited precision, no error bound.
Variable precision floating-point arithmetic
Use as many digits in the mantissa as needed.
Of course this could be infinite so we specify a bound N on the number of digits.
This increases precision, but we don’t know how accurate our answer is.
Interval Arithmetic
Represent number as its computer representation and its maximum error.
m ± ε = [a, b]
m = (a + b)/2,  ε = (b − a)/2
So what does this do for us? It allows us to know what the error of a computation is so we know if our answer is good enough for our purposes.
Interval Arithmetic: Operations
m_1±ε_1 + m_2±ε_2 = (m_1 + m_2) ± (ε_1 + ε_2)
m_1±ε_1 − m_2±ε_2 = (m_1 − m_2) ± (ε_1 + ε_2)
m_1±ε_1 × m_2±ε_2 = (m_1 m_2) ± (ε_1|m_2| + ε_2|m_1| + ε_1 ε_2)
m_1±ε_1 ÷ m_2±ε_2 = (m_1/m_2) ± (ε_1 + |m_1/m_2| ε_2)/(|m_2| − ε_2)   if |m_2| > ε_2
                    division error                                     if |m_2| ≤ ε_2
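These propagation rules can be sketched as a tiny class (names are mine; each result carries the midpoint and the propagated error bound ε):

```python
class Interval:
    """A number stored as midpoint m with error bound e, i.e. m ± e."""
    def __init__(self, m, e):
        self.m, self.e = m, e

    def __add__(self, o):
        return Interval(self.m + o.m, self.e + o.e)

    def __sub__(self, o):
        return Interval(self.m - o.m, self.e + o.e)

    def __mul__(self, o):
        return Interval(self.m * o.m,
                        self.e * abs(o.m) + o.e * abs(self.m) + self.e * o.e)

    def __truediv__(self, o):
        if abs(o.m) <= o.e:   # the divisor interval contains zero
            raise ZeroDivisionError("division error: |m2| <= e2")
        m = self.m / o.m
        return Interval(m, (self.e + abs(m) * o.e) / (abs(o.m) - o.e))

i1, i2 = Interval(1, 1), Interval(15, 5)
s = i1 + i2
print(s.m, s.e)   # 16 6, as in the example below
```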
Interval Arithmetic: Examples

I_1 = [0, 2] = 1±1,  I_2 = [10, 20] = 15±5

I_1 + I_2 = 1±1 + 15±5 = (1 + 15) ± (1 + 5) = 16±6
I_1 − I_2 = 1±1 − 15±5 = (1 − 15) ± (1 + 5) = −14±6
I_1 × I_2 = 1±1 × 15±5 = (1×15) ± (1×|15| + 5×|1| + 1×5) = 15±25
I_1 ÷ I_2 = 1±1 ÷ 15±5 = 1/15 ± (1 + |1/15|×5)/(|15| − 5)
          = 1/15 ± (15/15 + 5/15)/10
          = 1/15 ± (20/15)/10
          = 1/15 ± 20/150
          = 1/15 ± 2/15
          ≈ .06666667 ± .13333333
Range Arithmetic
Adding variable precision to interval arithmetic lets us know when not enough digits were used in a computation (so that we can do it again with more).
How? We have a bound on the error. If we get above our tolerance, we simply increase the number of digits and start over.
A range is specified by adding a single digit r to the floating point representation.
(sign) 0.d_1 d_2 … d_n ± r × 10^e
r is the range digit; r and d_n have the same decimal significance.
.39215±3 × 10^5 specifies the range [39212, 39218]
Range Arithmetic
It is variable precision, but do we just allow infinite precision? No
Why? We can’t have an infinite representation.
So what do we do? We set the maximum precision in advance.
What happens as a calculation proceeds.
Will the error ever get smaller? No
Why? Because the error we are keeping track of is just a bound, so it only increases with uncertainty, never decreases. The real error can get smaller, but our bound does not.
What happens as the error grows? The effective number of digits of precision shrinks. That is why we only need a single range digit. So the mantissa gets smaller.
What happens to the speed of the calculation as the error grows? Since the number of digits in the mantissa is shrinking, the calculations will go faster.
Interval and Range notation
To convert between the forms m± and [a, b]:
m ± ε = [m − ε, m + ε]
[a, b] = (a + b)/2 ± (b − a)/2
So 1±1 = [1 − 1, 1 + 1] = [0, 2]
range number +0.8888±9 × 10^1:    mantissa error bound 0.0009 = 9 × 10^-4;  number error bound 9 × 10^-3
range number −0.7244666±2 × 10^-2: mantissa error bound 0.0000002 = 2 × 10^-7;  number error bound 2 × 10^-9
range number +0.200345±5 × 10^3:  mantissa error bound .000005 = 5 × 10^-6;  number error bound 5 × 10^-3
Range arithmetic: examples
Remember r is only 1 digit, and has the same significance as the last digit of the mantissa!
I_1 = 0.1±1 × 10^1,  I_2 = 0.15±5 × 10^2
I_1 + I_2 = 0.16±6 × 10^2
I_1 − I_2 = −0.14±6 × 10^2
I_1 × I_2 = 15±25 = .15±.25 × 10^2 = .1±4 × 10^2
I_1 ÷ I_2 = .06666±.13333 × 10^0 = 0±2 × 10^0

.345±4 × 10^2 + .234±5 × 10^2 = .579±9 × 10^2
.345±5 × 10^2 + .234±5 × 10^2 = .579±10 × 10^2 ⇒ .57±2 × 10^2
.345±5 × 10^2 + .234±6 × 10^2 = .579±11 × 10^2 ⇒ .57±2 × 10^2
.345±9 × 10^2 + .234±9 × 10^2 = .579±18 × 10^2 ⇒ .57±3 × 10^2
Rational Arithmetic
R_1 = p_1/q_1,  R_2 = p_2/q_2
Addition:
R_1 + R_2 = p_1/q_1 + p_2/q_2 = (p_1 q_2 + q_1 p_2)/(q_1 q_2)
Subtraction:
R_1 − R_2 = p_1/q_1 − p_2/q_2 = (p_1 q_2 − q_1 p_2)/(q_1 q_2)
Rational Arithmetic
R_1 = p_1/q_1,  R_2 = p_2/q_2
Multiplication:
R_1 × R_2 = p_1/q_1 × p_2/q_2 = (p_1 p_2)/(q_1 q_2)
Division:
R_1 ÷ R_2 = p_1/q_1 ÷ p_2/q_2 = (p_1 q_2)/(q_1 p_2)
division error if p_2 = 0
Rational Arithmetic
Let's restrict the denominator to always be positive, so we only have one sign to check.
So we modify the rule for division.
R_1 ÷ R_2 = p_1/q_1 ÷ p_2/q_2 =
  (p_1 q_2)/(q_1 p_2)      if p_2 > 0
  (−p_1 q_2)/(−q_1 p_2)    if p_2 < 0
  division error           if p_2 = 0
We also always want the rational to be in a reduced form, so that checking for equality is easier.
In order to reduce we must find the greatest common divisor (gcd), D, and replace p, q with p′, q′ where
p′ = p/D,  q′ = q/D
How do we find D?
Euclid
Euclidean Algorithm to find the gcd
Let’s use the Euclidean Algorithm to find the gcd of two positive numbers n1 & n2 by successive division.
• Divide the larger integer by the smaller to obtain an integer quotient, d_1, and an integer remainder, n_3. Let n_1 ≥ n_2 (switch if needed).
n_1 = d_1 n_2 + n_3
d_1 = ⌊n_1/n_2⌋
n_3 = n_1 (mod n_2)
The gcd of n_1 & n_2 is also a divisor of n_3. The gcd of n_2 & n_3 is also a divisor of n_1. So we can shift the problem of finding gcd(n_1, n_2) into finding gcd(n_2, n_3).
We continue this until one of the numbers is 0, then the other is the gcd.
Examples finding gcd
pair       Relation
144, 78    144 = 1×78 + 66
78, 66     78 = 1×66 + 12
66, 12     66 = 5×12 + 6
12, 6      12 = 2×6 + 0
6, 0       gcd is 6

pair       Relation
205, 55    205 = 3×55 + 40
55, 40     55 = 1×40 + 15
40, 15     40 = 2×15 + 10
15, 10     15 = 1×10 + 5
10, 5      10 = 2×5 + 0
5, 0       gcd is 5
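The successive divisions above are exactly the Euclidean algorithm; as code (a sketch, the function name is mine):

```python
def gcd(n1, n2):
    """Euclid: repeatedly replace (n1, n2) by (n2, n1 mod n2);
    when one number reaches 0, the other is the gcd."""
    while n2 != 0:
        n1, n2 = n2, n1 % n2
    return n1

print(gcd(144, 78))   # 6
print(gcd(205, 55))   # 5
```

If the arguments arrive in the wrong order, the first iteration simply swaps them, so no explicit switch is needed.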
Errors
• Classes of errors
• Types of errors
• Combating errors
CIS 541 Nonsolvable problems
Nonsolvable problems
Can we solve a = b with a computer? No.
How about any other relation? No:
a > b, a < b, a ≥ b, a ≤ b
Why? Because of error; we don't know what it is.
What about a = 0?
How about: is a within 10^-n of b?
In other words, does
|a − b| ≤ ε,  ε = 10^-n?
Yes. We can always set the precision to greater than 10^-n.
Non Solvable problems
Can we obtain a correct k decimal-place, fixed-point approximation to any real number c? No.
Consider the following approximations of c with increasing precision:
.1111150±2
.111115000±5
.1111150000000±3
.1111150000000000000±6
...
So is .11111 or .11112 the correct approximation for k = 5? We don’t know.
So do we have a problem here if we'd like to guarantee something about our solution?
Non Solvable problems
Any ways to get around the problem?
We can determine a correct k or k+1 decimal-place, fixed-point approximation to any real number c, so we avert the problem. This is a very subtle difference, and is important if implementing such a system.
So in the above, .111115 is correct for k+1 decimal places, so the approximation is correct for k or k+1 decimal places.
Ranged Relations
Let a and b be ranged approximations then:
if a overlaps b then a ≐ b
if a is completely to the left of b then a <̇ b
if a is completely to the right of b then a >̇ b
Remember the precision of a and b matters, if the precision changes, so too may the dotted relation.
Dotted Relation      Implied mathematical relation
a <̇ b               a < b
a >̇ b               a > b
a ≠̇ b               a ≠ b
a ≐ b               ?

Mathematical Relation    Implied dotted relation
a < b                    a ≤̇ b
a > b                    a ≥̇ b
a = b                    a ≐ b
a ≠ b                    ?
What does a ≥̇ b mean? How about a ≤̇ b?
CIS 541 Taylor’s Series
Taylor's Series
sin(x) = x − x^3/3! + x^5/5! − x^7/7! + …
if we graph the first few partial sums we see how the series converges to sin(x).
Notice how the series converges rapidly near the expansion point and slowly or not at all away from it.
What does this tell us about the choice of c?
The number of terms needed for the same precision increases as we go away from c.
[Figure: sin(x) and the first few partial sums of its Taylor series.]
Taylor's Series
f(x) = f(c) + f′(c)(x−c) + f″(c)/2! (x−c)^2 + f‴(c)/3! (x−c)^3 + …
     = Σ_{k=0}^{∞} f^(k)(c)/k! (x−c)^k
This is the Taylor series of f at c. If c = 0 it is also known as a Maclaurin series.
So with the Taylors series, if we know a heck of a lot of information about a function at a single point, we can use that information to reconstruct the entire function, as opposed to methods that just know a little bit about the function at a lot of points.
When did Taylor do this work? He wrote it in a letter in 1712 and published a book in 1715. Brook Taylor (18 Aug 1685, Edmonton, Middlesex, England; died 29 Dec 1731) was not quite nobility but wealthy; he was home schooled and then went to Cambridge.
Taylor created what is now called the "calculus of finite differences", invented integration by parts, and discovered the series known as Taylor's expansion (1715). He devised the basics of perspective (projective geometry); he named it linear perspective and defined the vanishing point. More impressive than just one theorem, but his work was difficult to follow, he did not elaborate enough, and he died early (at 46). He fought a lot with Bernoulli (non-English).
Taylor's Theorem
If f has continuous derivatives of order 0, 1, 2, …, n in a closed interval I = [a, b], then for any x and c in I,
f(x) = Σ_{k=0}^{n−1} f^(k)(c)/k! (x−c)^k + R_n
where
R_n = f^(n)(ξ)(x−c)^n / n!   (Lagrange form)
or
R_n = f^(n)(ξ)(x−ξ)^(n−1)(x−c) / (n−1)!   (Cauchy form)
and ξ is a point that lies between x and c.
R_n is the remainder or error term.
Taylor's Theorem: Example
What is the Taylor series for sin(x)? Let's choose c = 0.
f(x) = sin(x)        f(c) = sin(0) = 0
f′(x) = cos(x)       f′(c) = cos(0) = 1
f″(x) = −sin(x)      f″(c) = −sin(0) = 0
f‴(x) = −cos(x)      f‴(c) = −cos(0) = −1
f^(4)(x) = sin(x)    f^(4)(c) = sin(0) = 0
f^(5)(x) = cos(x)    f^(5)(c) = cos(0) = 1
f^(6)(x) = −sin(x)   f^(6)(c) = −sin(0) = 0
f^(7)(x) = −cos(x)   f^(7)(c) = −cos(0) = −1
f^(8)(x) = sin(x)    f^(8)(c) = sin(0) = 0
f^(9)(x) = cos(x)    f^(9)(c) = cos(0) = 1
f(x) = Σ_{k=0}^{∞} f^(k)(c)/k! (x−c)^k
     = f^(0)(0)/0! x^0 + f^(1)(0)/1! x^1 + f^(2)(0)/2! x^2 + f^(3)(0)/3! x^3 + f^(4)(0)/4! x^4 + f^(5)(0)/5! x^5 + …
     = 0/0! x^0 + 1/1! x^1 + 0/2! x^2 + (−1)/3! x^3 + 0/4! x^4 + 1/5! x^5 + …
     = x − x^3/3! + x^5/5! − x^7/7! + …
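The series just derived can be summed term by term; each term comes from the previous one by multiplying by −x^2/((2k+2)(2k+3)) (a sketch; the names are mine):

```python
import math

def taylor_sin(x, n_terms):
    """Partial sum of x - x^3/3! + x^5/5! - ... with n_terms terms."""
    total, term = 0.0, x
    for k in range(n_terms):
        total += term
        # next odd-power term: multiply by -x^2 / ((2k+2)(2k+3))
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

print(taylor_sin(1.0, 8))   # very close to math.sin(1.0)
```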
Taylor's Theorem: Example
What is the Taylor series for cos(x) at c = 0?
f(x) = cos(x)        f(c) = cos(0) = 1
f′(x) = −sin(x)      f′(c) = −sin(0) = 0
f″(x) = −cos(x)      f″(c) = −cos(0) = −1
f^(3)(x) = sin(x)    f^(3)(c) = sin(0) = 0
f^(4)(x) = cos(x)    f^(4)(c) = cos(0) = 1
f^(5)(x) = −sin(x)   f^(5)(c) = −sin(0) = 0
f^(6)(x) = −cos(x)   f^(6)(c) = −cos(0) = −1
f^(7)(x) = sin(x)    f^(7)(c) = sin(0) = 0
f^(8)(x) = cos(x)    f^(8)(c) = cos(0) = 1
cos(x) = 1/0! x^0 + 0/1! x^1 + (−1)/2! x^2 + 0/3! x^3 + 1/4! x^4 + 0/5! x^5 + …
       = 1 − x^2/2! + x^4/4! − x^6/6! + …
Taylor's Theorem: Example
What is the Taylor series for 1/(1−x)?
f(x) = 1/(1−x)           f(c) = f(0) = 1/(1−0) = 1
f′(x) = 1/(1−x)^2        f′(c) = 1
f″(x) = 2/(1−x)^3        f″(c) = 2
f^(3)(x) = 6/(1−x)^4     f^(3)(c) = 6
f^(4)(x) = 24/(1−x)^5    f^(4)(c) = 24
f^(5)(x) = 5!/(1−x)^6    f^(5)(c) = 5!
f^(6)(x) = 6!/(1−x)^7    f^(6)(c) = 6!
f^(7)(x) = 7!/(1−x)^8    f^(7)(c) = 7!
f^(8)(x) = 8!/(1−x)^9    f^(8)(c) = 8!
f(x) = 1/0! x^0 + 1/1! x^1 + 2/2! x^2 + 3!/3! x^3 + 4!/4! x^4 + 5!/5! x^5 + …
     = 1 + x + x^2 + x^3 + x^4 + x^5 + x^6 + …
Taylor's Theorem
What do the previous examples tell us about the choice of c? It can affect the complexity of the terms of the series, which affects the speed and accuracy of the calculation. But remember that far away from c means more terms, so we need to consider both.
What about trig functions that repeat? Can we do something like shift the domain so the argument is always, say, between −π and π?
Mean Value Theorem
If f is a continuous function on the closed interval [a, b] and possesses a derivative at each point of the open interval (a, b), then
f(b) = f(a) + (b − a)f′(ξ) for some ξ in (a, b), so
f′(ξ) = (f(b) − f(a))/(b − a)
So we have an approximation for f′(x) at any x within the interval (a, b).
Mean Value Theorem
What does the mean value theorem look like geometrically?
[Figure: a curve through (a, f(a)) and (b, f(b)); the secant between them is parallel to the tangent at some ξ in (a, b).]
There is some ξ between a and b such that the slope of the secant line between f(a) and f(b) equals the derivative at ξ, i.e. f′(ξ) = (f(b) − f(a))/(b − a).
Taylor's Theorem for f(x + h)
If f has continuous derivatives of order 0, 1, 2, …, (n+1) in a closed interval I = [a, b], then for any x in I,
f(x + h) = Σ_{k=0}^{n} f^(k)(x)/k! h^k + E_{n+1}
where h is any value such that x + h is in I, and where
E_{n+1} = f^(n+1)(ξ)/(n+1)! h^(n+1)
ξ is a point that lies between x and x + h, and E_{n+1} is the error term.
Where did this come from?
Let x ← x + h and c ← x in the earlier form of Taylor's theorem.
Alternating Series Theorem
If a_1 ≥ a_2 ≥ a_3 ≥ … ≥ 0 and lim_{n→∞} a_n = 0, then the alternating series a_1 − a_2 + a_3 − a_4 + … converges. That is:
Σ_{k=1}^{∞} (−1)^(k−1) a_k = lim_{n→∞} Σ_{k=1}^{n} (−1)^(k−1) a_k = lim_{n→∞} S_n = S
where S is its sum and S_n is the nth partial sum. Also, for all n,
|S − S_n| ≤ a_{n+1}
So if the magnitudes of the terms in an alternating series converge monotonically to zero, then the error in truncating the series is no larger than the magnitude of the first omitted term.
What does this tell us about calculating sin & cos?
We can bound the error based on the first term we drop.
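For sin this bound is directly usable: truncate the series and report the magnitude of the first omitted term as the error bound (a sketch; the names are mine, and the bound is valid once the term magnitudes are decreasing, e.g. for |x| ≤ 1):

```python
import math

def sin_with_bound(x, n_terms):
    """Truncated sine series plus the alternating-series error bound:
    the magnitude of the first omitted term."""
    total, term = 0.0, x
    for k in range(n_terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total, abs(term)   # term now holds the first omitted term

approx, bound = sin_with_bound(0.5, 4)
print(abs(math.sin(0.5) - approx) <= bound)   # True
```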