In view of (1.4.9) this ensures the existence of quantities $\varepsilon_j$ with

$$(1.4.10)\qquad c=\sum_{j=1}^{n-1} a_j b_j\,(1+j\,\varepsilon_j) + a_n b_n\bigl(1+(n-1+\delta)\varepsilon_n\bigr),\qquad |\varepsilon_j| \le \frac{\mathrm{eps}}{1-n\cdot \mathrm{eps}},\qquad \delta := \begin{cases}0 & \text{if } a_n = 1,\\ 1 & \text{otherwise.}\end{cases}$$

For $r = c - a_1b_1 - a_2b_2 - \cdots - a_nb_n$ we have consequently

$$(1.4.11)\qquad |r| \le \frac{\mathrm{eps}}{1-n\cdot \mathrm{eps}}\left[\,\sum_{j=1}^{n-1} j\,|a_jb_j| + (n-1+\delta)\,|a_nb_n|\right].$$
In particular, (1.4.8) reveals the numerical stability of our algorithm for computing $\beta_n$. The roundoff error $\alpha_m$ contributes the amount

$$\frac{c-a_1b_1-a_2b_2-\cdots-a_mb_m}{a_n}\,\alpha_m$$

to the absolute error in $\beta_n$. This, however, is at most equal to

$$\frac{|c|+\sum_{i=1}^{m}|a_ib_i|}{|a_n|}\,\mathrm{eps},$$

which represents no more than the influence of the input errors $\varepsilon_c$ and $\varepsilon_{a_i}$ of $c$ and $a_i$, $i=1,\dots,m$, respectively, provided $|\varepsilon_c|, |\varepsilon_{a_i}| \le \mathrm{eps}$. The remaining roundoff errors $\mu_k$ and $\delta$ are similarly shown to be harmless.
The numerical stability of the above algorithm is often shown by interpreting (1.4.10) in the sense of backward analysis: the computed approximate solution $b_n$ is the exact solution of the equation

$$c - \bar a_1 b_1 - \cdots - \bar a_n b_n = 0,$$

whose coefficients

$$\bar a_j := a_j(1 + j\,\varepsilon_j),\quad 1 \le j \le n-1,\qquad \bar a_n := a_n\bigl(1 + (n-1+\delta)\varepsilon_n\bigr)$$

have changed only slightly from their original values $a_j$. This kind of analysis, however, involves the difficulty of having to define how large $n$ can be so that errors of the form $n\varepsilon$, $|\varepsilon| \le \mathrm{eps}$, can still be considered of the same order of magnitude as the machine precision eps.
1.5 Interval Arithmetic; Statistical Roundoff Estimation

In a typical numerical method, however, the number of arithmetic operations, and consequently the number of individual roundoff errors, is very large, and the corresponding algorithm is too complicated to permit the estimation of the total effect of all roundoff errors in this fashion.
A technique known as interval arithmetic offers an approach to determining exact upper bounds for the absolute error of an algorithm, taking into account all roundoff and data errors. Interval arithmetic is based on the realization that the exact values of all real numbers $a \in \mathbb{R}$ which either enter an algorithm or are computed as intermediate or final results are usually not known. At best one knows small intervals which contain $a$. For this reason, the interval-arithmetic approach is to calculate systematically in terms of such intervals

$$\tilde a = [a', a''],$$

bounded by machine numbers $a', a'' \in A$, rather than in terms of single real numbers $a$. Each unknown number $a$ is represented by an interval $\tilde a = [a', a'']$ with $a \in \tilde a$. The arithmetic operations $\circledast \in \{\oplus, \ominus, \otimes, \oslash\}$ between intervals are defined so as to be compatible with the above interpretation.
That is, $\tilde c := \tilde a \circledast \tilde b$ is defined as an interval (as small as possible) with machine-number endpoints satisfying

$$\tilde c \supseteq \{\,a * b \mid a \in \tilde a \text{ and } b \in \tilde b\,\}.$$

In the case of addition, for instance, this holds if $\oplus$ is defined as follows:

$$[c', c''] := [a', a''] \oplus [b', b''],\qquad c' := \max\{\gamma \in A \mid \gamma \le a' + b'\},\quad c'' := \min\{\gamma \in A \mid \gamma \ge a'' + b''\},$$

with $A$ denoting again the set of machine numbers. In the case of multiplication $\otimes$, assuming, say, $a' > 0$, $b' > 0$,

$$[c', c''] := [a', a''] \otimes [b', b'']$$

can be defined by letting

$$c' := \max\{\gamma \in A \mid \gamma \le a' \times b'\},\qquad c'' := \min\{\gamma \in A \mid \gamma \ge a'' \times b''\}.$$
Replacing, in these and similar fashions, every quantity by an interval and every arithmetic operation by its corresponding interval operation (this is readily implemented on computers), we obtain interval algorithms which produce intervals guaranteed to contain the desired exact solutions. The data for these interval algorithms will again be intervals, chosen to allow for data errors.
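The endpoint rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming IEEE double precision as the machine-number set $A$ and using `math.nextafter` (Python 3.9+) to move each endpoint outward by one machine number; all function names are ours:

```python
import math

def round_out(lo, hi):
    # A nearest-rounded result is within half an ulp of the exact value,
    # so stepping one machine number outward guarantees enclosure.
    return math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf)

def iadd(a, b):
    # [a' + b', a'' + b''], endpoints widened outward
    return round_out(a[0] + b[0], a[1] + b[1])

def isub(a, b):
    # [a' - b'', a'' - b'], endpoints widened outward
    return round_out(a[0] - b[1], a[1] - b[0])

def imul(a, b):
    # general case: min/max over all four endpoint products
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return round_out(min(p), max(p))

# e.g. [1,1] (+) [2,2] yields an interval containing 3
print(iadd((1.0, 1.0), (2.0, 2.0)))
```

Stepping one machine number outward is slightly cruder than the max/min definitions in the text, which give the tightest machine-number enclosure, but it is always safe.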
It has been found, however, that an uncritical utilization of interval arithmetic techniques leads to error bounds which, while certainly reliable, are in most cases much too pessimistic. It is not enough simply to substitute interval operations for arithmetic operations without taking into account how the particular roundoff or data errors enter into the respective results. For example, it happens quite frequently that a certain roundoff error $\varepsilon$ impairs some intermediate results $u_1, \dots, u_n$ of an algorithm considerably,

$$\left|\frac{\partial u_i}{\partial \varepsilon}\right| \gg 1 \quad\text{for } i = 1, \dots, n,$$

while the final result $y = f(u_1, \dots, u_n)$ is not strongly affected,

$$\left|\frac{\partial y}{\partial \varepsilon}\right| \le 1,$$

even though it is calculated from the highly inaccurate intermediate values $u_1, \dots, u_n$: the algorithm shows error damping.
Example 1. Evaluate $y = \varphi(x) = x^3 - 3x^2 + 3x = ((x-3)\times x + 3)\times x$ using Horner's scheme:

$$u := x - 3,\quad v := u \times x,\quad w := v + 3,\quad y := w \times x.$$

The value $x$ is known to lie in the interval $x \in \tilde x := [0.9,\,1.1]$. Starting with this interval and using straight interval arithmetic, we find

$$\tilde u = \tilde x \ominus [3, 3] = [-2.1,\,-1.9],$$
$$\tilde v = \tilde u \otimes \tilde x = [-2.31,\,-1.71],$$
$$\tilde w = \tilde v \oplus [3, 3] = [0.69,\,1.29],$$
$$\tilde y = \tilde w \otimes \tilde x = [0.621,\,1.419].$$

The interval $\tilde y$ is much too large compared to the interval

$$\{\varphi(x) \mid x \in \tilde x\} = [0.999,\,1.001],$$

which describes the actual effect of an error in $x$ on $\varphi(x)$.
Example 2. Using just ordinary 2-digit arithmetic gives considerably more accurate results than the interval arithmetic suggests:

         x = 0.9    x = 1.1
    u     -2.1       -1.9
    v     -1.9       -2.1
    w      1.1        0.9
    y      0.99       0.99
For the successful application of interval arithmetic, therefore, it is not sufficient merely to replace the arithmetic operations of commonly used algorithms by interval operations: It is necessary to develop new algorithms producing the same final results but having an improved error-dependence pattern for the intermediate results.
Example 3. In Example 1 a simple transformation of $\varphi(x)$ suffices:

$$y = \varphi(x) = 1 + (x-1)^3.$$

When applied to the corresponding evaluation algorithm and the same starting interval $\tilde x = [0.9,\,1.1]$, interval arithmetic now produces the optimal result:

$$\tilde u_1 := \tilde x \ominus [1, 1] = [-0.1,\,0.1],$$
$$\tilde u_2 := \tilde u_1 \otimes \tilde u_1 = [-0.01,\,0.01],$$
$$\tilde u_3 := \tilde u_2 \otimes \tilde u_1 = [-0.001,\,0.001],$$
$$\tilde y := \tilde u_3 \oplus [1, 1] = [0.999,\,1.001].$$
As far as ordinary arithmetic is concerned, there is not much difference between the two evaluation algorithms of Example 1 and Example 3. Using two digits again, the results are practically identical to those in Example 2:
         x = 0.9    x = 1.1
    u1    -0.1        0.1
    u2     0.01       0.01
    u3    -0.001      0.001
    y      1.0        1.0
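Examples 1 and 3 can be replayed with naive interval operations over exact rational endpoints. This is a sketch; using `Fraction` removes machine rounding from the picture, so the interval widths below are due solely to interval arithmetic itself:

```python
from fractions import Fraction as F

def add(a, b): return (a[0] + b[0], a[1] + b[1])
def sub(a, b): return (a[0] - b[1], a[1] - b[0])
def mul(a, b):
    p = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(p), max(p))

x = (F(9, 10), F(11, 10))             # the starting interval [0.9, 1.1]

# Example 1: Horner evaluation of x^3 - 3x^2 + 3x
u = sub(x, (F(3), F(3)))
v = mul(u, x)
w = add(v, (F(3), F(3)))
y_horner = mul(w, x)                  # the wide interval [0.621, 1.419]

# Example 3: the transformed form 1 + (x - 1)^3
u1 = sub(x, (F(1), F(1)))
u2 = mul(u1, u1)
u3 = mul(u2, u1)
y_transf = add(u3, (F(1), F(1)))      # the optimal interval [0.999, 1.001]

print(y_horner, y_transf)
```

The same operations, in a different algebraic arrangement, produce an enclosure more than 400 times narrower.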
For an in-depth treatment of interval arithmetic the reader should con- sult, for instance, Moore (1966) or Kulisch (1969).
In order to obtain statistical roundoff estimates [Rademacher (1948)], we assume that the relative roundoff error [see (1.2.6)] which is caused by an elementary operation is a random variable with values in the interval $[-\mathrm{eps}, \mathrm{eps}]$. Furthermore we assume that the roundoff errors $\varepsilon$ attributable to different operations are independent random variables. By $\mu_\varepsilon$ we denote the expected value and by $\sigma_\varepsilon^2$ the variance of the above roundoff distribution. They satisfy the general relationship

$$\mu_\varepsilon = E(\varepsilon),\qquad \sigma_\varepsilon^2 = E\bigl((\varepsilon - E(\varepsilon))^2\bigr) = E(\varepsilon^2) - (E(\varepsilon))^2 = \mu_{\varepsilon^2} - \mu_\varepsilon^2.$$

Assuming a uniform distribution in the interval $[-\mathrm{eps}, \mathrm{eps}]$, we get

$$(1.5.1)\qquad \mu_\varepsilon = E(\varepsilon) = 0,\qquad \sigma_\varepsilon^2 = E(\varepsilon^2) = \frac{1}{2\,\mathrm{eps}}\int_{-\mathrm{eps}}^{\mathrm{eps}} t^2\,dt = \frac{\mathrm{eps}^2}{3} =: \bar\varepsilon^2.$$

Closer examinations show the roundoff distribution to be not quite uniform [see Sterbenz (1974), Exercise 22, p. 122]. It should also be kept in mind that the ideal roundoff pattern is only an approximation to the roundoff patterns observed in actual computing machinery, so that the quantities $\mu_\varepsilon$ and $\sigma_\varepsilon^2$ may have to be determined empirically.
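A quick Monte Carlo check of (1.5.1), under the stated assumption of a uniform distribution on $[-\mathrm{eps}, \mathrm{eps}]$; the sample size and seed are arbitrary choices of ours:

```python
import random

random.seed(1)
eps = 5e-4                 # the 4-digit machine precision used later
n = 200_000

samples = [random.uniform(-eps, eps) for _ in range(n)]
mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n

# mean should be near 0, var near eps^2 / 3
print(mean, var, eps**2 / 3)
```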
The results $x$ of algorithms subjected to random roundoff errors become random variables themselves, with expected values $\mu_x$ and variances $\sigma_x^2$ connected again by the basic relation

$$\sigma_x^2 = E\bigl((x - E(x))^2\bigr) = E(x^2) - (E(x))^2 = \mu_{x^2} - \mu_x^2.$$

The propagation of previous roundoff effects through elementary operations is described by the following formulas for arbitrary independent random variables $x, y$ and constants $\alpha, \beta \in \mathbb{R}$:

$$(1.5.2)\qquad \mu_{\alpha x \pm \beta y} = E(\alpha x \pm \beta y) = \alpha E(x) \pm \beta E(y) = \alpha\mu_x \pm \beta\mu_y,$$
$$\sigma_{\alpha x \pm \beta y}^2 = E\bigl((\alpha x \pm \beta y)^2\bigr) - \bigl(E(\alpha x \pm \beta y)\bigr)^2 = \alpha^2 E\bigl((x-E(x))^2\bigr) + \beta^2 E\bigl((y-E(y))^2\bigr) = \alpha^2\sigma_x^2 + \beta^2\sigma_y^2.$$

The first of the above formulas follows from the linearity of the expected-value operator; it holds for arbitrary random variables $x, y$. The second formula is based on the relation $E(x\,y) = E(x)E(y)$, which holds whenever $x$ and $y$ are independent. Similarly, we obtain for independent $x$ and $y$

$$(1.5.3)\qquad \mu_{x\times y} = E(x \times y) = E(x)E(y) = \mu_x\mu_y,$$
$$\sigma_{x\times y}^2 = E\bigl[(x\times y) - E(x)E(y)\bigr]^2 = \mu_{x^2}\mu_{y^2} - \mu_x^2\mu_y^2 = \sigma_x^2\sigma_y^2 + \mu_x^2\sigma_y^2 + \mu_y^2\sigma_x^2.$$
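The product formulas (1.5.3) can likewise be checked by simulation for two independent random variables; the means, standard deviations, distribution, and sample size below are arbitrary choices of ours:

```python
import random

random.seed(2)
mu_x, sig_x = 2.0, 0.3
mu_y, sig_y = -1.0, 0.2
n = 400_000

xs = [random.gauss(mu_x, sig_x) for _ in range(n)]
ys = [random.gauss(mu_y, sig_y) for _ in range(n)]
ps = [x * y for x, y in zip(xs, ys)]

mean_p = sum(ps) / n
var_p = sum((p - mean_p) ** 2 for p in ps) / n

predicted_mean = mu_x * mu_y
predicted_var = (sig_x**2 * sig_y**2
                 + mu_x**2 * sig_y**2
                 + mu_y**2 * sig_x**2)
print(mean_p, predicted_mean, var_p, predicted_var)
```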
Example. For calculating $y = a^2 - b^2$ (see Example 2 in Section 1.3) we find, under the assumptions (1.5.1), $E(a) = a$, $\sigma_a^2 = 0$, $E(b) = b$, $\sigma_b^2 = 0$, and using (1.5.2) and (1.5.3), that

$$\eta_1 = a^2(1+\varepsilon_1),\qquad E(\eta_1) = a^2,\qquad \sigma_{\eta_1}^2 = a^4\bar\varepsilon^2,$$
$$\eta_2 = b^2(1+\varepsilon_2),\qquad E(\eta_2) = b^2,\qquad \sigma_{\eta_2}^2 = b^4\bar\varepsilon^2,$$
$$y = (\eta_1 - \eta_2)(1+\varepsilon_3),\qquad E(y) = E(\eta_1 - \eta_2)\,E(1+\varepsilon_3) = a^2 - b^2$$

($\eta_1$, $\eta_2$, $\varepsilon_3$ are assumed to be independent),

$$\sigma_y^2 = \sigma_{\eta_1-\eta_2}^2\sigma_{1+\varepsilon_3}^2 + \mu_{\eta_1-\eta_2}^2\sigma_{1+\varepsilon_3}^2 + \mu_{1+\varepsilon_3}^2\sigma_{\eta_1-\eta_2}^2$$
$$= (\sigma_{\eta_1}^2 + \sigma_{\eta_2}^2)\bar\varepsilon^2 + (a^2-b^2)^2\bar\varepsilon^2 + 1\cdot(\sigma_{\eta_1}^2 + \sigma_{\eta_2}^2)$$
$$= (a^4+b^4)\bar\varepsilon^4 + \bigl[(a^2-b^2)^2 + a^4 + b^4\bigr]\bar\varepsilon^2.$$

Neglecting $\bar\varepsilon^4$ compared to $\bar\varepsilon^2$ yields

$$\sigma_y^2 \doteq \bigl((a^2-b^2)^2 + a^4 + b^4\bigr)\bar\varepsilon^2.$$

For $a := 0.3237$, $b := 0.3134$, $\mathrm{eps} = 5\times 10^{-4}$ (see Example 5 in Section 1.3), we find

$$\sigma_y \doteq 0.144\,\bar\varepsilon = 0.0000415,$$

which is close in magnitude to the true error $\Delta y = 0.00001787$ for 4-digit arithmetic. Compare this with the error bound $0.00010478$ furnished by (1.3.17).
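The closing numbers of this example can be reproduced directly; this sketch takes eps and the data from the example itself:

```python
import math

a, b, eps = 0.3237, 0.3134, 5e-4
eps_bar = eps / math.sqrt(3)          # from (1.5.1): eps_bar^2 = eps^2 / 3

# first-order variance, eps_bar^4 terms neglected
var_y = ((a**2 - b**2)**2 + a**4 + b**4) * eps_bar**2
sigma_y = math.sqrt(var_y)

print(sigma_y)    # approximately 4.15e-5, i.e. 0.144 * eps_bar
```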
We denote by $M(x)$ the set of all quantities which, directly or indirectly, have entered the calculation of the quantity $x$. If $M(x) \cap M(y) \ne \emptyset$ for the algorithm in question, then the random variables $x$ and $y$ are in general dependent.

The statistical roundoff error analysis of an algorithm becomes extremely complicated if dependent random variables are present. It becomes quite easy, however, under the following simplifying assumptions:

(1.5.4)
(a) The operands of each arithmetic operation are independent random variables.
(b) In calculating variances, all terms of an order higher than the smallest one are neglected.
(c) All variances are so small that for elementary operations, in first-order approximation, $E(x * y) \doteq E(x) * E(y) = \mu_x * \mu_y$.
If in addition the expected values $\mu_x$ are replaced by the estimated values $x$, and relative variances $\varepsilon_x^2 := \sigma_x^2/\mu_x^2 \approx \sigma_x^2/x^2$ are introduced, then (1.5.2) and (1.5.3) yield [compare (1.2.6), (1.3.5)]

$$(1.5.5)\qquad z = \mathrm{fl}(x \pm y):\quad \varepsilon_z^2 \doteq \Bigl(\frac{x}{z}\Bigr)^2\varepsilon_x^2 + \Bigl(\frac{y}{z}\Bigr)^2\varepsilon_y^2 + \bar\varepsilon^2,$$
$$z = \mathrm{fl}(x \times y):\quad \varepsilon_z^2 \doteq \varepsilon_x^2 + \varepsilon_y^2 + \bar\varepsilon^2,$$
$$z = \mathrm{fl}(x / y):\quad \varepsilon_z^2 \doteq \varepsilon_x^2 + \varepsilon_y^2 + \bar\varepsilon^2.$$

It should be kept in mind, however, that these results are valid only if the hypotheses (1.5.4), in particular (1.5.4a), are met.
It is possible to evaluate the above formulas in the course of a numerical computation and thereby to obtain an estimate of the error of the final results. As in the case of interval arithmetic, this leads to an arithmetic of paired quantities $(x, \varepsilon_x^2)$ for which elementary operations are defined with the help of the above or similar formulas. Error bounds for the final result $r$ are then obtained from the relative variance $\varepsilon_r^2$, assuming that the final error distribution is normal. This assumption is justified inasmuch as the distributions of propagated errors alone tend to become normal if subjected to many elementary operations. At each such operation the nonnormal roundoff error distribution is superimposed on the distribution of previous errors. However, after many operations, the propagated errors are large compared to the newly created roundoff errors, so that the latter do not appreciably affect the normality of the total error distribution. Assuming the final error distribution to be normal, the actual relative error of the final result $r$ is bounded with probability 0.9 by $2\varepsilon_r$.
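The arithmetic of paired quantities $(x, \varepsilon_x^2)$ just described can be sketched as a small class implementing the first-order rules (1.5.5). It assumes independence of operands as in (1.5.4a); the class name and the eps value are our choices:

```python
import math

EPS_BAR_SQ = (5e-4) ** 2 / 3       # eps_bar^2 for the 4-digit example

class Stat:
    """A value paired with its relative variance eps_x^2."""
    def __init__(self, x, relvar=0.0):
        self.x = x
        self.relvar = relvar

    def _addsub(self, other, z):
        rv = ((self.x / z) ** 2 * self.relvar
              + (other.x / z) ** 2 * other.relvar + EPS_BAR_SQ)
        return Stat(z, rv)

    def __add__(self, other):
        return self._addsub(other, self.x + other.x)

    def __sub__(self, other):
        return self._addsub(other, self.x - other.x)

    def __mul__(self, other):
        return Stat(self.x * other.x,
                    self.relvar + other.relvar + EPS_BAR_SQ)

    def __truediv__(self, other):
        return Stat(self.x / other.x,
                    self.relvar + other.relvar + EPS_BAR_SQ)

    def sigma(self):
        # absolute standard deviation |x| * eps_x
        return abs(self.x) * math.sqrt(self.relvar)

a = Stat(0.3237)       # exact data: zero relative variance
b = Stat(0.3134)
y = a * a - b * b
print(y.sigma())       # approximately 4.15e-5, as in the example
```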
Exercises for Chapter 1
1. Show that with floating-point arithmetic of $t$ decimal places

$$\mathrm{rd}(a) = \frac{a}{1+\varepsilon} \quad\text{with } |\varepsilon| \le 5\cdot 10^{-t}$$

holds in analogy to (1.2.2). [In parallel with (1.2.6), as a consequence, $\mathrm{fl}(a * b) = (a * b)/(1+\varepsilon)$ with $|\varepsilon| \le 5\cdot 10^{-t}$ for all arithmetic operations $* = +, -, \times, /$.]
2. Let $a$, $b$, $c$ be fixed-point numbers with $N$ decimal places after the decimal point, and suppose $0 < a, b, c < 1$. The substitute product $a * b$ is defined as follows: add $10^{-N}/2$ to the exact product $a\cdot b$, and delete the $(N+1)$-st and subsequent digits.
(a) Give a bound for $|(a*b)*c - abc|$.
(b) By how many units of the $N$-th place can $(a*b)*c$ and $a*(b*c)$ differ?
3. Evaluating $\sum_{i=1}^n a_i$ in floating-point arithmetic may lead to an arbitrarily large relative error. If, however, all summands $a_i$ are of the same sign, then this relative error is bounded. Derive a crude bound for this error, disregarding terms of higher order.
4. Show how to evaluate the following expressions in a numerically stable fashion:

$$\frac{1}{1+2x} - \frac{1-x}{1+x} \quad\text{for } |x| \ll 1,$$
$$\sqrt{x + \frac{1}{x}} - \sqrt{x - \frac{1}{x}} \quad\text{for } x \gg 1,$$
$$\frac{1-\cos x}{x} \quad\text{for } x \ne 0,\ |x| \ll 1.$$
5. Suppose a computer program is available which yields values for $\arcsin y$ in floating-point representation with $t$ decimal mantissa places and for $|y| \le 1$, subject to a relative error $\varepsilon$ with $|\varepsilon| \le 5\times 10^{-t}$. In view of the relation

$$\arctan x = \arcsin\frac{x}{\sqrt{1+x^2}},$$

this program could also be used to evaluate $\arctan x$. Determine for which values $x$ this procedure is numerically stable by estimating the relative error.
6. For given $z$, the function $\tan(z/2)$ can be computed according to the formula

$$\tan\frac{z}{2} = \pm\left(\frac{1-\cos z}{1+\cos z}\right)^{1/2}.$$

Is this method of evaluation numerically stable for $z \approx 0$, $z \approx \pi/2$? If necessary, give numerically stable alternatives.
7. The function

$$f(\varphi, k_c) := \frac{1}{\cos^2\varphi + k_c^2\sin^2\varphi}$$

is to be evaluated for $0 \le \varphi \le \pi/2$, $0 < k_c \le 1$. The method

$$k^2 := 1 - k_c^2,\qquad f(\varphi, k_c) := \frac{1}{1 - k^2\sin^2\varphi}$$

avoids the calculation of $\cos\varphi$ and is faster. Compare this with the direct evaluation of the original expression for $f(\varphi, k_c)$ with respect to numerical stability.
8. For the linear function $f(x) := a + bx$, where $a \ne 0$, $b \ne 0$, compute the first derivative $D_h f(0) = f'(0) = b$ by the formula

$$D_h f(0) = \frac{f(h) - f(-h)}{2h}$$

in binary floating-point arithmetic. Suppose that $a$ and $b$ are binary machine numbers, and $h$ a power of 2. Multiplication by $h$ and division by $2h$ can therefore be carried out exactly. Give a bound for the relative error of $D_h f(0)$. What is the behavior of this bound as $h \to 0$?
9. The square root $\pm(u + iv)$ of a complex number $x + iy$ with $y \ne 0$ may be calculated from the formulas

$$u = \pm\sqrt{\frac{x + \sqrt{x^2+y^2}}{2}},\qquad v = \frac{y}{2u}.$$

Compare the cases $x \ge 0$ and $x < 0$ with respect to their numerical stability. Modify the formulas if necessary to ensure numerical stability.
10. The variance $S^2$ of a set of observations $x_1, \dots, x_n$ is to be determined. Which of the formulas

$$S^2 = \frac{1}{n-1}\left(\sum_{i=1}^n x_i^2 - n\bar x^2\right),$$
$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2 \quad\text{with } \bar x := \frac{1}{n}\sum_{i=1}^n x_i,$$

is numerically more trustworthy?
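A numerical illustration of the effect behind this exercise (not a substitute for the requested analysis): for data with a large mean and a small spread, the one-pass formula can suffer heavy cancellation. The data values below are arbitrary choices of ours:

```python
# Five observations with mean 10000000.2 and exact variance 0.025
xs = [10_000_000.0 + 0.1 * i for i in range(5)]
n = len(xs)
xbar = sum(xs) / n

# one-pass formula: difference of two huge, nearly equal sums
s2_onepass = (sum(x * x for x in xs) - n * xbar**2) / (n - 1)

# two-pass formula: squares of small deviations, no cancellation
s2_twopass = sum((x - xbar) ** 2 for x in xs) / (n - 1)

print(s2_onepass, s2_twopass)    # the exact variance is 0.025
```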
11. The coefficients $a_r$, $b_r$ $(r = 0, \dots, n)$ are, for fixed $x$, connected recursively:

$$(*)\qquad b_n := a_n;\qquad \text{for } r = n-1, n-2, \dots, 0:\quad b_r := x\,b_{r+1} + a_r.$$

(a) Show that the polynomials

$$A(z) := \sum_{r=0}^n a_r z^r,\qquad B(z) := \sum_{r=1}^n b_r z^{r-1}$$

satisfy

$$A(z) = (z - x)\cdot B(z) + b_0.$$
(b) Suppose $A(x) = b_0$ is to be calculated by the recursion $(*)$ for fixed $x$ in floating-point arithmetic, the result being $\bar b_0$. Show, using the formulas (compare Exercise 1)

$$\mathrm{fl}(u + v) = \frac{u+v}{1+\sigma},\quad |\sigma| \le \mathrm{eps},\qquad \mathrm{fl}(u\cdot v) = \frac{u\cdot v}{1+\pi},\quad |\pi| \le \mathrm{eps},$$

the inequality

$$|A(x) - \bar b_0| \le \frac{\mathrm{eps}}{1-\mathrm{eps}}\,(2e_0 - |\bar b_0|),$$

where $e_0$ is defined by the following recursion:

$$e_n := |a_n|/2;\qquad \text{for } r = n-1, n-2, \dots, 0:\quad e_r := |x|\,e_{r+1} + |\bar b_r|.$$
Hint: From $b_n := a_n$,

$$p_r := \mathrm{fl}(x\,\bar b_{r+1}) = \frac{x\,\bar b_{r+1}}{1+\pi_{r+1}},\qquad \bar b_r := \mathrm{fl}(p_r + a_r) = \frac{p_r + a_r}{1+\sigma_r} = x\,\bar b_{r+1} + a_r + \delta_r,\qquad r = n-1, \dots, 0,$$

derive

$$\delta_r = -x\,\bar b_{r+1}\,\frac{\pi_{r+1}}{1+\pi_{r+1}} - \sigma_r\,\bar b_r \qquad (r = n-1, \dots, 0);$$

then show $\bar b_0 = \sum_{k=0}^n (a_k + \delta_k)x^k$, $\delta_n := 0$, and estimate $\sum_{k=0}^n |\delta_k|\,|x|^k$.

12. Assuming the earth to be spherical, two points on its surface can be expressed
in Cartesian coordinates

$$p_i = [x_i, y_i, z_i] = [r\cos\alpha_i\cos\beta_i,\ r\sin\alpha_i\cos\beta_i,\ r\sin\beta_i],\qquad i = 1, 2,$$

where $r$ is the earth radius and $\alpha_i$, $\beta_i$ are the longitudes and latitudes of the two points $p_i$, respectively. If

$$\cos\sigma = \frac{p_1^T p_2}{r^2} = \cos(\alpha_1 - \alpha_2)\cos\beta_1\cos\beta_2 + \sin\beta_1\sin\beta_2,$$

then $r\sigma$ is the great-circle distance between the two points.
(a) Show that using the arccos function to determine $\sigma$ from the above expression is not numerically stable.
(b) Derive a numerically stable expression for σ.
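As a numerical illustration of part (a): for two nearby points the arccos-based formula can return a drastically wrong distance. The chord-based variant shown is one standard stable alternative, not necessarily the one the exercise intends; the radius value and the test points are our choices:

```python
import math

r = 6.371e6                       # assumed earth radius in meters
alpha1, beta1 = 0.0, 0.0
alpha2, beta2 = 1.0e-8, 0.0       # roughly 6.4 cm to the east

# arccos formula: cos(sigma) rounds to exactly 1.0 for such close points
cos_sigma = (math.cos(alpha1 - alpha2) * math.cos(beta1) * math.cos(beta2)
             + math.sin(beta1) * math.sin(beta2))
d_arccos = r * math.acos(cos_sigma)

# stable variant: chord length in Cartesian coordinates, then arcsin
def cart(alpha, beta):
    return (r * math.cos(alpha) * math.cos(beta),
            r * math.sin(alpha) * math.cos(beta),
            r * math.sin(beta))

p1, p2 = cart(alpha1, beta1), cart(alpha2, beta2)
chord = math.dist(p1, p2)
d_stable = 2 * r * math.asin(chord / (2 * r))

print(d_arccos, d_stable)    # typically 0.0 versus roughly 0.0637
```

The instability comes from the derivative of arccos blowing up near 1, so that a half-ulp rounding of $\cos\sigma$ destroys all information about small $\sigma$.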