facturing can be moved and manipulated through matrix operations. Image analysis will use matrices in convolution operations to improve the quality of an image and to identify the bounds of objects in a picture. Fourier analysis will describe complex wave patterns in terms of simpler sine waves through matrix manipulation.
The common issue in all of these cases is that the matrix operations are done frequently, and so faster matrix multiplication results in faster programs.
Object transformations use 4 × 4 matrices, and convolutions can use square matrices from 3 × 3 up to 11 × 11 or bigger, but in both cases the matrices are used a very large number of times. Convolutions, for example, will take a matrix and multiply it by blocks of pixels in an image for every possible location. This means that for a 5 × 5 template used with a small image of 512 × 512 pixels (about one-quarter of a typical computer screen), a convolution will multiply this matrix by 258,064 different locations (508 × 508). If the standard matrix multiplication algorithm is used, this will result in 32,258,000 multiplications. A more efficient matrix multiplication algorithm can save significant time in this application.
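The counts in the convolution example can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the variable names are made up here, and it assumes that the standard algorithm spends 5 × 5 × 5 = 125 multiplications on each 5 × 5 product, which is the figure implied by the totals quoted above.

```python
# Work estimate for the 5 x 5 template on a 512 x 512 image.
template = 5   # side length of the square template
image = 512    # side length of the square image

# The template fits at (512 - 5 + 1) = 508 offsets along each axis.
offsets_per_axis = image - template + 1
locations = offsets_per_axis ** 2

# Assumption: the standard product of two n x n matrices uses n**3 multiplications.
multiplications_per_location = template ** 3
total_multiplications = locations * multiplications_per_location

print(locations)              # 258064
print(total_multiplications)  # 32258000
```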
In this chapter we will investigate ways to make polynomial evaluation and matrix multiplication more efficient. Because we are interested in how many calculations are done, we will be counting additions and multiplications. When we considered searching and sorting, we found equations that were based on the size of the list. In analyzing numeric algorithms, we will base our equations on the power of the highest order term in a polynomial equation, or the dimensions of the matrices we are multiplying.
The standard evaluation algorithm is very straightforward:
Evaluate( x )
x   the value to use for evaluation of the polynomial

result = a[0] + a[1]*x
xPower = x
for i = 2 to n do
   xPower = xPower * x
   result = result + a[i]*xPower
end for
return result
This algorithm is very clear and its analysis is obvious. The for loop has two multiplications and is done $N - 1$ times. There is one multiplication done before the loop, giving a total of $2N - 1$ multiplications. There is one addition done inside the loop and one done before it, giving $N$ additions.
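As a rough check on these counts, the standard algorithm can be written in Python with counters for the two kinds of operations. This is a minimal sketch, not part of the original algorithm: the function name, the counters, and the choice to pass the coefficients as a list `a` (lowest power first) are all assumptions made for illustration.

```python
def evaluate_standard(a, x):
    """Evaluate a[0] + a[1]*x + ... + a[n]*x**n term by term.

    Returns (value, multiplications, additions). Assumes len(a) >= 2.
    """
    n = len(a) - 1
    result = a[0] + a[1] * x          # one multiplication, one addition
    multiplications, additions = 1, 1
    x_power = x
    for i in range(2, n + 1):
        x_power = x_power * x         # next power of x
        result = result + a[i] * x_power
        multiplications += 2
        additions += 1
    return result, multiplications, additions


# Degree 7 example: 2N - 1 = 13 multiplications and N = 7 additions.
print(evaluate_standard([1, 2, 3, 4, 5, 6, 7, 8], 2))  # (1793, 13, 7)
```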
■ 4.1.1 Horner’s Method
Horner’s method gives a better way to do this evaluation without making the process very complex. This method is based on recognizing that the polynomial equation can be factored into the following form:

$$p(x) = (\{\ldots[(a_n x + a_{n-1}) * x + a_{n-2}] * x + \cdots + a_2\} * x + a_1) * x + a_0 \tag{4.2}$$

The reader should be able to easily see that this calculates the same value as Equation 4.1. This can be expressed in algorithmic form as
HornersMethod( x )
x   the value to use for evaluation of the polynomial

result = a[n]
for i = n - 1 down to 0 do
   result = result * x
   result = result + a[i]
end for
return result
We see that the loop is done N times and that there is one addition and one multiplication done in the loop. This means that there are N multiplications and N additions done by Horner’s method. This method saves almost half of the multiplications done by the standard algorithm.
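The same loop can be sketched in Python with operation counters, so the $N$ multiplications and $N$ additions can be observed directly. The function name and the coefficient list `a` (lowest power first) are illustrative assumptions, not names from the text.

```python
def horner(a, x):
    """Evaluate a[0] + a[1]*x + ... + a[n]*x**n by Horner's method.

    Returns (value, multiplications, additions).
    """
    n = len(a) - 1
    result = a[n]
    multiplications = additions = 0
    for i in range(n - 1, -1, -1):    # i = n - 1 down to 0
        result = result * x
        result = result + a[i]
        multiplications += 1
        additions += 1
    return result, multiplications, additions


# Degree 7 example: 7 multiplications and 7 additions.
print(horner([1, 2, 3, 4, 5, 6, 7, 8], 2))  # (1793, 7, 7)
```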
■ 4.1.2 Preprocessed Coefficients
It is possible to do even better than this by preprocessing the coefficients. The basic idea here is that it is possible to express a polynomial as a factorization into two polynomials of lesser degree. For example, if you want to calculate $x^{256}$, you could use a loop like the one in the function Evaluate at the start of this section and do 255 multiplications. An alternative would be to set result = x * x and then do the statement result = result * result seven times. We get the same answer with just eight multiplications. After the first squaring, result will hold $x^4$. After the second, it will hold $x^8$, after the third $x^{16}$, and so on until the seventh leaves $x^{256}$.
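The repeated squaring idea is easy to check by counting. Since $256 = 2^8$, reaching $x^{256}$ takes one multiplication for $x^2$ followed by seven squarings, eight multiplications in all. The helper below is a hypothetical sketch written for this illustration only.

```python
def power_by_squaring(x, doublings):
    """Return (x ** (2 ** doublings), multiplications used).

    Assumes doublings >= 1. The first multiplication forms x * x,
    and each later step squares the running result.
    """
    result = x * x
    multiplications = 1
    for _ in range(doublings - 1):
        result = result * result
        multiplications += 1
    return result, multiplications


value, multiplications = power_by_squaring(3, 8)
print(value == 3 ** 256, multiplications)  # True 8
```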
For preprocessed coefficients to work, we need our polynomial to be monic ($a_n = 1$) and to have its largest degree equal to 1 less than a power of 2 ($n = 2^k - 1$ for some $k \ge 1$).¹ If this is the case, we can factor the polynomial so that

$$p(x) = (x^j + b) * q(x) + r(x) \quad \text{where } j = 2^{k-1} \tag{4.3}$$

There will be half as many terms in $q(x)$ and $r(x)$ as in $p(x)$. To get the results we want, we would evaluate $q(x)$ and $r(x)$ and then do one additional multiplication and two additions. The interesting thing about this process is that if we choose the value of $b$ carefully, both $q(x)$ and $r(x)$ will be monic polynomials with the proper degree for this process to be applied again. After all of this is done, we will see that this process does save calculations.
Instead of looking at just generic polynomials, consider the following:
$$p(x) = x^7 + 4x^6 + 8x^4 + 6x^3 + 9x^2 + 2x + 3$$

We first need to determine the value of $(x^j + b)$ for Equation 4.3. Looking at $p(x)$ we see that its largest degree is 7, which is $2^3 - 1$, so that means $k$ is 3. This makes $j = 2^2 = 4$. We choose a value of $b$ so that both of the equations, $q(x)$ and $r(x)$, are monic. To achieve that, we need to look at the coefficient of the $x^{j-1}$ term in the equation and make $b = a_{j-1} - 1$. For our above equation, this means that $b$ will have the value of $a_3 - 1$, or 5. We now need to find the values of $q(x)$ and $r(x)$ that satisfy the equation

$$x^7 + 4x^6 + 8x^4 + 6x^3 + 9x^2 + 2x + 3 = (x^4 + 5) * q(x) + r(x)$$
¹ The savings of this method can be large enough that it is sometimes faster to add the terms necessary to be able to use this method and then subtract those values from the result returned. In other words, if we had an equation with degree 30, we would add $x^{31}$, determine the factorization, and then subtract $x^{31}$ from every answer. This would still save time over using another method for the calculation.
If we divide $p(x)$ by $x^4 + 5$, we will get quotient and remainder polynomials, and those are the values of $q(x)$ and $r(x)$, respectively. So, we need to divide as follows:

$$\begin{array}{r}
x^3 + 4x^2 + 0x + 8 \\
x^4 + 5 \,\big)\, \overline{x^7 + 4x^6 + 0x^5 + 8x^4 + 6x^3 + 9x^2 + 2x + 3} \\
\underline{x^7 + 4x^6 + 0x^5 + 8x^4 + 5x^3 + 20x^2 + 0x + 40} \\
x^3 - 11x^2 + 2x - 37
\end{array}$$

This gives the equation

$$p(x) = (x^4 + 5) * (x^3 + 4x^2 + 0x + 8) + (x^3 - 11x^2 + 2x - 37)$$

But we can apply this process to each of the polynomials for $q(x)$ and $r(x)$:

$$\begin{array}{r}
x + 4 \\
x^2 - 1 \,\big)\, \overline{x^3 + 4x^2 + 0x + 8} \\
\underline{x^3 + 4x^2 - x - 4} \\
x + 12
\end{array}
\qquad
\begin{array}{r}
x - 11 \\
x^2 + 1 \,\big)\, \overline{x^3 - 11x^2 + 2x - 37} \\
\underline{x^3 - 11x^2 + x - 11} \\
x - 26
\end{array}$$

The results of all of this would be

$$p(x) = (x^4 + 5) * [(x^2 - 1)(x + 4) + (x + 12)] + [(x^2 + 1)(x - 11) + (x - 26)]$$

If we look at this polynomial, we will see that there is one multiplication to calculate $x^2$ and another to calculate $x^4$ (done as $x^2 * x^2$). There are also three additional multiplications done in the equation, for a total of five multiplications. There are 10 additions done in this equation as well. Comparing this to the other methods, we get the table in Fig. 4.1.

This doesn’t look like a great saving, but this is just for a limited case. We can get a general equation for the amount of work done by looking carefully at the process. We first notice that we do only one multiplication and two additions in Equation 4.3. At the bottom of the process, a monic polynomial of degree 1, $x + a_0$, needs no multiplications and one addition. This gives the following set of recurrence relations for the number of multiplications, $M(k)$, and the number of additions, $A(k)$, where $N = 2^k - 1$:

$$\begin{aligned}
M(1) &= 0 \\
M(k) &= 2M(k-1) + 1 \quad \text{for } k > 1 \\
A(1) &= 1 \\
A(k) &= 2A(k-1) + 2 \quad \text{for } k > 1
\end{aligned}$$
Solving these equations we find that we will do approximately $N/2$ multiplications and $(3N - 1)/2$ additions. This doesn’t, however, include the multiplications to get the sequence of values $x^2, x^4, x^8, \ldots, x^{2^{k-1}}$, which takes an additional $k - 1$ multiplications. Thus, there are about $N/2 + \lg N$ total multiplications.
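The closed forms can be checked numerically by running the recurrences for small $k$. The sketch below takes the base case for additions to be one, on the assumption that the degree 1 polynomial $x + a_0$ costs a single addition; that is the base case that reproduces both the 10 additions counted for degree 7 and the $(3N - 1)/2$ total. The function names are illustrative only.

```python
def multiplications(k):
    """M(k): multiplications in the factored form, excluding the powers of x."""
    return 0 if k == 1 else 2 * multiplications(k - 1) + 1


def additions(k):
    """A(k): additions in the factored form, assuming A(1) = 1 for x + a0."""
    return 1 if k == 1 else 2 * additions(k - 1) + 2


for k in range(1, 11):
    n = 2 ** k - 1
    # Exact solutions: (N - 1) / 2 multiplications and (3N - 1) / 2 additions.
    assert multiplications(k) == (n - 1) // 2
    assert additions(k) == (3 * n - 1) // 2

# Degree 7 (k = 3): 3 multiplications in the formula plus k - 1 = 2 for x^2 and x^4.
print(multiplications(3) + (3 - 1), additions(3))  # 5 10
```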
Figure 4.2 gives a comparison of the standard algorithm, Horner’s method, and preprocessed coefficients. In comparing the last two, we see that we have saved $N/2 - \lg N$ multiplications but at a cost of $(N - 1)/2$ additional additions. By most standards, trading a multiplication for an addition will result in a time savings, so using preprocessed coefficients is more efficient.
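The whole preprocessing scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a definitive implementation: the function names and the nested tuple representation are invented here, coefficients are stored lowest power first, and the input is assumed to be monic with degree $2^k - 1$. It relies on the observation that when a polynomial of degree $2j - 1$ is divided by $x^j + b$, the quotient is simply the upper half of the coefficients and the remainder is the lower half minus $b$ times the quotient. The sample coefficients are those of the degree 7 polynomial whose factored form is $(x^4 + 5) * [(x^2 - 1)(x + 4) + (x + 12)] + [(x^2 + 1)(x - 11) + (x - 26)]$.

```python
def preprocess(a):
    """Factor a monic polynomial of degree 2**k - 1 as in Equation 4.3.

    a holds the coefficients lowest power first, so a[-1] must be 1.
    Returns ("leaf", a0) for x + a0, or ("node", j, b, q_tree, r_tree)
    for (x**j + b) * q(x) + r(x).
    """
    assert a[-1] == 1, "polynomial must be monic"
    n = len(a) - 1
    if n == 1:
        return ("leaf", a[0])
    j = (n + 1) // 2
    b = a[j - 1] - 1                               # makes r(x) monic
    q = a[j:]                                      # quotient: upper half of p
    r = [lo - b * hi for lo, hi in zip(a[:j], q)]  # remainder: lower half - b*q
    return ("node", j, b, preprocess(q), preprocess(r))


def evaluate(tree, x, powers):
    """Evaluate a preprocessed tree, looking up x**j in the powers table."""
    if tree[0] == "leaf":
        return x + tree[1]
    _, j, b, q_tree, r_tree = tree
    return (powers[j] + b) * evaluate(q_tree, x, powers) + evaluate(r_tree, x, powers)


def evaluate_preprocessed(tree, x):
    """Build x**2, x**4, ... by repeated squaring, then evaluate the tree."""
    powers = {}
    if tree[0] == "node":
        power, j = x, 1
        while j < tree[1]:
            power, j = power * power, 2 * j
            powers[j] = power
    return evaluate(tree, x, powers)


coefficients = [3, 2, 9, 6, 8, 0, 4, 1]   # 3 + 2x + 9x^2 + 6x^3 + 8x^4 + 0x^5 + 4x^6 + x^7
tree = preprocess(coefficients)
print(tree)
print(evaluate_preprocessed(tree, 2))      # 603
```

The factoring is done once per polynomial, so its cost is not counted against each evaluation; only `evaluate_preprocessed` runs for every new value of `x`.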
■ 4.1.3 EXERCISES
1. Give the factorization of the equation $x^7 + 2x^6 + 6x^5 + 3x^4 + 7x^3 + 5x + 4$ that results from
   a. Horner’s method
   b. Preprocessed coefficients
2. Give the factorization of the equation $x^7 + 6x^6 + 4x^4 - 2x^3 + 3x^2 - 7x + 5$ that results from
   a. Horner’s method
   b. Preprocessed coefficients
■ FIGURE 4.1 Work done for a polynomial of degree 7

Method                      Multiplications   Additions
Standard                    13                7
Horner’s                    7                 7
Preprocessed coefficients   5                 10
■ FIGURE 4.2 Work done for a polynomial of degree N

Method                      Multiplications   Additions
Standard                    2N - 1            N
Horner’s                    N                 N
Preprocessed coefficients   N/2 + lg N        (3N - 1)/2