CHAPTER 4 Optimization
4.3 Optimizing linear systems
We often wish to maximize¹ an objective function O that is a linear combination of the control parameters $x_1, x_2, \dots, x_n$ and can therefore be expressed as:

$$O = c_1 x_1 + c_2 x_2 + \dots + c_n x_n \qquad \text{(EQ 1)}$$

where the $c_i$s are real scalars. Moreover, the $x_i$s are typically constrained by linear constraints in the following standard form:

$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n &= b_2 \\
&\;\;\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n &= b_m
\end{aligned} \qquad \text{(EQ 2)}$$

$$x_i \ge 0, \quad i = 1, 2, \dots, n \qquad \text{(EQ 3)}$$

or, using matrix notation:

$$Ax = b \qquad \text{(EQ 4)}$$

$$x \ge 0 \qquad \text{(EQ 5)}$$

where A is an $m \times n$ matrix, and x and b are column vectors with n and m elements respectively, with $n \ge m$. Let A's rank be r. To allow optimization, the system must be underconstrained with $r < n$, so that some of the $x_i$s can be written as a linear combination of the others, which form a basis for A².

Generalizing from Example 3, each equality corresponds to a hyperplane, which is a plane in more than two dimensions (this is not as intuitive as it sounds: for instance, a 3-plane is a solid that fills the entire Euclidean space). The constraints ensure that valid values of the $x_i$s lie at the intersection of these hyperplanes.

Note that we can always transform an inequality of the form

$$a_{i1} x_1 + a_{i2} x_2 + \dots + a_{in} x_n \ge b_i$$

to an equality by introducing a new variable $s_i$, called the surplus variable, such that

$$a_{i1} x_1 + a_{i2} x_2 + \dots + a_{in} x_n - s_i = b_i \qquad \text{(EQ 6)}$$

By treating $s_i$ as a virtual control parameter, we can convert a constraint that has a greater-than inequality into the standard form (we ignore the value assigned to a surplus variable). Similarly, introducing a slack variable converts lesser-than inequalities to equalities. Therefore, any linear system of equality and inequality constraints can be transformed into the standard form, which has only equality constraints. Once this is done, we can use linear programming (discussed below) to find the value of x that maximizes the objective function.
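This transformation is mechanical, so it is easy to automate. The following is a minimal sketch (the function name and the test constraints are illustrative, not from the text): each greater-than row gains a surplus variable with coefficient -1, and each lesser-than row gains a slack variable with coefficient +1.

```python
import numpy as np

def to_standard_form(A, b, senses):
    """Convert constraints A x (>=|<=|=) b into equalities A' x' = b.

    senses: one of '>=', '<=', '=' per row. Each '>=' row gets a surplus
    variable (coefficient -1); each '<=' row gets a slack variable
    (coefficient +1). Returns the augmented matrix and the vector b.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    n_extra = sum(1 for s in senses if s != '=')
    A_std = np.hstack([A, np.zeros((m, n_extra))])
    col = n
    for i, s in enumerate(senses):
        if s == '>=':
            A_std[i, col] = -1.0   # surplus variable
            col += 1
        elif s == '<=':
            A_std[i, col] = 1.0    # slack variable
            col += 1
    return A_std, np.asarray(b, dtype=float)

# Example: x1 + x2 >= 3 and 2*x1 + x2 <= 8 become two equalities
# in four variables (x1, x2, a surplus s1, and a slack s2).
A_std, b_std = to_standard_form([[1, 1], [2, 1]], [3, 8], ['>=', '<='])
```

The surplus and slack columns are appended after the original control parameters, matching the convention that their values are ignored when reading off a solution.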
1. In this chapter, we always seek to maximize the objective function. Identical techniques can be used for minimization.
2. To understand this section more fully, the reader may wish to review Section 3.4 on page 82.
DRAFT - Version 2 - Optimizing linear systems
EXAMPLE 4: REPRESENTING A LINEAR PROGRAM IN STANDARD FORM
Consider a company that has two network connections to the Internet through two providers (this is also called multi-homing). Suppose that the providers charge per-byte and provide different delays. For example, the lower-priced provider may guarantee that transit delays are under 50ms, and the higher-priced provider may guarantee a bound of 20ms. Suppose the company has two commonly used applications, A and B, that have different sensitivities to delay. Application A is more tolerant of delay than application B. Moreover, the applications, on average, generate a certain amount of traffic every day, which has to be carried by one of the two links. The company wants to allocate all the traffic from the two applications to one of the two links, maximizing its benefit while minimizing its payments to the link providers. Represent the problem in standard form.
Solution:
The first step is to decide how to model the problem. We must have variables that reflect the traffic sent by each application on each link. Call the lower-priced provider l and the higher-priced provider h. Then we denote the traffic sent by A on l as $x_{Al}$ and the traffic sent by A on h as $x_{Ah}$. Define $x_{Bl}$ and $x_{Bh}$ similarly. The traffic sent is non-negative, so we have:

$$x_{Al} \ge 0; \quad x_{Ah} \ge 0; \quad x_{Bl} \ge 0; \quad x_{Bh} \ge 0$$
If the traffic sent each day by application A is denoted $T_A$ and the traffic sent by B is denoted $T_B$, we have:

$$x_{Al} + x_{Ah} = T_A; \quad x_{Bl} + x_{Bh} = T_B$$
Suppose that the providers charge $c_l$ and $c_h$ monetary units per byte. Then, the cost C to the company is:

$$x_{Al} c_l + x_{Bl} c_l + x_{Ah} c_h + x_{Bh} c_h = C$$
What is the benefit to the company? Suppose that application A gains a benefit of $b_{Al}$ per byte from sending traffic on link l and $b_{Ah}$ per byte on link h. Using similar notation for the benefits to application B, the overall benefit (i.e., benefit minus cost) that the company should maximize, which is its objective function, is:

$$O = (b_{Al} - c_l) x_{Al} + (b_{Ah} - c_h) x_{Ah} + (b_{Bl} - c_l) x_{Bl} + (b_{Bh} - c_h) x_{Bh}$$
Thus, in standard form, the linear program is the objective function above, and the constraints on the variables expressed as:

$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} x_{Al} \\ x_{Ah} \\ x_{Bl} \\ x_{Bh} \end{bmatrix}
= \begin{bmatrix} T_A \\ T_B \end{bmatrix}; \quad
\begin{bmatrix} x_{Al} \\ x_{Ah} \\ x_{Bl} \\ x_{Bh} \end{bmatrix}
\ge \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
Note that, in this system, n = 4 and m = 2. To allow optimization, the rank of the matrix A must be smaller than n = 4. In this case, the rank of A is 2, so optimization is feasible.
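To make the example concrete, the sketch below solves this linear program with `scipy.optimize.linprog`. All the numeric values (prices, benefits, and traffic volumes) are invented for illustration; they do not come from the text. Since `linprog` minimizes, we negate the net-benefit coefficients to maximize O.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical per-byte prices and benefits (not from the text).
c_l, c_h = 1.0, 3.0      # prices: lower-priced provider l, higher-priced h
b_Al, b_Ah = 2.0, 2.5    # application A's benefit per byte on l and on h
b_Bl, b_Bh = 2.0, 5.0    # application B's benefit per byte on l and on h
T_A, T_B = 10.0, 6.0     # daily traffic volumes of A and B

# Variables ordered [xAl, xAh, xBl, xBh]; negate to turn max into min.
objective = -np.array([b_Al - c_l, b_Ah - c_h, b_Bl - c_l, b_Bh - c_h])
A_eq = [[1, 1, 0, 0],    # xAl + xAh = T_A
        [0, 0, 1, 1]]    # xBl + xBh = T_B
res = linprog(objective, A_eq=A_eq, b_eq=[T_A, T_B], bounds=(0, None))
print(res.x, -res.fun)   # optimal allocation and maximal net benefit
```

With these numbers, the delay-tolerant application A is placed entirely on the cheaper link and the delay-sensitive application B entirely on the faster link, for a net benefit of 22.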
How can we find values of the $x_{ij}$ such that O is maximized? Exhaustively trying values of x is infeasible, so we have to be cleverer than that. What we need is an algorithm that systematically chooses values of the $x_i$s that maximize or minimize O.
To solve a linear system in standard form, we draw on the intuition developed in Examples 2 and 3. Recall that in Example 3, the optimal value of O was reached at one of the vertices of the constraint plane. This is because any other point has a neighbour that lies on a better isoquant. It is only at a vertex that we 'run out' of better neighbours³. Of course, in some cases, the isoquant can be parallel to one of the hyperedges of a constraint hyperplane. In this case, O attains a minimum or maximum along an entire edge.
In a general system, the constraint plane corresponds to a mathematical object called a polytope, defined as a convex hyperspace bounded by a set of hyperplanes. In such a system, it can be shown that the extremal value of the objective function is attained at one of the vertices of the constraint polytope. It is worth noting that a polytope in more than three dimensions is rather difficult to imagine: for instance, the intersection of two four-dimensional hyperplanes is a three-dimensional solid. The principal fact about a polytope that is needed when carrying out an optimization is that each of its vertices is defined by n coordinates, which are the values assumed by the $x_i$s at that vertex. The optimal value of O is achieved for the values of the $x_i$s corresponding to the optimal vertex.
The overall approach to finding the optimal vertex is first, to locate any one vertex of the polytope; second, to move from this vertex to the neighbouring vertex where the value of the objective function is the greatest; and finally, to repeat this process until it reaches a vertex whose objective function value is greater than that of all of its neighbours: this must be the optimal vertex. This algorithm, developed by G. Dantzig, is the famous simplex algorithm.
The simplex algorithm builds on two underlying procedures: finding any one vertex of the polytope and generating all the neighbours of a vertex. The first procedure is carried out by setting n - r of the $x_i$s to 0, so that the resulting system has full rank, and solving the resultant linear system using, for example, Gaussian elimination. The second procedure is carried out using the observation that because A's rank is $r < n$, it is always possible to compute a new basis for A that differs from the current basis in only one column. It can be shown that this basis defines a neighbouring vertex of the polytope.
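The first procedure can be sketched in a few lines. The matrix below reuses the shape of the Example 4 constraints with assumed right-hand-side values; the choice of which n - r variables to zero is arbitrary here (a real implementation must also check that the resulting solution is feasible, i.e., non-negative).

```python
import numpy as np

# Illustrative system (values assumed): rank r = 2, n = 4 variables.
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
b = np.array([10.0, 6.0])

# Set n - r = 2 of the variables to zero (here columns 1 and 3) and
# solve the remaining r x r full-rank system for the basic variables.
basis = [0, 2]                          # columns kept in the basis
x_basic = np.linalg.solve(A[:, basis], b)

x = np.zeros(4)
x[basis] = x_basic                      # non-basic variables stay at 0
print(x)                                # a basic solution: a vertex
```

If the solved values are all non-negative, this basic solution is a vertex of the constraint polytope; simplex then pivots one basis column at a time to walk between neighbouring vertices.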
To carry out simplex in practice, we have to identify whether the program has incompatible constraints. This is easy because, if this is the case, the Gaussian elimination in the first procedure fails. A more subtle problem is that it is possible for a set of vertices to have exactly the same value of O, which can lead to infinite loops. We can eliminate this problem by slightly jittering the value of O at these vertices or by using other similar anti-looping techniques.
From the perspective of a practitioner, to use linear programming, all that needs to be done is to specify the objective function and the constraints to a program called a Linear Program Solver, or LP Solver. CPLEX and CS2 are two examples of well-known LP Solvers. A solver either returns the optimal value of the objective function and the vertex at which it is achieved, or declares the system unsolvable due to incompatible constraints. Today's LP Solvers can routinely solve systems with more than 100,000 variables and tens of thousands of constraints.
The simplex algorithm has been found to work surprisingly well on most real-life problems. However, in the worst case, it can take time exponential in the size of the input (i.e., the number of variables) to find an optimal solution.

Another LP solution algorithm, called the ellipsoidal method, is guaranteed to terminate in $O(n^3)$ time, where n is the size of the input, although its performance on realistic problems is not much faster than simplex. Yet another competitor to the simplex algorithm is the interior point method, which finds the optimal vertex not by moving from vertex to vertex, but by using points interior to the polytope.
Linear programming is a powerful tool. With an appropriate choice of variables, it can be used to solve problems that, at first glance, may not appear to be linear programs. As an example, we now consider how to set up the network flow problem as a linear program.
4.3.1 Network flow
The network flow problem models the flow of goods in a transportation network. Goods may be temporarily stored in warehouses. We represent the transportation network by a graph. Each graph node corresponds to a warehouse, and each directed edge, associated with a capacity, corresponds to a transportation link. The source node has no edges entering it and the sink node has no edges leaving it. The problem is to determine the maximum possible throughput between the source and the sink. We can solve this problem using LP, as the next example demonstrates.
EXAMPLE 5: NETWORK FLOW
3. For non-linear objective functions, we could ‘run out’ of better points even within the constraint plane, so the optimal point may not lie at a vertex.
Consider the network flow graph in Figure 4. Here, the node s represents the source and has a total capacity of 11.6 leaving it. The sink, denoted t, has a capacity of 25.4 entering it. The maximum flow from s to t can be no larger than 11.6, but may be smaller, depending on the intermediate paths.
FIGURE 4. Example of a network flow problem
We can compute the maximal flow that can be sustained on a network flow graph using linear programming. Denote the capacity of the link ij from i to j by $c_{ij}$ and the amount of traffic assigned to that link (as part of a flow from s to t) by $f_{ij}$. For example, in Figure 4, $c_{12} = 10.0$ and we may assign $f_{12} = 2.3$ on it as part of the overall flow from s to t. There are three types of constraints on the $f_{ij}$s:
1. Capacity constraints: the flow on a link cannot exceed its capacity, that is, $f_{ij} \le c_{ij}$.
2. Conservation conditions: all the flow entering a node (other than the sink) must exit it; that is, for all nodes j other than s and t, $\sum_i f_{ij} = \sum_k f_{jk}$.
3. Non-negativity: $f_{ij} \ge 0$.
Given these constraints, the objective function maximizes the flow leaving s; that is, $O = \sum_j f_{sj}$. The LP is now easy to frame. It consists of the capacity inequalities (written as equalities after introducing slack variables), the conservation conditions (with the right-hand side carried over to the left), and the conditions that the flows be non-negative. For example, the capacity constraint on edge 5-7 becomes $f_{57} + s_{57} = c_{57}$, and the conservation condition at vertex 3 equates the total flow entering node 3 with the total flow leaving it.
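This formulation translates directly into code. The sketch below uses a small hypothetical graph (not the graph of Figure 4, whose capacities are given in the figure) and `scipy.optimize.linprog`; the capacity and non-negativity constraints are expressed as variable bounds, which the solver converts to slack-variable equalities internally.

```python
from scipy.optimize import linprog

# A small hypothetical graph: (tail, head, capacity) per directed edge.
edges = [('s', 'a', 4.0), ('s', 'b', 2.0), ('a', 'b', 1.0),
         ('a', 't', 3.0), ('b', 't', 3.0)]

# Objective: maximize the total flow leaving s (linprog minimizes).
objective = [-1.0 if u == 's' else 0.0 for u, v, cap in edges]

# Conservation at each intermediate node: inflow - outflow = 0.
A_eq, b_eq = [], []
for node in ('a', 'b'):
    row = [(1.0 if v == node else 0.0) - (1.0 if u == node else 0.0)
           for u, v, cap in edges]
    A_eq.append(row)
    b_eq.append(0.0)

# Capacity and non-negativity: 0 <= f_ij <= c_ij as variable bounds.
bounds = [(0.0, cap) for u, v, cap in edges]
res = linprog(objective, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(-res.fun)    # the maximal s-to-t flow
```

For this graph the maximal flow is 6.0: 3 units on the path s-a-t, 2 on s-b-t, and 1 on s-a-b-t, saturating both edges into t.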