CHAPTER 4 Optimization
4.6 Nonlinear constrained optimization
So far, we have been examining the use of optimization techniques where the objective function and the set of constraints are both linear functions. We now consider situations where these functions are not linear.
How does non-linearity change the optimization problem? In a (non-integer-constrained) linear system, the objective function attains its maximum or minimum value at one of the vertices of a polytope defined by the constraint planes. Intuitively, because the objective function is linear, we can always ‘walk along’ one of the hyper-edges of the polytope to increase the value of the objective function, so that the extremal value of the objective function is guaranteed to be at a polytope vertex.
In contrast, with non-linear optimization, the objective function may both increase and decrease as we walk along what would correspond to a hyper-edge (a contour line, as we will see shortly). Therefore, we cannot exploit polytope vertices to carry out optimization. Instead, we must resort to one of a large number of non-linear optimization techniques, some of which we study next.
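A made-up one-dimensional sketch (not from the text) makes the contrast concrete: along an edge of the feasible region, a linear objective is maximized at an endpoint, whereas a nonlinear objective may peak in the interior, so vertex-walking no longer finds the optimum.

```python
# A hypothetical illustration: along the edge 0 <= x <= 1 of a feasible
# region, a linear objective is maximized at an endpoint (a "vertex"),
# while a nonlinear objective may peak in the interior.

def linear(x):
    return 3 * x              # linear objective: extremum at a vertex

def nonlinear(x):
    return -(x - 0.4) ** 2    # nonlinear objective: peak at x = 0.4

edge = [i / 1000 for i in range(1001)]  # sample points along the edge

best_linear = max(edge, key=linear)
best_nonlinear = max(edge, key=nonlinear)

print(best_linear)      # 1.0 -- at a vertex
print(best_nonlinear)   # 0.4 -- in the interior of the edge
```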
Non-linear optimization techniques fall roughly into two categories.
DRAFT - Version 2 - Lagrangian techniques
• When the objective function and the constraints are mathematically ‘nice’, that is, continuous and at least twice differentiable, there are two well-known techniques, Lagrangian optimization and Lagrangian optimization with the Karush-Kuhn-Tucker conditions.
• When the objective functions are not continuous or differentiable, we are forced to use heuristic techniques such as hill-climbing, simulated annealing, and ant algorithms.
We will first look at Lagrangian techniques (Section 4.6.1 on page 115) and a variant called the KKT conditions that allows inequality constraints (Section 4.6.2 on page 116), then briefly consider several heuristic optimization techniques (Section 4.7 on page 117).
4.6.1 Lagrangian techniques
Lagrangian optimization computes the maximum (or minimum) of a function f of several variables subject to one or more constraint functions denoted gi. We will assume that f and all the gi are continuous, at least twice-differentiable, and defined over the entire domain, that is, they do not have ‘boundaries.’
Formally, f is defined over a vector x drawn from R^n and we wish to find the value(s) of x for which f attains its maximum or minimum, subject to the constraint function(s) gi(x) = ci, where the ci are real constants.
To begin with, consider a function f of two variables x and y with a single constraint function. We want to find the set of tuples of the form (x, y) that maximize f(x, y) subject to the constraint g(x, y) = c. The constraint g(x, y) = c corresponds to a contour or level set, that is, a set of points on which g’s value does not change. Imagine tracing a path along such a contour.
Along this path, f will increase and decrease in some manner. Imagine the contours of f corresponding to f(x) = d for some value of d. The path on g’s contour touches successive contours of f. An extremal value of f on g’s contour is reached exactly when g’s contour grazes an extremal contour of f. At this point, the two contours are tangential, so that the gradient of f’s contour (a vector that points in a direction perpendicular to the contour) has the same direction as the gradient of g’s contour (though it may have a different absolute value). More precisely, if the gradient is denoted by ∇_{x,y} = (∂/∂x, ∂/∂y), then, at the constrained extremal point,

∇_{x,y} f = −λ ∇_{x,y} g

for some constant λ. Define an auxiliary function:

F(x, y, λ) = f(x, y) + λ(g(x, y) − c)    (EQ 7)

The stationary points of F, that is, the points where ∇_{x,y,λ} F(x, y, λ) = 0, are points that (a) satisfy the constraint g, because the partial derivative of F with respect to λ, i.e., g(x, y) − c, must be zero, and (b) are also constrained extremal points of f, because ∇_{x,y} f = −λ ∇_{x,y} g.

Thus, the stationary points of F are also the points of constrained extrema of f (i.e., minima or maxima). From Fermat’s theorem, the maximum or minimum value of any function is attained at one of three types of points: (a) a boundary point, (b) a point where f is not differentiable, and (c) a stationary point where its first derivative is zero. Because we assume away the first two situations, the maximum or minimum is attained at one of the stationary points of F. Thus, we can simply solve

∇_{x,y,λ} F(x, y, λ) = 0

and use the second derivative to determine the type of extremum.

This analysis continues to hold for more than two dimensions and more than one constraint function. That is, to obtain a constrained extremal point of f, take the objective function and add to it a constant multiple of each constraint function to form the auxiliary function. Each such constant is called a Lagrange multiplier. The resulting system of equations is solved by setting the gradient of the auxiliary function to 0 to find its stationary points.
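As a sketch of the procedure, the following hypothetical example (my own, not from the text) maximizes f(x, y) = xy subject to x + y = 10. The auxiliary function is F(x, y, λ) = xy + λ(x + y − 10), and we find its stationary point by solving ∇F = 0 with Newton's method; all names here are assumptions.

```python
# Sketch: solving the Lagrangian stationarity conditions numerically.
# Hypothetical problem: maximize f(x, y) = x*y subject to x + y = 10.
# Auxiliary function: F(x, y, lam) = x*y + lam*(x + y - 10).

def grad_F(v):
    x, y, lam = v
    # Partial derivatives of F with respect to x, y, and lam.
    return [y + lam, x + lam, x + y - 10]

def jacobian(v):
    # Jacobian of grad_F; constant here because the system is linear.
    return [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

def solve3(A, b):
    # Gaussian elimination with partial pivoting for a 3x3 system.
    n = 3
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def newton(v, steps=20):
    # Newton's method on grad_F = 0 to find the stationary point of F.
    for _ in range(steps):
        g = grad_F(v)
        step = solve3(jacobian(v), [-component for component in g])
        v = [vi + si for vi, si in zip(v, step)]
    return v

x, y, lam = newton([1.0, 2.0, 0.0])
print(x, y, lam)   # x = y = 5, lam = -5
```

Because ∇F is linear in this example, Newton's method converges in a single step to x = y = 5 with λ = −5, giving the constrained maximum f = 25.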
EXAMPLE 10: LAGRANGIAN OPTIMIZATION
Consider a company that purchases capacity on a link to the Internet and has to pay for this capacity. Suppose that the cost of a link of capacity b is Kb. Also suppose that the mean delay experienced by data sent on the link, denoted d, 0 ≤ d < ∞, is inversely proportional to b, so that bd = 1. Finally, let the benefit U from using a network connection with capacity b and delay d be described by U = −Kb − d; that is, it decreases both with cost and with delay. We want to maximize U subject to the constraint bd = 1. Both U and the constraint function are continuous and twice-differentiable. Therefore, we can define the auxiliary function:

F = −Kb − d + λ(bd − 1)

Set the partial derivatives with respect to b, d, and λ to zero, to obtain, respectively:

−K + λd = 0
−1 + λb = 0
bd = 1

From the second equation, b = 1/λ, and from the first equation, d = K/λ. Substituting these values for b and d into the third equation, λ = √K. Substituting this value into the equations for b and d, b = 1/√K and d = √K. This gives a value of U at (b, d) of −2√K. Since U is clearly unbounded in terms of a smallest value (when b approaches 0), this stationary point is its maximum.
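As a quick numeric sanity check of this example (with an assumed cost factor K = 4, not specified in the text), we can substitute the constraint d = 1/b into U and search for the maximizing b:

```python
# Numeric check of the link-capacity example, assuming K = 4.
# With d = 1/b from the constraint, the problem becomes maximizing
# U(b) = -K*b - 1/b over b > 0; the Lagrangian analysis predicts a
# maximum at b = 1/sqrt(K), d = sqrt(K), with U = -2*sqrt(K).

import math

K = 4.0  # assumed link-cost factor (not given in the text)

def U(b):
    return -K * b - 1.0 / b   # benefit, with d = 1/b substituted in

# A coarse search over b confirms the analytic optimum.
candidates = [i / 1000 for i in range(1, 5000)]
best_b = max(candidates, key=U)

print(best_b, 1 / math.sqrt(K))   # both 0.5
print(U(best_b))                  # -2*sqrt(K) = -4.0
```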
4.6.2 Karush-Kuhn-Tucker conditions for nonlinear optimization
The Lagrangian method is applicable when the constraint functions are of the form g(x) = 0. What if the constraints are of the form g(x) ≤ 0? In this case, we can use the Karush-Kuhn-Tucker conditions, often called the KKT or Kuhn-Tucker conditions, to determine whether the stationary point of the auxiliary function is also a global minimum.

As a preliminary, we define what is meant by a convex function. A function f is convex if, for any two points x and y in its domain, and for t in the closed interval [0,1], f(tx + (1−t)y) ≤ t f(x) + (1−t) f(y). That is, the function always lies on or below a line drawn from f(x) to f(y).

Consider a convex objective function f: R^n → R with both m inequality and l equality constraints. Denote the inequality constraints by gi(x) ≤ 0, 1 ≤ i ≤ m, and the equality constraints by hj(x) = 0, 1 ≤ j ≤ l. The KKT conditions require all the gi to be convex and all the hj to be linear. Then, if a is a point in R^n, and there exist m and l constants, denoted μi and νj respectively, such that the following conditions hold, then we can guarantee that a is a globally constrained minimum of f:

∇f(a) + Σ_{i=1}^{m} μi ∇gi(a) + Σ_{j=1}^{l} νj ∇hj(a) = 0
gi(a) ≤ 0 ∀i
hj(a) = 0 ∀j
μi ≥ 0 ∀i
μi gi(a) = 0 ∀i    (EQ 8)

If these conditions are met, then the stationary points of the auxiliary function (whose gradient is the left-hand side of the first condition above) yield the minima of f.
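As an illustration, the following sketch (my own example, not from the text) checks the KKT conditions at a candidate point for a one-dimensional convex problem: minimize f(x) = x² subject to g(x) = 1 − x ≤ 0, that is, x ≥ 1.

```python
# Sketch: verifying the KKT conditions at a candidate point for a
# hypothetical convex problem:
#   minimize f(x) = x**2  subject to  g(x) = 1 - x <= 0.
# Candidate: a = 1 with multiplier mu = 2.

def f_prime(x):
    return 2 * x        # gradient of the objective f(x) = x**2

def g(x):
    return 1 - x        # inequality constraint, must satisfy g(x) <= 0

def g_prime(x):
    return -1.0         # gradient of the constraint

def kkt_holds(a, mu, tol=1e-9):
    stationarity = abs(f_prime(a) + mu * g_prime(a)) < tol  # grad f + mu * grad g = 0
    feasibility = g(a) <= tol                               # g(a) <= 0
    dual_feasibility = mu >= 0                              # mu >= 0
    complementary_slackness = abs(mu * g(a)) < tol          # mu * g(a) = 0
    return (stationarity and feasibility and
            dual_feasibility and complementary_slackness)

print(kkt_holds(1.0, 2.0))   # True: a = 1 is the global constrained minimum
print(kkt_holds(0.0, 0.0))   # False: a = 0 violates the constraint
```

At a = 1 the constraint is active, so complementary slackness permits μ = 2 > 0; all four conditions hold, certifying the global constrained minimum.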