Sequencing for Stochastic Scheduling
6.6 Stochastic Dominance and Association
The expected value of a sum is equal to the sum of its component expected values. That is, E[Σ gj(Cj)] = Σ E[gj(Cj)]. Cast in terms of the previous section, the sum function is linear, so its Jensen gap must be zero. Its additive structure enables us to use dynamic programming to find solutions to stochastic problems when the objective function is a sum. A difficulty arises, however, in generalizing dominance conditions from the deterministic case to the stochastic case. Ideally, the most convenient generalization would be to adopt the deterministic counterpart; that is, we would like to use E[pj] instead of pj in the various dominance conditions. However, this approach turns out to be unreliable.
∎Example 6.5 Consider the problem of sequencing two jobs with stochastic processing times and the objective of minimizing expected total tardiness.
Job j    1     2
dj       2.9   3
E[pj]    1.9   2
The processing times of the two jobs are distributed as follows.
State    pj, job 1    pj, job 2    Probability
A        1            2            0.9
B        10           2            0.1
If we replace pj by E[pj], the two jobs have agreeable parameters. In the deterministic counterpart, therefore, we apply condition (a) of Theorem 3.3 and sequence job 1 first. This yields T = 0 with probability 0.9 and 16.1 otherwise, so that E[T] = 1.61. But if we reverse the sequence, T is 0.1 with probability 0.9 and 9.1 otherwise, so E[T] = 1. The example demonstrates the following.
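As a check on the arithmetic, the two expected values in Example 6.5 can be reproduced by enumerating states A and B. The following sketch is ours; the function name and data layout are not part of the text:

```python
# Expected total tardiness for Example 6.5, by enumerating states A and B.

def expected_total_tardiness(sequence, scenarios, due_dates):
    """sequence: job ids in processing order.
    scenarios: list of (probability, {job id: processing time}) pairs.
    due_dates: {job id: due date}."""
    total = 0.0
    for prob, p in scenarios:
        completion, tardiness = 0.0, 0.0
        for j in sequence:
            completion += p[j]
            tardiness += max(0.0, completion - due_dates[j])
        total += prob * tardiness
    return total

# State A with probability 0.9, state B with probability 0.1.
scenarios = [(0.9, {1: 1, 2: 2}), (0.1, {1: 10, 2: 2})]
due_dates = {1: 2.9, 2: 3.0}

print(expected_total_tardiness([1, 2], scenarios, due_dates))  # ≈ 1.61
print(expected_total_tardiness([2, 1], scenarios, due_dates))  # ≈ 1.00
```

Sequencing job 2 first wins even though job 1 is shorter by expectation.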
Proposition 6.2 The stochastic T-problem and its deterministic counterpart may not be optimized by identical sequences, and dominance conditions that apply for the deterministic counterpart are not necessarily valid in the stochastic case.
To summarize, we can use general combinatorial optimization methods to solve for the optimal sum of expected values, and the result will also minimize the expected value of the sum. However, because deterministic dominance relationships may not apply, we should expect these methods to take longer in the stochastic case than in the deterministic case. For this reason, we would like to identify circumstances under which counterpart dominance rules would still hold.
When E[p1] ≤ E[p2], we say that p1 is (weakly) smaller than p2 by expectation. We also write p1 ≤ex p2. Example 6.5 demonstrates that p1 ≤ex p2 is not sufficient to generalize deterministic dominance rules requiring p1 ≤ p2, because the worst-case realization of p1 could be larger than that of p2. However, stochastic ordering relationships exist that preclude a worst-case reversal. We say that one random variable, X, is stochastically smaller than another, Y (denoted X ≤st Y), if Pr{X ≤ t} ≥ Pr{Y ≤ t} for any t. This implies that the cdf of X, FX(t), is at or above the cdf of Y, FY(t). That is, FX ≥ FY everywhere. We also refer to this relationship as stochastic dominance, and if it applies to several pairs of random variables, we say that they are stochastically ordered (because the dominance relationship is transitive). Stochastic dominance is a strong relationship in the sense that ≤st implies ≤ex. A useful way to visualize this relationship is to recall that the expected value of a nonnegative random variable is given by the area above its cdf, below 1, and to the right of the origin (see Figure 6.1). Now, if FX ≥ FY, then the area above FX cannot exceed the area above FY. Therefore, the expected value of X cannot exceed the expected value of Y.
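The area argument is easy to check numerically. In the sketch below, the two-point distributions are our own illustration; the expectation of each nonnegative variable is computed as the area above its cdf, and FX ≥ FY everywhere forces E[X] ≤ E[Y]:

```python
# Illustration (assumed discrete distributions): if F_X(t) >= F_Y(t) for all t,
# then E[X] <= E[Y].  For a nonnegative variable, E[X] is the area above the
# cdf: E[X] = integral of (1 - F_X(t)) dt.

def cdf(dist, t):
    """dist: dict mapping value -> probability."""
    return sum(p for v, p in dist.items() if v <= t)

def expectation_via_cdf(dist, grid):
    """Riemann sum of (1 - F(t)) over an evenly spaced grid."""
    dt = grid[1] - grid[0]
    return sum((1.0 - cdf(dist, t)) * dt for t in grid)

X = {1: 0.5, 3: 0.5}   # F_X jumps earlier ...
Y = {2: 0.5, 4: 0.5}   # ... so F_X >= F_Y everywhere, i.e. X <=st Y

grid = [i * 0.001 for i in range(5000)]   # covers [0, 5)
print(expectation_via_cdf(X, grid))  # ≈ 2.0
print(expectation_via_cdf(Y, grid))  # ≈ 3.0
```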
The definition of ≤st does not require statistical independence. For example, let X and Y be two independent and identically distributed (iid) random variables, and let Z be any nonnegative random variable (including the degenerate case, in which Z = 0 with certainty). Then X ≤st Y + Z and X ≤st X + Z. The first relationship holds between independent random variables. When Z = 0 with certainty, we have that iid random variables X and Y are each stochastically smaller than the other. But in the second relationship, X and X + Z are statistically dependent because of a common element shared by the two random variables. When random variables are positively correlated as a result of common causes of variation affecting more than one of them in the same direction, they satisfy the definition of associated random variables. Random variables are associated if the correlation between any positive nondecreasing functions of each is nonnegative. Independent random variables are associated, but negatively correlated ones are not. Association may arise not only by adding the same random variable to two or more independent random variables but also by multiplying two or more positive random variables by the same positive element.

Figure 6.1 Depicting the expected value as an area above the cdf.
We introduce associated processing times because in practical settings, common causes of variation often affect more than one job in the same direction. For example, if a regular worker is faster than the replacement and the regular worker will be sick tomorrow with some positive probability, then for scheduling purposes, a positive dependence is introduced among all of tomorrow’s processing times. As another example, if the quality of a particular tool deteriorates, then the jobs that require it may all take longer. In general, various causes are likely to introduce positive dependence among different subsets of jobs.
When processing times behave as associated random variables, the completion time variance is higher than for independent random variables, for all but the first job. For independent random variables, the variance of a sum equals the sum of the variances. But, by definition, two associated random variables have a nonnegative covariance, and the variance of a sum with positive covariance is higher than the sum of the variances. So, in effect, the independence assumption is optimistic for the variance of a completion time. Finally, if two jobs have processing times that are associated, then their costs are also associated because the cost functions are nondecreasing. This relation, in turn, implies that the variance of performance measures based on associated processing times is also higher than the variance for independent processing times.
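The effect on completion time variance can be seen in a small simulation. The distributions and the shared disruption z below are illustrative choices, not from the text; the shared term makes the covariance positive, so the variance of the two-job completion time exceeds the sum of the individual variances:

```python
import random

random.seed(0)
N = 200_000

# p1 = r1 + z and p2 = r2 + z share the disruption z (association).
samples = []
for _ in range(N):
    r1 = random.uniform(1, 3)          # independent base time of job 1
    r2 = random.uniform(2, 4)          # independent base time of job 2
    z = random.expovariate(1.0)        # common cause of variation
    samples.append((r1 + z, r2 + z))

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

p1 = [a for a, _ in samples]
p2 = [b for _, b in samples]
c2 = [a + b for a, b in samples]       # completion time of the second job

# Var(p1 + p2) = Var(p1) + Var(p2) + 2 Cov(p1, p2), and here Cov = Var(z) > 0.
print(var(c2) - (var(p1) + var(p2)))   # close to 2 Var(z) = 2
```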
Two nonnegative random variables, X and Y, are linearly associated if there exist four independent nonnegative random variables, R, S, Z, and B, and two nonnegative parameters, α and β, such that X = (R + αZ)B and Y = (S + βZ)B.
If we set α = β = 0 and B = 1 with certainty, then X = R and Y = S, and they are independent by assumption (and thus associated). At the other extreme, if R and S are 0 with certainty, then X and Y are proportional (and thus associated). Here, B models a multiplicative bias shared by X and Y, whereas Z represents any additive element they may share. In what follows, we assume linear association.
Furthermore, we treat the special case α = β = 1. Less restrictive assumptions may suffice, but this one is simple to present yet still more realistic than the independence assumption.
∎Theorem 6.7 If X and Y are linearly associated, that is, X = (R + Z)B and Y = (S + Z)B, where R, S, Z, and B are independent nonnegative random variables, then X ≤st Y if and only if R ≤st S, and X ≤ex Y if and only if R ≤ex S.
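A Monte Carlo sketch can illustrate one direction of Theorem 6.7. The particular distributions below are our own choices; R ≤st S holds by construction, and the empirical cdfs of X = (R + Z)B and Y = (S + Z)B come out stochastically ordered, as the theorem predicts:

```python
import random

random.seed(1)
N = 100_000

xs, ys = [], []
for _ in range(N):
    r = random.uniform(0, 2)           # R
    s = random.uniform(1, 3)           # S, stochastically larger than R
    z = random.expovariate(1.0)        # shared additive element
    b = random.uniform(0.5, 1.5)       # shared multiplicative bias
    xs.append((r + z) * b)             # X = (R + Z)B
    ys.append((s + z) * b)             # Y = (S + Z)B

def ecdf(samples, t):
    return sum(1 for v in samples if v <= t) / len(samples)

# Empirically, Pr{X <= t} >= Pr{Y <= t} across a grid of t values.
grid = [0.5 * k for k in range(1, 12)]
print(all(ecdf(xs, t) >= ecdf(ys, t) for t in grid))
```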
Theorem 6.7 allows us to generalize existing results based on statistical independence to the case of linearly associated random variables. For example, it can be shown that if p1 ≤st p2, where p1 and p2 are independent, then Pr{p1 ≤ p2} ≥ 0.5.
We can extend that result to stochastically ordered, linearly associated random variables. Furthermore, if p1 ≤st p2, then E[(p1 − t)+] ≤ E[(p2 − t)+].
To demonstrate this inequality, consider that E[(pj − t)+] is the area above the cdf of job j and below 1, to the right of t (Figure 6.2). Because the cdf of the stochastically smaller random variable is above the other, the relevant area must be smaller. This argument, as stated, is correct for R and S, but it is inherited by X and Y through linear association. So, informally, it is a good bet to assume that p1 ≤ p2 in this case. However, Example 6.5 demonstrates that it is not necessarily a good bet when all we know is that p1 ≤ex p2. The relationship in that case was by expectation, but without stochastic dominance. Example 6.5 is predicated on the fact that the worst-case performance of p1 was worse than the worst-case performance of p2. But when the two processing times are stochastically ordered, such a worst-case reversal cannot happen.
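Both facts are easy to check by simulation for a concrete stochastically ordered pair; the uniform distributions below are our own illustration:

```python
import random

random.seed(2)
N = 100_000

# Independent processing times with p1 <=st p2: U(0, 2) vs. U(1, 3).
p1 = [random.uniform(0, 2) for _ in range(N)]
p2 = [random.uniform(1, 3) for _ in range(N)]

# Pr{p1 <= p2} is comfortably above 0.5 (for this pair, the exact value
# is 1 - 1/8 = 0.875).
print(sum(1 for a, b in zip(p1, p2) if a <= b) / N)

def expected_excess(samples, t):
    """Monte Carlo estimate of E[(p - t)+]."""
    return sum(max(0.0, v - t) for v in samples) / len(samples)

# E[(p1 - t)+] <= E[(p2 - t)+] at every t, as the area argument implies.
for t in (0.5, 1.5, 2.5):
    print(expected_excess(p1, t), expected_excess(p2, t))
```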
∎Theorem 6.8 In the Tw-problem, let jobs 1 and 2 satisfy p1 ≤st p2, d1 ≤ d2, and w1 ≥ w2; then job 1 precedes job 2 in an optimal sequence. Furthermore, if we subject the jobs to linear association, the result remains true.
Figure 6.2 E[(p1 − t)+] and E[(p2 − t)+] as areas.

Figure 6.3 Comparing two sequences with stochastic dominance.

Proof. In Figure 6.3 (which elaborates on Figure 6.2), the expected tardiness of a job is depicted as a tail to the right of its due date, above the distribution that applies to it and below the horizontal line at 1. The relevant distributions are either Fk, if job k is scheduled first (k = 1, 2), or F1+2, if job k is scheduled second. These three distributions also reflect any preceding jobs that have already been scheduled or any jobs scheduled between jobs 1 and 2. As the figure shows, job 1 is stochastically smaller and has a lower due date, per the conditions of the theorem. Let TF,d denote the area of the tail above distribution F (where F = 1, 2, or 1 + 2) to the right of due date d (where d = 1, 2). TF,d measures an expected tardiness; for instance, T1+2,1 is the expected tardiness of job 1 if it is sequenced second and is thus subject to the completion time distribution F1+2. We start with the sequence 1-2, assuming the two jobs are adjacent. By an adjacent pair interchange, the tardiness cost of job 1 increases by w1(T1+2,1 − T1,1) ≥ w1(T1+2,2 − T1,2), whereas the tardiness cost of job 2 decreases by w2(T1+2,2 − T2,2) ≤ w2(T1+2,2 − T1,2). But because w2 ≤ w1, we have w2(T1+2,2 − T1,2) ≤ w1(T1+2,2 − T1,2), so the gain is bounded from above by a lower bound on the loss, and the interchange cannot decrease, but may increase, the total weighted tardiness.
Now allow additional jobs (which need not be stochastically ordered) between jobs 1 and 2. If we interchange the two jobs, all these intermediary jobs will follow a stochastically larger job, so their expected tardiness cannot decrease. Hence, such jobs cannot provide an incentive to perform the interchange either. To show that linear association does not change the result, invoke Theorem 6.7. □
Corollary 6.3 For linearly associated processing times that are stochastically ordered, if expected processing times and due dates are agreeable for all pairs of jobs, then the expected total tardiness E[T] is minimized by SEPT sequencing with ties broken by EDD (or, equivalently, by EDD with ties broken by SEPT).
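As a sketch of the corollary in use (the job data below are invented for illustration), SEPT with ties broken by EDD reduces to a single sort when the parameters are agreeable:

```python
# Jobs with agreeable expected processing times and due dates (illustrative).
jobs = [
    {"id": 1, "E_p": 2.0, "d": 5.0},
    {"id": 2, "E_p": 1.0, "d": 3.0},
    {"id": 3, "E_p": 2.0, "d": 4.0},
    {"id": 4, "E_p": 3.0, "d": 9.0},
]

sept_edd = sorted(jobs, key=lambda j: (j["E_p"], j["d"]))   # SEPT, ties by EDD
edd_sept = sorted(jobs, key=lambda j: (j["d"], j["E_p"]))   # EDD, ties by SEPT

# With agreeable parameters, the two orders coincide.
print([j["id"] for j in sept_edd])   # [2, 3, 1, 4]
print([j["id"] for j in edd_sept])   # [2, 3, 1, 4]
```

Jobs 1 and 3 tie on expected processing time, and the EDD tie-break places job 3 first.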
Although Theorem 6.8 generalizes one dominance condition subject to a relatively strong assumption, even with this assumption in place, it remains difficult to generalize other deterministic dominance conditions. For example, generalizing Theorem 3.2 requires that a job will not be tardy with probability one. Hence, we are still left with the conclusion that the optimal solution to stochastic problems will typically take significantly longer to find than the solution to their deterministic counterparts.