4 Cut-and-Branch Algorithm for Cell Suppression

Branch-and-Cut versus Cut-and-Branch Algorithms for Cell Suppression 35

36 J.-J. Salazar-González

Another way of eliminating the infeasible solution x^∗ arises by observing that no subset of S^∗ can be a protected solution. In other words, any protected solution contains at least one cell in I\S^∗. This constraint can be written as the inequality:

\sum_{i \in I \setminus S^*} x_i \ge 1. \qquad (8)

The family of constraints (8), for all unprotected S^∗, is sufficient to guarantee the exactness of the algorithm when w_i ≥ 0. Indeed, each constraint eliminates all infeasible patterns that are subsets of an unprotected pattern S^∗.
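As an illustration, the following minimal Python sketch builds the support I\S^∗ of inequality (8) from an unprotected pattern and checks that this single inequality cuts off every subset of S^∗. The cell indices, the pattern S^∗ and the helper names cover_cut and violates are assumptions made for the example; they are not taken from the implementation discussed in Section 5.

```python
from itertools import combinations


def cover_cut(all_cells, unprotected_pattern):
    """Return the support I \\ S* of inequality (8): the x_i over it must sum to >= 1."""
    return frozenset(all_cells) - frozenset(unprotected_pattern)


def violates(cut_support, pattern):
    """A 0/1 pattern violates the cut when it suppresses no cell of the support."""
    return cut_support.isdisjoint(pattern)


# Toy data (assumed): six cells, and S* = {0, 1, 2} was found to be unprotected.
cells = range(6)
s_star = {0, 1, 2}
support = cover_cut(cells, s_star)

# Enumerate every subset of S* and check that the single inequality removes all of them.
subsets = [set(c) for r in range(len(s_star) + 1) for c in combinations(sorted(s_star), r)]
all_subsets_cut = all(violates(support, s) for s in subsets)

# A pattern suppressing at least one cell outside S* still satisfies the inequality.
superset_ok = not violates(support, s_star | {4})
```

Note that no dual information is needed to build the cut: the pattern S^∗ alone determines the inequality.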

Inequalities similar to (8) appear when solving combinatorial optimization problems from many other applications. They are called cover inequalities; see e.g. Wolsey [5]. A cover inequality is stronger when the subset I\S^∗ is minimal, i.e., when adding any cell to S^∗ creates a larger set which is a protected pattern.
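The minimality condition can be illustrated with a small Python sketch that greedily enlarges S^∗ while the pattern remains unprotected, so that the resulting support I\S^∗ shrinks. The oracle is_protected is hypothetical and stands in for whatever protection check is available; the toy oracle and the cell indices are assumptions for the example. The sketch relies on the property stated above that no subset of an unprotected pattern is protected.

```python
def enlarge_unprotected(all_cells, s_star, is_protected):
    """Greedily add cells to S* as long as the enlarged pattern stays unprotected.

    is_protected is a hypothetical oracle. Because supersets of protected
    patterns are protected, one pass is enough: a cell rejected once would
    also be rejected against the final, larger pattern.
    """
    current = set(s_star)
    for cell in sorted(set(all_cells) - current):
        candidate = current | {cell}
        if not is_protected(candidate):
            current = candidate
    return current


# Toy oracle (assumed): a pattern is protected when it contains {0, 3} or {1, 4, 5}.
PROTECTING_SETS = [{0, 3}, {1, 4, 5}]


def toy_is_protected(pattern):
    return any(required <= pattern for required in PROTECTING_SETS)


cells = set(range(6))
s_star = {0}                                   # unprotected pattern found by the algorithm
enlarged = enlarge_unprotected(cells, s_star, toy_is_protected)

weak_support = cells - s_star                  # support of (8) built from S* directly
strong_support = cells - enlarged              # support after enlarging S*
```

In this toy instance the support shrinks from five cells to two, so the strengthened inequality forces one of fewer cells to be suppressed. Processing the cells in a different order may yield a different enlarged pattern, and therefore a different inequality.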

The process of reducing some coefficients in the left-hand side of inequality (8) is called a lifting procedure and requires solving the linear programs (3) and (4).

An advantage of using (8) is the simplicity of the separation procedure. It does not need optimal dual solutions of the linear programs (3) and/or (4).

This advantage is further exploited in Section 6. A disadvantage, when the cover inequalities are compared to the capacity constraints, is that the cover inequality (8) is not valid when x^∗ is a non-integer vector. Another disadvantage is that only one inequality (8) is associated with a non-protected integer vector x^∗. This disadvantage, however, may be avoided by using a lifting procedure, since different sequences for lifting the coefficients may induce different lifted inequalities.

Still, the computational effort of lifting could drastically decrease the effectiveness of the separation procedure. Preliminary computational results confirmed this claim. Moreover, applying both separation procedures, for (7) and for (8), did not improve in practice over the cut-and-branch algorithm with only (7).

For that reason, in Section 5 we use (7) and not (8). Still, Section 6 points out a situation where constraints (8) are relevant.
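To make the exactness argument for constraints (8) concrete, the following Python sketch runs a cut-and-branch style external loop on a toy instance: solve an integer program, check protection, add inequality (8) if the solution is unprotected, and repeat. It is only a sketch under strong assumptions: the integer program is replaced by brute-force enumeration rather than GLPK, the oracle toy_is_protected is hypothetical, and the weights and cell indices are invented for the example.

```python
from itertools import product


def solve_master(weights, primary, cuts):
    """Brute-force stand-in for the integer program: cheapest pattern that
    contains all primary cells and intersects every cut support."""
    n = len(weights)
    best, best_cost = None, float("inf")
    for bits in product((0, 1), repeat=n):
        pattern = frozenset(i for i in range(n) if bits[i])
        if not primary <= pattern:
            continue
        if any(cut.isdisjoint(pattern) for cut in cuts):
            continue
        cost = sum(weights[i] for i in pattern)
        if cost < best_cost:
            best, best_cost = pattern, cost
    return best, best_cost


def cut_and_branch(weights, primary, is_protected):
    """External loop: re-solve the master until its optimum is protected."""
    all_cells = frozenset(range(len(weights)))
    cuts, iterations = [], 0
    while True:
        iterations += 1
        pattern, cost = solve_master(weights, primary, cuts)
        if pattern is None:
            raise ValueError("no protected pattern exists")
        if is_protected(pattern):
            return pattern, cost, iterations
        cuts.append(all_cells - pattern)       # inequality (8) with S* = pattern


# Toy oracle (assumed): protected when the pattern contains {0, 3} or {1, 4, 5}.
PROTECTING_SETS = [{0, 3}, {1, 4, 5}]


def toy_is_protected(pattern):
    return any(required <= pattern for required in PROTECTING_SETS)


weights = [1, 2, 1, 5, 1, 1]                   # non-negative weights w_i
primary = {0}                                  # sensitive cell that must be suppressed
pattern, cost, iterations = cut_and_branch(weights, primary, toy_is_protected)
```

Each master problem is a relaxation that keeps every protected pattern feasible, so the first protected solution it returns is optimal. The number of iterations can be large because each inequality (8) removes only the current pattern and its subsets.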

5 Computational Results

We have implemented the two approaches introduced in Sections 3 and 4: branch-and-cut and cut-and-branch. The implementation was done on a Dell Precision T5400 computer with an Intel Xeon X5460 3.16 GHz processor, using the Java programming language and Eclipse [6], a free and open source software (FOSS) environment for Java development. To solve mathematical programming models we used GLPK 4.43 [7], which is also FOSS. These options were selected for the immediate portability of the implementations to different computer platforms and to ensure that the final code is FOSS. These are well-appreciated features in table-protection software for data stewardship organizations.

It is known that GLPK is far from being competitive in efficiency with other mathematical programming solvers. However, the scope of this section is not to show the performance of the fastest implementation of each algorithm. The scope of this section is to compare the performance of different implementations done under the same conditions. We believe that, by properly scaling the computational times of each run, one can draw similar conclusions if another computer, programming language and mathematical programming solver were used.

The initial heuristic procedure (Step 1) is the one proposed in [2], used as the starting feasible CSP solution for both the branch-and-cut and the cut-and-branch algorithms. We did not implement any ad-hoc primal heuristic procedure (Step 5), but we activated the feasibility pump procedure available in GLPK. Note that this procedure is a primal heuristic procedure for the branch-and-cut implementation but not for the cut-and-branch implementation. Indeed, when our cut-and-branch implementation ends with the time limit, the only feasible CSP solution is the one generated by the initial heuristic.

To compare our implementations we use a collection of real-world tables created by the Income Tax Department of the Spanish Ministry of Finance ("Agencia Tributaria, Ministerio de Economía y Hacienda"). This collection contains 157 tables extracted from the 2008 IRPF taxes. The number of cells is between 138 and 570, and the number of equations is between 42 and 310.

Protecting these real-world tables was the original motivation of the research contained in this paper. Unfortunately, for confidentiality issues, this collection of instances is not publicly available. There is another collection of instances which is publicly available through the website http://webpages.ull.es/users/casc.

However, our implementations based on a FOSS mathematical programming tool were unable to deal with most of these instances due to the larger number of cells and equations. For that reason the analysis in this paper is based on running our implementations only on the collection motivating this research.

Among the 157 tables, there are 9 tables without primary suppressions; thus they do not have an associated CSP. Among the remaining 148 tables, there are 15 tables that could not be solved to optimality by the branch-and-cut implementation within a time limit of 1 hour. Table 1 gives the average (av) and the standard deviation (sd) of the following quantities over the 133 tables solved to optimality by the branch-and-cut implementation:

|I|: Number n of cells in the table.

|J|: Number m of equations in the table.

|P|: Number of sensitive cells (i.e., the number of primary suppressions).

z: Objective value of the initial heuristic CSP solution (Step 1).

time’: Number of seconds to compute the initial heuristic CSP solution.

sep: Number of solutions x^∗ checked for protection, each check potentially generating some capacity constraints (7) (Step 7).

cuts: Number of violated capacity constraints (7) generated.

nodes: Number of calls to the branching procedure (Step 10).

time: Number of seconds required by the branch-and-cut implementation.

z0: Optimal objective value of the first linear program solved (Step 2).

z1: Optimal objective value of the last linear program solved before branching (i.e., the lower bound at the root node).

opt: Objective value of the optimal CSP solution.


Table 1. Average (av) and standard deviation (sd) on 133 tables

    instance            heuristic       branch-and-cut
    |I|    |J|    |P|   z        time’  sep     cuts     nodes   time   z0       z1       opt
av  485.3  258.3  73.5  25032.7  6.0    2539.8  5428.6   431.8   179.2  13316.0  22361.6  22431.0
sd  86.7   54.1   63.2  72956.1  5.6    8880.5  19027.7  1299.8  566.2  54718.1  71187.9  71191.6

The branch-and-cut implementation was able to solve 107 instances without branching. This means that in most cases the external loop was not necessary to achieve integrality. Over the 133 tables solved to optimality, the gap before branching is 0.1%. Over the 148 tables with primary suppressions, the gap before branching is 0.5%. These percentages show the good quality of the lower bound z1 from the linear program before branching, which is fundamental to the success of a branch-and-cut implementation. Of course, they also show the good quality of the upper bound z provided by the initial heuristic (Step 1). Over the 15 tables where the branch-and-cut implementation was not able to prove optimality within 1 hour, the gap before branching is 6.5%. The quality of this lower bound is mainly due to the capacity constraints (7), but it is also due to the activation of the additional inequalities that are automatically generated inside GLPK. These are mainly Gomory inequalities, and they have contributed to closing the gap by 5% on average.
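The text does not state the exact formula behind these gap percentages. As a hedged illustration only, the Python sketch below assumes the common definition of a relative gap, 100 (UB − LB) / UB, and applies it to the two instances detailed in Table 2, using the root lower bound z1 against both the heuristic value z and the optimal value opt. The function name relative_gap and the choice of denominator are assumptions.

```python
def relative_gap(upper_bound, lower_bound):
    """Relative gap in percent, assuming the definition 100 * (UB - LB) / UB."""
    if upper_bound <= 0:
        raise ValueError("upper bound must be positive")
    return 100.0 * (upper_bound - lower_bound) / upper_bound


# Values copied from Table 2: heuristic value z, root lower bound z1, optimum opt.
table2 = {
    738: {"z": 5770, "z1": 5157.6, "opt": 5597},
    746: {"z": 3517, "z1": 2987.8, "opt": 3266},
}

# For each instance: (gap of z1 against the heuristic z, gap of z1 against the optimum).
gaps = {
    instance: (relative_gap(v["z"], v["z1"]), relative_gap(v["opt"], v["z1"]))
    for instance, v in table2.items()
}

for instance, (gap_vs_heuristic, gap_vs_optimum) in gaps.items():
    print(f"{instance}: {gap_vs_heuristic:.2f}% vs z, {gap_vs_optimum:.2f}% vs opt")
```

Under this assumed definition, the gap measured against the heuristic value is necessarily at least as large as the gap measured against the optimum, since z ≥ opt.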

Over the 133 tables solved to optimality, the average time to protect a table is 2 minutes, and the worst case is 10 minutes. This is a very satisfactory behavior of an exact approach to solve CSP for a data stewardship organization desiring a free and open source implementation to protect a table. There are 15 tables which could not be solved by this implementation within 1 hour, but even in these few cases the quality of the best feasible CSP solution was quite satisfactory.

Over the 148 tables, 117 CSP instances are solved to optimality by cut-and-branch before the time limit of 1 hour. The cut-and-branch implementation was also faster than the branch-and-cut implementation on 62 tables. This is a very relevant observation. Moreover, there are 2 instances solved to optimality by the cut-and-branch implementation and not solved by the branch-and-cut implementation. Details on these two instances are given in Table 2. The meaning of the columns coincides with that given for Table 1, except that now they represent individual values and not average values. In addition we find the following columns:

z”: Objective value of the best feasible CSP solution when the branch-and-cut ended with the time limit.

iter: Number of steps of the external loop by the cut-and-branch implementation, i.e., the number of integer programs solved.

cuts’: Number of capacity constraints (7) generated by the cut-and-branch approach.

time”: Number of seconds required by the cut-and-branch implementation.


Table 2. Details on two tables

instance            heuristic    branch-and-cut                            cut-and-branch
id   |I|  |J|  |P|  z     time’  sep    cuts    nodes  time  z1      z”    iter  cuts’  time”  opt
738  425  232  74   5770  7      17831  2874    6452   3600  5157.6  5634  92    1245   493    5597
746  553  299  40   3517  4      42896  120228  10028  3600  2987.8  3431  38    682    134    3266

We report results where the cut-and-branch implementation uses capacity constraints and not cover inequalities. The reason is that, when using both families of inequalities, the implementation reduced the total time by more than 10 seconds in 3 instances and increased it by more than 10 seconds in 22 instances. On average the implementation saved 34 seconds by using only capacity constraints (7). For that reason (and because in our collection the response variable may assume any continuous value) we deactivated (8) in our cut-and-branch implementation.

Over the 148 tables, both the branch-and-cut and the cut-and-branch implementations solved 105 instances to optimality. The branch-and-cut was faster than the cut-and-branch on 22 instances and the time reduction was 64%. The cut-and-branch was faster than the branch-and-cut on 59 instances and the time reduction was 69%. For the remaining 24 instances the time difference was smaller than one second.

Over the 117 instances solved to optimality by the cut-and-branch implementation, the average number of iterations of the external loop was 38.9, the average number of capacity cuts generated in the whole approach was 293, and the average number of seconds to end with optimality proof was 121 seconds.

Over the 31 instances not solved by the cut-and-branch implementation, the average number of iterations of the external loop was 273.3, the average number of capacity cuts generated in the whole approach was 1414.2, and the average gap between the initial CSP solution and the last unprotected solution was 14%.

In document: Lecture Notes in Computer Science (pages 45-49)