Learning Pseudo-Backdoors - Incorporating Existing Solvers as Sub-routines

Chapter V: Incorporating Existing Solvers as Sub-routines

5.7 Learning Pseudo-Backdoors

At a high level, our approach uses two learned models: a scoring model that scores subsets of integer variables according to their likelihood of being pseudo-backdoors, and a classifier that decides whether to use a predicted subset when actually solving the MIP. The intuition behind the second model is that some MIPs do not admit a small pseudo-backdoor in practice. For them, it is better to run a solver in its default setting.

Figure 5.3 illustrates our method. At test time, given a new MIP instance, we randomly sample subsets of integer variables according to their LP fractionality as in (Dilkina, Gomes, Malitsky, et al., 2009), score the sampled subsets, and take the subset with the highest score as the predicted pseudo-backdoor. Then we use the classifier to decide whether using the predicted pseudo-backdoor in a MIP solver would result in a faster solve time than the default setting. If the answer is positive, we assign higher branching priorities to those integer variables than to the rest; otherwise, we run the solver with its default setting.

[Figure 5.3 diagram: MIP → LP relaxation → pseudo-backdoor samples B^1, ..., B^k → scoring module (GAT + attention pooling) → B^* → classification module (GAT + attention pooling) → solve with B^* or with Gurobi.]

Figure 5.3: The pseudo-backdoor deployment pipeline, showing the components used to solve a single MIP instance with the two learned models, the scoring module S(P, B; θ_S) and the classification module C(P, B; θ_C). First, k candidate pseudo-backdoor sets of decision variables B^1, ..., B^k are sampled according to the decision variables’ LP fractionality. These candidates are ranked by the scoring module S(P, B; θ_S) to predict the best pseudo-backdoor B^*. The classification module then determines whether to run the solver using B^* or not, based on the predicted pseudo-backdoor success C(P, B^*; θ_C).
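The test-time procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this chapter: the function names, the defaults `k=50` and `size=3`, the small sampling floor `eps`, and the choice to sample each subset without replacement with probability proportional to fractionality are all assumptions made for the example. The `score` and `classify` callables stand in for the learned models S and C.

```python
import random
from typing import Callable, Dict, List, Optional


def lp_fractionality(lp_value: float) -> float:
    """Distance of an LP-relaxation value from its nearest integer (0 to 0.5)."""
    return abs(lp_value - round(lp_value))


def sample_candidate(lp_values: Dict[str, float], size: int,
                     rng: random.Random) -> List[str]:
    """Draw `size` distinct integer variables, weighted by LP fractionality."""
    eps = 1e-6  # assumption: small floor so integral variables stay sampleable
    pool = {v: lp_fractionality(x) + eps for v, x in lp_values.items()}
    chosen: List[str] = []
    for _ in range(min(size, len(pool))):
        names = list(pool)
        pick = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen


def choose_branching_set(lp_values: Dict[str, float],
                         score: Callable[[List[str]], float],
                         classify: Callable[[List[str]], bool],
                         k: int = 50, size: int = 3,
                         seed: int = 0) -> Optional[List[str]]:
    """Return the predicted pseudo-backdoor, or None to use solver defaults."""
    rng = random.Random(seed)
    candidates = [sample_candidate(lp_values, size, rng) for _ in range(k)]
    best = max(candidates, key=score)        # highest-scoring candidate
    return best if classify(best) else None  # the classifier gates its use


# Toy usage with stand-in models (hypothetical, for illustration only).
lp_values = {"x1": 0.5, "x2": 1.0, "x3": 0.3, "x4": 0.0, "x5": 0.9}
toy_score = lambda subset: sum(lp_fractionality(lp_values[v]) for v in subset)
toy_classify = lambda subset: toy_score(subset) > 0.5
prioritized = choose_branching_set(lp_values, toy_score, toy_classify, k=20, size=2)
```

When the returned value is not `None`, the listed variables would be given a higher branching priority than the rest before solving. In Gurobi, for instance, this is exposed through the `BranchPriority` variable attribute.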

Concretely, the score model S(P, B; θ_S) is parametrized by neural network parameters θ_S. It takes as input the MIP specification P and a candidate subset B, and predicts a score that characterizes whether B is a good pseudo-backdoor. The classifier C(P, B; θ_C) is parametrized by neural network parameters θ_C. It takes as input the MIP specification P and a candidate subset B, and predicts whether prioritizing B in branching would produce a smaller runtime than running the solver in its default setting.

Learning the Score Model

We train the score model S by learning to rank subsets of integer variables based on their quality as pseudo-backdoors. For a MIP P and two subsets of integer variables B^1, B^2 of P, we compute score estimates s1 = S(P, B^1; θ_S) and s2 = S(P, B^2; θ_S). Additionally, we compute a ranking label y, which is −1 if B^1 leads to a smaller runtime and 1 otherwise. We then compute the margin ranking loss (Tsochantaridis et al., 2005) as loss(s1, s2, y) = max(0, −y(s1 − s2) + m) for a given margin value m. The ranking loss lets the model focus on distinguishing relative performance rather than accurately modeling absolute performance.
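As a small worked illustration, the loss formula can be evaluated directly. The function name and the numeric scores below are hypothetical. Under this formula, a label of y = 1 drives s1 above s2 by at least the margin, and y = −1 drives s2 above s1 by at least the margin.

```python
def margin_ranking_loss(s1: float, s2: float, y: int, margin: float) -> float:
    """Evaluate max(0, -y * (s1 - s2) + margin) for a label y in {-1, +1}."""
    if y not in (-1, 1):
        raise ValueError("y must be -1 or +1")
    return max(0.0, -y * (s1 - s2) + margin)


# With y = -1 and s2 exceeding s1 by more than the margin, the loss vanishes.
separated = margin_ranking_loss(s1=0.2, s2=0.9, y=-1, margin=0.1)

# Same ordering, but the gap (0.05) is smaller than the margin: loss is positive.
too_close = margin_ranking_loss(s1=0.45, s2=0.5, y=-1, margin=0.1)

# Opposite ordering for the same label: the loss grows with the gap.
misordered = margin_ranking_loss(s1=0.9, s2=0.2, y=-1, margin=0.1)
```

Only the difference s1 − s2 enters the loss, which is why the model is free to ignore the absolute scale of the runtimes.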

We want to ensure that S yields predictions that are invariant to changes that should not modify the solver behavior, such as permutations of variable labels. We therefore use a bipartite graph representation of a MIP as in (Gasse et al., 2019). This representation has two sets of nodes, one for variables and the other for constraints.

There is one variable node for each decision variable and one constraint node for each constraint. Each variable node contains information such as the variable’s objective coefficient and root LP status. Each constraint node contains information such as the right-hand-side constant b_j, root LP dual variables, and sense (≤, ≥, or =). We use the same set of features as in (Gasse et al., 2019). To form the bipartite graph, we add an edge between a variable i and a constraint j if variable i appears in constraint j, i.e., A_ij ≠ 0 in the constraint matrix. The coefficient A_ij is encoded as an edge attribute. To represent a candidate pseudo-backdoor set B and retain the permutation invariance afforded by the graph representation, we use a binary encoding of B: each decision variable node gets an additional feature that takes the value 1 if the variable is in B and 0 otherwise. Encoding the input (P, B) as a graph allows us to leverage state-of-the-art techniques for making predictions on graphs, which handle variable input graph sizes and exhibit permutation invariance.
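The encoding of (P, B) described above can be sketched as follows. This is a deliberately reduced illustration: it keeps only the objective coefficient and the membership flag on variable nodes and only b_j on constraint nodes, whereas the full feature set follows (Gasse et al., 2019). The function name and the dense row-per-constraint matrix layout are assumptions of the example.

```python
from typing import Dict, List, Set, Tuple

Edges = Dict[Tuple[int, int], float]


def encode_instance(
    A: List[List[float]],  # constraint matrix; assumed layout: one row per constraint
    b: List[float],        # right-hand-side constants b_j
    c: List[float],        # objective coefficients
    backdoor: Set[int],    # indices of the variables in the candidate set B
) -> Tuple[List[List[float]], List[List[float]], Edges]:
    # Variable node features: objective coefficient plus the 0/1 membership flag.
    var_feats = [[c[i], 1.0 if i in backdoor else 0.0] for i in range(len(c))]
    # Constraint node features: only b_j here; other features are omitted.
    con_feats = [[b[j]] for j in range(len(b))]
    # One edge per nonzero coefficient, keyed (variable i, constraint j),
    # with the coefficient stored as the edge attribute.
    edges = {
        (i, j): coef
        for j, row in enumerate(A)
        for i, coef in enumerate(row)
        if coef != 0
    }
    return var_feats, con_feats, edges


# Two constraints over three variables; variables 0 and 2 form the candidate set.
var_feats, con_feats, edges = encode_instance(
    A=[[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]],
    b=[4.0, 5.0],
    c=[1.0, -1.0, 0.0],
    backdoor={0, 2},
)
```

Because B enters only as a per-node feature, relabeling the variables permutes the node list but leaves the graph, and hence any permutation-invariant prediction on it, unchanged.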

We leverage the Graph Attention Network (Veličković et al., 2018), in which each node of the input graph is embedded by aggregating messages from its neighbors. After several iterations of message passing along the edges of the bipartite graph, we aggregate all the node embeddings x_i using global attention pooling (Li et al., 2016) to yield a single feature vector representing the whole graph.

Performing these steps of obtaining node embeddings followed by aggregation across the entire MIP enables us to use the same network architecture for MIPs of various sizes. Now that we have a fixed-length representation of the MIP, we feed it into a feedforward neural network to produce a scalar output as the score.
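To make the fixed-length property concrete, here is a minimal sketch of attention-style pooling over node embeddings. It is a simplification under stated assumptions: a single linear gate (the hypothetical `gate_weights`) stands in for the learned gate network, and the embeddings are pooled without a further learned transformation.

```python
import math
from typing import List


def attention_pool(node_embeddings: List[List[float]],
                   gate_weights: List[float]) -> List[float]:
    """Softmax-weighted sum of node embeddings; output length is the embedding dim."""
    # One gate score per node (assumption: linear gate instead of a learned network).
    gates = [sum(w * x for w, x in zip(gate_weights, emb)) for emb in node_embeddings]
    top = max(gates)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(g - top) for g in gates]
    total = sum(exps)
    dim = len(node_embeddings[0])
    return [
        sum((e / total) * emb[d] for e, emb in zip(exps, node_embeddings))
        for d in range(dim)
    ]


# Graphs with different numbers of nodes map to vectors of the same length.
small_graph = [[1.0, 2.0], [3.0, 4.0]]
large_graph = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
pooled_small = attention_pool(small_graph, gate_weights=[0.5, -0.25])
pooled_large = attention_pool(large_graph, gate_weights=[0.5, -0.25])
```

The weighted sum does not depend on the order of the nodes, so the pooled vector is also invariant to node permutations.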

Learning the Classifier Model

For a given MIP instance, it may be difficult to sample valid pseudo-backdoors. As a result, we learn a subsequent classifier to determine whether to use the candidate subset or simply run a MIP solver in its default setting. The classifier has the same architecture as the scoring model: it takes as input the bipartite graph representation of the MIP P and a candidate subset B, and outputs a scalar value. In this module, however, we perform binary classification instead of ranking. Thus, the last-layer score is fed through a sigmoid activation function to get an output in the range [0, 1], and the final binary output is obtained at a threshold of 0.5.

To generate training data, for every MIP instance in the training distribution, we compute the solve time for a solver with its default setting and for using the candidate pseudo-backdoor suggested by the previous score model. We then label the MIP instance and pseudo-backdoor according to whether the pseudo-backdoor results in a faster solve time. Finally, we compute a loss for this classification module as a binary cross-entropy loss between the model outputs and the labels.
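The labeling rule, the sigmoid output, and the loss for a single training example can be sketched as below. The function names and the runtimes are hypothetical, and two details are assumptions of the example rather than statements from the text: ties in solve time are labeled 0, and an output exactly at the threshold counts as positive.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def backdoor_label(default_time: float, backdoor_time: float) -> int:
    """1 if prioritizing the candidate subset was strictly faster than the default run."""
    return 1 if backdoor_time < default_time else 0


def binary_cross_entropy(prob: float, label: int, eps: float = 1e-12) -> float:
    """Cross-entropy between a predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def use_backdoor(logit: float, threshold: float = 0.5) -> bool:
    """Final binary decision from the classifier's last-layer score."""
    return sigmoid(logit) >= threshold


# One training example: the candidate subset solved faster than the default.
label = backdoor_label(default_time=12.0, backdoor_time=7.5)
loss = binary_cross_entropy(sigmoid(1.2), label)
```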
