Using a Modern Benders Framework
Since its creation in ‘62, the Benders Decomposition method (Benders, 1962) has generated a lot of research around how to improve it. Its simplicity attracted a lot of attention but it is famously hard to implement efficiently. In this post, I will introduce a Branch-and-Benders-Cut framework that I am developping and trying to make as general as possible, available here, using Intensity Modulated Radiation Therapy (Taşkın & Cevik, 2013) as an example.
Modernizing Benders
The principle of the Benders Decomposition (BD) is simple: take a Mixed Integer Program (MIP) and separate it into its continuous (sub-problem) and integer components (master problem). Because solving a MIP is $\mathcal{NP}$-hard, the bigger the problem the harder the solving process, in an exponential fashion. Benders alleviate this difficulty by relaxing the MIP into a smaller, less constrained problem. However, it then needs to solve one MIP and one LP repeatedly to obtain the optimum.
Basic algorithm
Benders Decomposition proceeds as follows:
- Project out all continuous variables and associated constraints from the master problem and replace them with an incumbent.
- Solve the master problem to obtain a tentative solution.
- Pass this solution as a parameter to the sub-problem and solve it.
- Use duality to obtain multiplier from the sub-problem and generate a constraint in the master problem, called a cut.
Repeat until the incumbent (added variable) in the master problem and the sub-problem have the same value. For a more complete explanation, refer to my previous post on this topic.
Yes master, yes
One of the main cause for slowness in this algorithm is that we need to solve a complete MIP, the master problem, at each iteration. Furthermore, at each iteration this MIP becomes harder to solve because we add a new constraint – or more in the case of multicut (Birge & Louveaux, 1988).
Although this is a simplified MIP, which is usually pretty easy, it is still cumbersome to solve it to optimality. To make the approach faster we need to rely on a simple observation: when solving a MIP, we will encounter a number of integer feasible solution which we will discard because they are not optimal.
A second observation is that any solution to the master problem enables us to generate valid Benders cuts (McDaniel & Devine, 1977).1 Thus, combining the two ideas leads to what is colloquially referred to as Modern Benders.
The new algorithm is:
- Solve the master problem using a Branch-and-Bound (B&B) framework.
- At each node (integer or fractional depending), compute a Benders cut using the sub-problem.
- Use the solution of the sub-problem as a new bound for branching.
The main quality of this approach is that it only solves the master problem once. The downside is that adding cuts may make the search harder, so it is a balancing exercise to determine which to add.2
Using a modern solver, we have access to callbacks during the B&B at integer and/or fractional nodes. From there, we can generate lazy constraints which the solver will add to its cut pool and check when necessary. (Cf. Cuts
in benders/master.py
which is a callback class called at integer nodes.)
Modern Benders framework
In this section, I will introduce the Branch-and-Benders-Cut framework, called BranDec, that I am currently developing. I will use radiation therapy as an example. I will try to provide both the mathematical and Python description as a walkthrough on how to use it.
My goal with this framework is to make clear the separation between the Benders master problem, which is the core of the algorithm, and the specifics of the problem to solve. The usual approach to using Benders is integrated: the master problem’s variables, cut generation, and sometimes sub-problem are all aggregated into one massive function.
From my experience this is not necessary and makes every Benders implementation single-purpose, which defeats the idea of using a generic algorithm. My framework thus only comprises a single part: the master problem.
Of course, this also deals with the branching and cut generation – as in adding the cuts to the master at given nodes – but does not deal with modelling, getting dual values, etc. The way to use it is the following:
- Create an instance of sub-problem deriving
SubPb
which will have to implement a few methods: optimise given a master solution, get dual values from a solution. - Add variables and constraints to the master problem.
- Solve.
To run the code you will need Python 3.5 with CPLEX and numpy installed.
Case study
Intensity Modulated Radiation Therapy (IMRT) can be decomposed into the problem of matching a set of apertures to a given fluence map. A fluence map define the amount of radiation that has to be delivered to each area (cancer cell) and can be represented as an integer matrix. In the same fashion, we can represent the irradiation delivered by the machine as a set of apertures.
For example, consider the following fluence map, it can be decomposed into a set of apertures.
\[ \begin{bmatrix} 4 & 0 & 2 \\ 3 & 9 & 5 \end{bmatrix} = \\ 4 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} + 2 \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} + 3 \begin{bmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix} + 6 \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\]
To provide an efficient treatment, we want to minimize the amount of radiation received by the patient. This problem seems very amenable to BD: the master problem will choose a set of apertures to use while the sub-problem will decide how long they need to be used.
We can model the problem as follows:
- $T$ the target fluence map, a matrix with $n$ rows and $m$ columns.
- $R$ the set of all apertures with:
- we use $R(i,j)$ to denote the set of apertures that cover bixel $(i, j)$; and
- $M_r$ the minimum required intensity among the bixels covered by aperture $r$.
- $b_{ij}$ the amount of radiation needed in each bixel.
- $w$ the setup time per aperture.
- $y_r$ a binary variable representing whether aperture $r$ is used or not.
- $x_r$ a continuous variable representing how long each aperture will be used.
\[\begin{align*} \min && \sum_{r \in R} (w \cdot y_r + x_r) \tag{MIP} \label{eq:mip} \\ s.t. && \sum_{r \in R(i,j)} x_r & = b_{ij} & \forall i, j \in T \\ && x_r & \leq M_r \cdot y_r & \forall r \in R \\ && x \geq 0, y \in {\mathbb{Y}} \end{align*}\]
Master problem
From \eqref{eq:mip} we can derive the master problem by removing all continuous $x$ variables and replacing them with an incumbent $q$.
\[\begin{align*} \min && \sum_{r \in R} y_r + q \tag{Master} \label{eq:master} \\ s.t. && \text{Benders cuts} \\ && q \geq 0, y \in {\mathbb{Y}} \end{align*}\]
Hence $q$ is a lower estimator of the irradiation time required.
Code
bd = Benders()
bd.setMinimise()
bd.addVars([intensity] * len(apertures),
names=["y_{}".format(i) for i in range(len(apertures))])
# No constraints in the base master
# Add the subproblem
bd.addSub({0: SPTT(apertures, radiation)})
The dictionary at the end is used for multicut schemes, here we have a single cut, thus a single entry.
Sub-problem
The sub-problem then becomes: given a set of apertures, find if we can cover the fluence map.
\[\begin{align*} q(\bar{y}) = && \min \sum_{r \in R} x_r \tag{Sub} \label{eq:sub} \\ s.t. && \sum_{r \in R(i,j)} x_r & = b_{ij} & \forall i, j \in T & \tag{$\alpha$} \\ && x_r & \leq M_r \cdot \bar{y}_r & \forall r \in R & \tag{$\beta$} \\ && x \geq 0 \nonumber \end{align*}\]
Given the dual multipliers $\alpha$ and $\beta$ associated to each constraint in the sub-problem, we can write a Benders cut as follows:
Feasibility cut: A feasibility cut only removes the current solution from the master problem and is generated when the sub-problem is infeasible given the current solution.
\[ \sum_{i = 0}^{n} \sum_{j = 0}^m b_{ij} \cdot \hat{\alpha}_{ij} + \sum_{r \in R} (M_r \cdot \hat{\beta}_r) y_r \leq 0 \]
Optimality cut : An optimality gives us further information on the quality of the solution and involves the incumbent to provide a tighter constraint.
\[ \sum_{i = 0}^{n} \sum_{j = 0}^m b_{ij} \cdot \hat{\alpha}_{ij} + \sum_{r \in R} (M_r \cdot \hat{\beta}_r) y_r \leq q \]
One important thing to notice is the occurrence of the master variables alongside the dual values that correspond to constraints where their parametric version ($\bar{y}$) is used.
Therefore, a Benders cut can be expressed using only three items from the sub-problem:
- The dual values of constraints where the fixed master variables occur, which will form the left-hand side of the cut (in standard form).
- The dual values of the constraint that only exist in the sub-problem, their sum will form the right-hand side of the cut.
- Whether it is an optimality or feasibility cut so the master knows whether to include the incumbent in the left-hand side or not.
Generating cuts
The code for setting up the sub-problem is mainly related to using CPLEX, I will let the reader peruse through it as there are no surprises there. However, I will detail quickly the functions used to “generate” the cuts.
def optiCut(self):
duals = self.cpx.solution.get_dual_values(self.vs)
rhs = self.cpx.solution.get_dual_values(self.us)
return duals, rhs
The optimality cut is straightforward: get the dual values of selected constraints.
def feasCut(self):
# Use a Farkas certificate as we have the primal and not the dual; the
# second parameter is the dual objective, should be equal to the $RHS *
# coefs$.
ray, _ = self.cpx.solution.advanced.dual_farkas()
duals = [ray[i] for i in range(len(self.us), len(ray))]
rhs = [ray[i] for i in range(0, len(self.us))]
return duals, rhs
Feasibility cuts are a tad more complex as we have to rely on a Farkas certificate (Farkas, 1902) to identify an extreme ray3 in the dual polyhedron. Luckily for us, CPLEX provides us with a built-in function to do the job. The only thing we have to do by hand is differentiate the different dual values based on their ids.
def results(self):
# Code elided...
lhs = [a * b for (a, b) in zip(duals, self.rhs)]
mult = self.cpx.linear_constraints.get_rhs(self.us)
return lhs, solValue(rhs, mult)
Finally, one we have retrieved the dual values, we can generate the cut. Turning dual coefficients into a Benders cut consists in multiplying the dual values by the RHS of their associated constraint. This is because, using linear duality theory, the RHS vector becomes the objective coefficients of the dual.
Solving the problem
A few things before starting:
- We will use the primal problem to exemplify the process, using the dual problem would lead to the exact same solution.
- Because of the nature of linear programming, or rather the solver used, we cannot specify which value we want when the problem is degenerate (multiple equivalent solutions). However, this can only influence the number of iterations of the algorithm, the result will always be optimal.
We will use the following numerical example to illustrate a run of the algorithm:
- Exposure time: $w = 7$, will be called
intensity
. - Fluence map, called
radiation
.
\[ T = \begin{bmatrix} 8 & 3 \\ 5 & 0 \end{bmatrix} \]
- A set of five apertures, called
apertures
.
\[ R = \left\{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},\begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix},\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}\right\} \]
apertures = [[[1, 0], [0, 0]], [[0, 1], [0, 0]], [[0, 0], [1, 0]],
[[1, 0], [1, 0]], [[1, 1], [0, 0]]]
radiation = [[8, 3], [5, 0]]
intensity = 7
Initialisation
First, let us express the sub-problem with the given values. We will not have any constraint corresponding to $\alpha_{22}$ as the target bixel is nil.
\[ \begin{align*} q(\bar{y}) = \min && x_1 + x_2 + x_3 + x_4 + x_5 \tag{SPTT} \\ s.t. && x_1 + x_4 + x_5 &= 8 \tag{$\alpha_{11}$}\\ && x_2 + x_5 &= 3 \tag{$\alpha_{12}$}\\ && x_3 + x_4 &= 5 \tag{$\alpha_{21}$}\\ && x_1 \leq 8 \cdot \bar{y}_1, x_2 \leq 3 \cdot \bar{y}_2, \\ && x_3 \leq 3 \cdot \bar{y}_3, x_4 \leq 5 \cdot \bar{y}_4, x_5 & \leq 3 \cdot \bar{y}_5 \tag{$\beta_{1-5}$}\\ && x \geq 0 \end{align*} \]
I will provide sample output from the code as well, mainly debug
level information.
Iteration 1
We begin with the following relaxed master problem.
\[ \begin{align*} \min && 7 \times (y_1 + y_2 + y_3 + y_4 + y_5) + q \label{eq:it_1} \tag{Pb.1} \\ s.t. && y \in \mathbb{B}, q \geq 0 \end{align*}\]
The optimal solution to \eqref{eq:it_1} is $\bar{y} = [0, 0, 0, 0, 0], \bar{q} = 0$ which leads to an infeasible sub-problem. From its Farkas certificate, we get the following values:
\[ \begin{matrix} \alpha_{11} = 1, \alpha_{12} = 0, \alpha_{21} = 0 \\ \beta_{1} = -1, \beta_{2} = 0, \beta_{3} = 0, \beta_{4} = -1, \beta_{5} = -1 \end{matrix} \]
Sol: 00000 - q: 0.000000
Feasibility Cut
Beta: [-1.0, 0.0, 0.0, -1.0, -1.0], Alpha: [1.0, 0.0, 0.0]
LHS: [-8.0, 0.0, 0.0, -5.0, -3.0], RHS: 8.0
Iteration 2
With the previous results we obtain the new master problem.
\[ \begin{align*} \min && 7 \times (y_1 + y_2 + y_3 + y_4 + y_5) + q \label{eq:it_2} \tag{Pb.2} \\ s.t. && 8 - 8 y_1 - 5 y_4 - 3 y_5 & \leq 0 \\ && y \in \mathbb{B}, q \geq 0 \end{align*}\]
We can easily see that the first member of the constraint is our $\alpha_{11} \times b_{11}$ and that the multiplier for each $y_r$ is: $\beta_r \times M_r$.
The optimal solution is now: $\bar{y} = [1, 0, 0, 0, 0], \bar{q} = 0$. The sub-problem is still infeasible so we will generate a feasibility cut with the following dual values:
\[ \begin{matrix} \alpha_{11} = 0, \alpha_{12} = 0, \alpha_{21} = 1 \\ \beta_{1} = 0, \beta_{2} = 0, \beta_{3} = -1, \beta_{4} = -1, \beta_{5} = 0 \end{matrix} \]
Sol: 10000 - q: 0.000000
Feasibility Cut
Beta: [0.0, 0.0, -1.0, -1.0, 0.0], Alpha: [0.0, 0.0, 1.0]
LHS: [0.0, 0.0, -5.0, -5.0, 0.0], RHS: 5.0
Iteration 3
We now have the new master problem with two cuts.
\[ \begin{align*} \min && 7 \times (y_1 + y_2 + y_3 + y_4 + y_5) + q \label{eq:it_3} \tag{Pb.3} \\ s.t. && 8 - 8 y_1 - 5 y_4 - 3 y_5 & \leq 0 \\ && 5 - 5 y_3 - 5 y_4 & \leq 0 \\ && y \in \mathbb{B}, q \geq 0 \end{align*}\]
The optimal solution becomes: $\bar{y} = [1, 0, 0, 1, 0], \bar{q} = 0$, which again leads to an infeasible sub-problem with the following dual values:
\[ \begin{matrix} \alpha_{11} = 0, \alpha_{12} = 1, \alpha_{21} = 0 \\ \beta_{1} = 0, \beta_{2} = -1, \beta_{3} = 0, \beta_{4} = 0, \beta_{5} = -1 \end{matrix} \]
Sol: 10010 - q: 0.000000
Feasibility Cut
Beta: [0.0, -1.0, 0.0, 0.0, -1.0], Alpha: [0.0, 1.0, 0.0]
LHS: [0.0, -3.0, 0.0, 0.0, -3.0], RHS: 3.0
Iteration 4
We add a third feasibility cut to the master.
\[ \begin{align*} \min && 7 \times (y_1 + y_2 + y_3 + y_4 + y_5) + q \label{eq:it_4} \tag{Pb.4} \\ s.t. && 8 - 8 y_1 - 5 y_4 - 3 y_5 & \leq 0 \\ && 5 - 5 y_3 - 5 y_4 & \leq 0 \\ && 3 - 3 y_2 - 3 y_5 & \leq 0 \\ && y \in \mathbb{B}, q \geq 0 \nonumber \end{align*}\]
Optimal solution: $\bar{y} = [0, 0, 0, 1, 1], \bar{q} = 0$. Guess what? Yes, feasible sub-problem. In this case, the sub-problem is mainly checking the feasibility of increasingly expensive solutions because the lower bound of the master problem is monotonically increasing, the first feasible solution thus yield the optimum.
However this property is not enough to conclude as it does not hold in the general case. Thus we add an optimality cut to the master problem and do one more iteration using the following dual values:
\[ \begin{matrix} \alpha_{11} = 1, \alpha_{12} = 1, \alpha_{21} = 1 \\ \beta_{1} = 0, \beta_{2} = 0, \beta_{3} = 0, \beta_{4} = -1, \beta_{5} = -1 \end{matrix} \]
Sol: 00011 - q: 0.000000
Optimality Cut with x = [0, 0, 0, 5, 3]
Beta: [0.0, 0.0, 0.0, -1.0, -1.0], Alpha: [1.0, 1.0, 1.0]
LHS: [0.0, 0.0, 0.0, -5.0, -3.0], RHS: 16.0
Iteration 5
We now have the first optimality cut in the master.
\[ \begin{align*} \min && 7 \times (y_1 + y_2 + y_3 + y_4 + y_5) + q \label{eq:it_5} \tag{Pb.5} \\ s.t. && 8 - 8 y_1 - 5 y_4 - 3 y_5 & \leq 0 \\ && 5 - 5 y_3 - 5 y_4 & \leq 0 \\ && 3 - 3 y_2 - 3 y_5 & \leq 0 \\ && 16 - 5 y_4 - 3 y_5 & \leq q \\ && y \in \mathbb{B}, q \geq 0 \end{align*} \]
Notice $q$ appearing in the RHS of the optimality cut.4
The solution to \eqref{eq:it_5} is: $\bar{y} = [0, 0, 0, 1, 1], \bar{q} = 8$. We have $\bar{q}$ equal to the value of the sub-problem with solution: $\bar{x} = [0, 0, 0, 5, 3]$, thus we found the optimal solution to the problem.
Sample output
Added 5 master variables.
Registered 1 sub-problems bundled in 1 cuts.
Itrs | Master | q | q(z) | Opt | UB
---- + ---------- + ---------- + ---------- + --- + ----------
1 | 0.000 | 0.000 | 0.000 | | 0.000
2 | 7.000 | 0.000 | 0.000 | | 0.000
3 | 14.000 | 0.000 | 0.000 | | 0.000
4 | 14.000 | 0.000 | 8.000 | X | 0.000
Found q(z) = q.
Solution 00011 = 22.0
Stop after 0 nodes (5 integer).
Conclusion
I hope this post served its purposes: help make clear the separation between Benders’s algorithm and the models we use, give a numerical example of using Benders, and introduce a Python framework for using Benders in a modern way.
As an exercise, you can try implementing the first example matrix.
Feel free to poke me if you have any comment, remark, or correction.
Edits
- July 2020: Added code for generating the cuts by using the RHS of constraints.
References
- Benders, J. F. (1962). Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik, 4(1), 238–252.
- Taşkın, Z. C., & Cevik, M. (2013). Combinatorial Benders cuts for decomposing IMRT fluence maps using rectangular apertures. Computers & Operations Research, 40(9), 2178–2186.
- Birge, J. R., & Louveaux, F. V. (1988). A multicut algorithm for two-stage stochastic linear programs. European Journal of Operational Research, 34(3), 384–392.
- McDaniel, D., & Devine, M. (1977). A modified Benders’ partitioning algorithm for mixed integer programming. Management Science, 24(3), 312–319.
- Farkas, J. (1902). Theorie der einfachen Ungleichungen. Journal Für Die Reine Und Angewandte Mathematik (Crelle’s Journal), 124, 1–27.
Footnotes
“Any” as in even fractional solutions of the master, especially useful at the root node. ↩
It is an actual downside as cuts generated with the optimal solution to the master problem may be stronger, but in general it seems that adding Benders cuts during the B&B is better. ↩
An extreme ray is a direction of unlimited increase, it occurs when the primal is infeasible – dual unbounded. ↩
Technically, $q$ belongs to the LHS because, in normal form, all variables are on the LHS. ↩