Yet another optimisation blog
Research Fellow, Monash University
http://arthur.maheo.net/
Tue, 03 Dec 2019 05:08:36 +0000
Things to avoid in presentations<p>Scientific presentation is an important part of doing research, but it is hard to captivate an audience. Conferences are great venues to disseminate ideas and get yourself known, so you have to seize your listeners’ attention. I list here ideas and observations I made during the last conference I attended, to try to shed some light on common <em>issues</em>. Hopefully, someone can benefit from it; at least it helps me reflect on my own practice.</p>
<h1 id="point-of-this-post">Point of this post</h1>
<p>Through the years I have moved away from the “standard” Beamer template that you see much too often in conferences. I do not think that a presentation should be a translation of your paper into slides instead of pages. The goal of your presentation is to communicate the <em>idea</em> of the paper. I find myself too often sitting through talks on topics I am interested in, but unable to engage. On the other hand, I know some amazing presenters who are able to teach me about their work, no matter how remote from my own.</p>
<p>As usual, this is going to be my personal opinion and in no way do you have to follow it. My slides are usually described as “clean” or “barren” depending on the commentator’s opinion. But I prefer it this way. I use slides as illustrations to anchor my narrative; you should not be describing your slides.</p>
<p>I will be talking about the pitfalls in slide-decks I see most often in talks I attend and try to highlight them against what I see in good talks. I am not talking about language fluency, voice projection, body language or awkwardness – I come from computer science after all.</p>
<h1 id="things-to-avoid">Things to avoid</h1>
<ul>
<li><strong>Large result tables:</strong> Highlight the key findings as either lists or small graphs; if you do use a table, always highlight the main results.</li>
<li><strong>No table of contents:</strong> Present your talk’s layout orally and use segue slides. This is my pet peeve: every time a speaker starts by reading her table of contents, I know the talk will not be good. If you must have an outline slide, put it after the context.</li>
<li><strong>Stick to time:</strong> If your presentation is too short, no one will mind; you will annoy participants otherwise. Oh, and if the chair signals that your talk is over, it is. Most chairs are too polite to physically interrupt the speaker – they should.</li>
<li><strong>Remove headers and footers from slides:</strong> You need to acknowledge your affiliations, project, university of residence and the like, but keep it to the title and thank-you slides, and perhaps segue slides as well. In a nutshell, reduce clutter: slide numbers, outlines, slide title banners, logos, etc. Your data and bullet points should have as much screen space as possible.</li>
</ul>
<h2 id="mathematical-expressions">Mathematical expressions</h2>
<p>I think mathematical expressions, equations in particular, are a point worth improving. I work in operations research, where mathematical modelling is at the core of my work, and yet I have removed every equation from my slides. I usually explain them in a bullet list with a textual description. The paper is there to provide the actual maths.</p>
<p>In some cases, with a specialist audience, you can pull off slides of plain black-and-white equations. Recently, I have seen people use graphical explanations of their equations as support. As always, this is a non-negligible amount of work, but it truly helps.</p>
<p>Let us see an example. The first one is a <em>standard</em> beamer presentation with the equations pulled from the paper.</p>
<p><img src="/assets/images/presentations/beamer.png" alt="img" title="Beamer example" /></p>
<p>This slide is bloated and at the same time shallow: there is little information of value for the listener, yet it is hard to read.</p>
<p>I started using HTML-based templates early on, as I always found that LaTeX, although excellent for typesetting, did not do the job for visual content presentation.</p>
<p><img src="/assets/images/presentations/io.png" alt="img" title="Google IO example" /></p>
<p>This slide is much more information-dense, but it fails to draw the attention to key elements – in this case I did not find them that important for the talk.</p>
<p>Nowadays, this is how I would write a slide about a constraint of note.</p>
<p><img src="/assets/images/presentations/reveal.png" alt="img" title="Reveal.js example" /></p>
<p>This was done for this post so I am not sure I would use this very slide – and the images are not pretty. But I hope you get the gist: supporting text, clear pictorial representation of the concept, content uses the whole area.</p>
<h1 id="things-to-do">Things to do</h1>
<ul>
<li><strong>Having a small example:</strong> What better way to convey your ideas than have a small example? It allows you to go through the key points of your research without having to deal with the hairy details, which are then relegated to questions.</li>
<li><strong>Visual explanation:</strong> If you talk about a concept which can be interpreted graphically, use an illustration or a diagram.</li>
<li><strong>Mention key difficulties:</strong> Instead of having one (heavy) slide per difficult thing you did, give an overview of the work and mention what the hard bits were. It allows you to lighten the slides and makes room for questions.</li>
</ul>
<p>All in all, doing the <em>good things</em> will end up being more time consuming. However, a piece of advice I received from an outstanding researcher was:</p>
<blockquote>
<p>For every bad talk you give, you must give at least nine good ones to make up for it.</p>
</blockquote>
<h1 id="in-parting">In parting</h1>
<p>Research talks are key to research. I just attended a high-profile conference, and I could see all the professors at the top of the field being 100% engaged in the talks. This is where they get ideas, scout people, and keep up with the field. Scientific communication is becoming more competitive every year, and, especially in a field such as mine, it is often downplayed. Or rather, it takes time to fathom how important it truly is.</p>
<p>If you think this does not apply to you because your field is different enough, or because you are already a good presenter, then please ignore this. But I still suggest you go back to some of your recent slides, take a hard look at them, and ask yourself whether there is a more intuitive, clearer, or simply different way to communicate your point while avoiding one of the pitfalls I listed.</p>
<p>Making presentations is a task few people I know genuinely enjoy. It is a shame, as I learn a lot during good presentations. But making a good one requires quite a bit of work. In parting, I hope you either learned something or reflected on your practice, and I leave you with a few pieces of advice to make things faster:</p>
<ol>
<li>Having a (reproducible) template will make your preparation faster.</li>
<li>Make your figures for the presentation instead of reusing those from your paper. (I use Org Mode’s <code class="highlighter-rouge">src</code> blocks.<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>)</li>
</ol>
<h1 id="footnotes">Footnotes</h1>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>My current setup for making slides consists in writing the slides in <a href="http://orgmode.org/">Org Mode</a> using <a href="https://www.gnu.org/software/emacs/">Emacs</a> (surprise, just like this post) and then exporting them to <a href="https://github.com/hakimel/reveal.js/">Reveal.js</a> (there is obviously an <a href="https://melpa.org/#/ox-reveal">exporter</a> for this). <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Tue, 18 Jun 2019 00:00:00 +0000
http://arthur.maheo.net/things-to-avoid-in-presentations/
research
The APTAS for Bin-Packing<p>The Bin Packing problem (BP) is a classic optimisation problem. It has a lot of similarities with the Knapsack problem, as the goal is to pack items into containers. However, the BP’s objective is to minimise the number of bins opened to fit a given set of items.</p>
<h1 id="what-is-it">What Is It?</h1>
<p>First, I will present the mixed-integer programming formulation of the problem. The first line is the objective: minimise the number of opened bins. The first constraint means that every item must be allocated to a bin; the second constraint means that a bin cannot hold more than a given volume.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align} \min && \sum_{i \in N} y_i \tag{BP} \\ s.t. && \sum_{i \in N} x_{ij} & = 1 & \forall j \in M \\ && \sum_{j \in M} a_j \cdot x_{ij} & \leq V \cdot y_i & \forall i \in N \\ && x, y \in \mathbb{B} \nonumber \end{align} %]]></script>
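<p>To make the model concrete, here is a tiny brute-force solver that enumerates every assignment of items to bins and checks the two constraints directly. This is an illustrative sketch (the function name is mine), only usable on toy instances:</p>

```python
from itertools import product

def optimal_bins(sizes, capacity=1.0):
    """Try k = 1, 2, ... bins; return the smallest k for which some
    assignment of items to bins satisfies both constraints of (BP)."""
    n = len(sizes)
    for k in range(1, n + 1):
        # x[j] = index of the bin item j is allocated to
        for x in product(range(k), repeat=n):
            loads = [0.0] * k
            for j, b in enumerate(x):
                loads[b] += sizes[j]
            # capacity constraint for every opened bin
            if all(load <= capacity + 1e-9 for load in loads):
                return k
    return n
```

<p>For instance, <code class="highlighter-rouge">optimal_bins([0.2, 0.3, 0.4, 0.5, 0.7])</code> returns 3.</p>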
<p>It was the subject of heuristic research for many years because it admits some simple algorithms to solve it.</p>
<h2 id="first-fit">First-Fit</h2>
<p>The <em>first-fit</em> heuristic was the first one found for BP; it simply consists in allocating each item to the first bin that can hold it. It runs in time linear in the number of items, and has the property of never being worse than twice the optimal solution. Thus we call it an <em>Approximation Algorithm</em> (AA) with a ratio of 2.</p>
<p>If we denote by $\textbf{opt}$ the value of the optimal solution to a BP instance, and by $ff(\cdot)$ the value returned by first-fit, we have, for an instance $X$: $ff(X) \leq 2 \times \textbf{opt}$.</p>
<h3 id="first-fit-decreasing">First-Fit Decreasing</h3>
<p>A simple improvement of first-fit is to sort the items by decreasing size first; the approximation guarantee then becomes $11 / 9 \textbf{ opt} + 1$.</p>
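<p>Both heuristics fit in a few lines. This is an illustrative sketch (names are mine), with a small tolerance to guard against floating-point noise:</p>

```python
def first_fit(sizes, capacity=1.0):
    """Place each item in the first bin that can still hold it."""
    bins = []  # each entry is a bin's current load
    for s in sizes:
        for i, load in enumerate(bins):
            if load + s <= capacity + 1e-9:
                bins[i] = load + s
                break
        else:
            bins.append(s)  # no bin fits: open a new one
    return len(bins)

def first_fit_decreasing(sizes, capacity=1.0):
    """First-fit after sorting the items by decreasing size."""
    return first_fit(sorted(sizes, reverse=True), capacity)
```

<p>On the instance <code class="highlighter-rouge">[0.4, 0.4, 0.6, 0.6]</code>, plain first-fit opens three bins while the decreasing variant finds the optimal two.</p>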
<h2 id="aptas">APTAS</h2>
<p>To go further in the realm of AAs, we can look at algorithms with guarantees on their runtime. There are two main classes of such algorithms:</p>
<ol>
<li>Fully polynomial time (FPTAS), which means that the algorithm always executes in time polynomial in the size of the instance and in $1 / \varepsilon$.<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></li>
<li>Asymptotically polynomial time (APTAS), where the approximation guarantee only holds asymptotically, as the size of the instance grows to infinity.</li>
</ol>
<p>In this post I am looking at the second category, the APTAS for BP <a class="citation" href="#de1981bin">(De La Vega & Lueker, 1981)</a>; let us hope it works well quickly.</p>
<h3 id="the-algorithm">The algorithm</h3>
<p>The idea behind this algorithm is to exploit certain properties of the problem (here, bundling items by size range and discarding the smallest ones), then to use dynamic programming to compute an <em>exact</em> solution to the relaxed problem, and finally to return to the original problem.</p>
<p>The APTAS for BP works as follows:</p>
<ol>
<li>Remove small items – those of size smaller than $\varepsilon$. This gives a set $I$ of small items and a set $J$ for the rest.</li>
<li>From $J$, create $k$<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup> item sets, then set the size of the items within each set to the maximum of that set, obtaining set $J’$.</li>
<li>Find optimal packing of $J’$ using dynamic programming.</li>
<li>Replace items in the packing with their actual sizes, taken from $J$.</li>
<li>Pack small items using first-fit, “on top” of the packing.</li>
</ol>
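<p>The first two steps can be sketched as follows. This is my reading of the grouping, with $k = \lceil 1 / \varepsilon^2 \rceil$ taken from the footnote; it is illustrative, not the code from the original implementation:</p>

```python
import math

def group_and_round(items, eps):
    """Steps 1-2 of the APTAS, as I read them: drop items smaller than
    eps, split the rest into at most k = ceil(1 / eps**2) groups, and
    round every item up to the maximum size of its group."""
    small = [s for s in items if s < eps]
    big = sorted(s for s in items if s >= eps)
    k = math.ceil(1 / eps ** 2)
    group_size = math.ceil(len(big) / k) or 1
    groups = [big[i:i + group_size] for i in range(0, len(big), group_size)]
    rounded = [[g[-1]] * len(g) for g in groups]  # round up to group max
    return small, rounded
```
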
<p>The important step here is #3, so I will talk about it a bit longer. The dynamic programming approach is based on searching for <em>patterns</em> that fit into increasing number of bins.</p>
<ol>
<li>Create all possible packings that fit into a single bin; this gives the pattern set $P_1$.</li>
<li>To compute $P_{n+1}$, merge the previous pattern with the original pattern: <script type="math/tex">P_{n+1} = P_n \times P_1</script><sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup></li>
<li>Stop once we can fit all items into $n$ bins.</li>
</ol>
<p>The idea is that, at each step, you increase the number of bins by one, extending the existing configurations only with patterns that are known to fit into a single bin.</p>
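<p>A compact, illustrative version of this pattern DP could look as follows (my own sketch, not the profiled implementation). Bins are stored as sorted tuples, configurations as sorted tuples of bins, and a <code class="highlighter-rouge">Counter</code> tracks how many copies of each size are available. It assumes every item fits into a single bin:</p>

```python
from collections import Counter
from itertools import combinations

def single_bin_patterns(avail, capacity=1.0):
    """Every non-empty sub-multiset of the items that fits in one bin."""
    sizes = sorted(avail.elements())
    pats = set()
    for r in range(1, len(sizes) + 1):
        for combo in combinations(sizes, r):
            if sum(combo) <= capacity + 1e-9:
                pats.add(combo)  # sorted tuple == canonical multiset
    return pats

def exact_packing(items, capacity=1.0):
    """Pattern DP: grow the number of bins until some configuration
    packs every item; return that minimum number of bins."""
    avail = Counter(items)
    p1 = single_bin_patterns(avail, capacity)
    configs = {(p,) for p in p1}
    n = 1
    while True:
        for conf in configs:
            used = Counter(s for b in conf for s in b)
            if used == avail:
                return n
        # P_{n+1} = P_n x P_1, keeping only configurations that do not
        # use more copies of a size than are available
        new_configs = set()
        for conf in configs:
            used = Counter(s for b in conf for s in b)
            for p in p1:
                extra = Counter(p)
                if all(used[s] + c <= avail[s] for s, c in extra.items()):
                    new_configs.add(tuple(sorted(conf + (p,))))
        configs = new_configs
        n += 1
```

<p>On the example below ($J’$ with sizes .3, .3, .5, .5, .7), it stops at three bins.</p>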
<h3 id="example">Example</h3>
<p>Given a bin of size 1, and</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align} J & = \{\{.2, .3\}, \{.4, .5\}, \{.7\}\} \\ J' & = \{\{.3, .3\}, \{.5, .5\}, \{.7\}\} \end{align} %]]></script>
<p>this is how the DP will find the <em>optimal</em> packing. We first determine the set of patterns that fit into a single bin. Then, to create additional patterns, we simply extend the number of bins available to fit patterns, which would go as follows.</p>
<table>
<thead>
<tr>
<th>n</th>
<th>$P_n$</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>[{.3}], [{.3, .3}], [{.3, .5}], [{.3, .7}], [{.5}], [{.5, .5}], [{.7}]</td>
</tr>
<tr>
<td>2</td>
<td>[{.3}, {.3}], [{.3}, {.3, .3}], [{.3}, {.3, .5}], …</td>
</tr>
<tr>
<td>3</td>
<td>[{.3}, {.3}, {.3}], …, [{.3, .5}, {.5}, {.3, .7}]</td>
</tr>
</tbody>
</table>
<p>At iteration 3 we finally find patterns that contain as many items as $J’$, so we can stop.</p>
<h1 id="optimising">Optimising</h1>
<p>Okay, so now we have a working implementation, but there is a “but.” It is slow. As in, badly lagging for anything larger than ten items. So let us start debugging the code to find the bottlenecks.</p>
<p>To do so, I will use the <a href="https://docs.python.org/3/library/profile.html">Python profiler</a> from the command line. It produces either results in the terminal or in a file format. The file format seems the most convenient, however it is a binary file, so unfit to be read directly within an editor.</p>
<p>Because we are using Python, there is a package for that: <a href="https://github.com/ymichael/cprofilev">cprofilev</a>. This tool generates a web page given a cProfile output file. Perfect.</p>
<p>I will focus on a few statistics which I consider the most important:</p>
<ul>
<li><strong><code class="highlighter-rouge">cumtime</code>:</strong> the cumulative running time of a function, including the time spent in the functions called within it.</li>
<li><strong><code class="highlighter-rouge">tottime</code>:</strong> the running time of the function itself, excluding the time spent in the functions it calls.</li>
<li><strong><code class="highlighter-rouge">ncalls</code>:</strong> the number of calls to a function, useful for spotting extraneous object creation and the like.</li>
</ul>
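<p>As a minimal sketch of the workflow (the file name is my choice), the standard library alone can produce and inspect such a profile:</p>

```python
import cProfile
import pstats

# Profile a toy workload and dump the raw stats to a file, the same
# way one would profile the APTAS script from the command line with
# `python -m cProfile -o profile.out aptas.py`.
cProfile.run("sum(i * i for i in range(10_000))", "profile.out")

# Load the binary file and print the top entries, sorted by tottime.
stats = pstats.Stats("profile.out")
stats.sort_stats("tottime").print_stats(5)
```

<p>Tools like cprofilev simply put a nicer, browsable front end on the same file.</p>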
<h2 id="configuration">Configuration</h2>
<p>To allow reproducibility (the variance seems quite large), I will fix the random seed, so every run will get the exact same set of items to optimise.<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup> I randomly generate 10 items with <code class="highlighter-rouge">numpy</code>, with weights between 1 and 10, then normalise them before running the APTAS.</p>
<ul>
<li>Python 3.5</li>
<li>seed: 100</li>
<li>$\varepsilon = 0.1$</li>
<li>$N = 10$</li>
</ul>
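<p>A sketch of this setup; the exact normalisation is my assumption (dividing by the largest weight so that every item fits in a unit bin):</p>

```python
import numpy as np

np.random.seed(100)                          # fixed seed: reproducible runs
weights = np.random.randint(1, 11, size=10)  # integer weights in [1, 10]
sizes = weights / weights.max()              # assumed normalisation: unit bin
```
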
<p>Running the algorithm required quite a bit of computation:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>13 configurations with 1 bins.
111 configurations with 2 bins.
687 configurations with 3 bins.
3330 configurations with 4 bins.
12810 configurations with 5 bins.
37710 configurations with 6 bins.
Feasible in 7 bins.
</code></pre></div></div>
<h2 id="first-run">First Run</h2>
<p>And this is the header of the profile:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>35169689 function calls (32064478 primitive calls) in 22.659 seconds
</code></pre></div></div>
<p>Yep, packing ten objects has a runtime of over twenty seconds. But then again, this algorithm runs in polynomial time: the runtime does not depend on the size of the input, only on the number of different sizes in $J’$ and on $\varepsilon$, which are fixed parameters. However, once these constants are fixed, we still see a huge increase in the solving time.</p>
<p>Here are the first few lines of the output, sorted by <code class="highlighter-rouge">tottime</code>:<sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup></p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> ncalls tottime percall cumtime percall filename:lineno(function)
221648 6.208 0.000 17.682 0.000 .../copy.py:137(deepcopy)
1329888 2.103 0.000 2.103 0.000 {method '__deepcopy__' of 'numpy.generic' objects}
1994832 1.636 0.000 2.107 0.000 .../copy.py:253(_keep_alive)
221648 1.343 0.000 2.391 0.000 .../collections/__init__.py:624(subtract)
8496613 1.239 0.000 1.239 0.000 {method 'get' of 'dict' objects}
221648 1.235 0.000 10.988 0.000 .../copy.py:239(_deepcopy_dict)
443297 1.169 0.000 2.573 0.000 .../collections/__init__.py:515(__init__)
1 1.015 1.015 22.572 22.572 aptas.py:52(exact_bp_k)
</code></pre></div></div>
<p>You have to go down to line 8 for the first mention of a function I implemented, and the grand winner of this race is: <code class="highlighter-rouge">deepcopy</code>, which was called over two-hundred thousand times and accounts for about 27% of the total time. If you look at its cumulative time, which includes the time spent in the functions it calls, it skyrockets to <strong>78%</strong>!</p>
<h2 id="deepcopying">Deepcopying</h2>
<p>The good thing is: I do not have to look for the culprit; the bad thing is: I hope I do not have to deal with performance issues in the standard library.</p>
<p>Let us explore further, where do we call <code class="highlighter-rouge">deepcopy</code> so much? We can click on the function to get a breakdown of its callstack (sort of). We are mainly interested in the “Called By” section.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Function was called by...
ncalls tottime cumtime
.../copy.py:137(deepcopy) <- 221648 0.357 11.587 .../copy.py:223(<listcomp>)
2659776 4.634 9.712 .../copy.py:239(_deepcopy_dict)
221648 0.413 13.100 .../copy.py:269(_reconstruct)
221648 0.804 17.682 aptas.py:52(exact_bp_k)
{numpy __deepcopy__} <- 1329888 2.103 2.103 .../copy.py:137(deepcopy)
.../copy.py:239(_deepcopy_dict) <- 221648 1.235 10.988 .../copy.py:137(deepcopy)
.../copy.py:222(_deepcopy_tuple) <- 221648 0.651 12.419 .../copy.py:137(deepcopy)
.../copy.py:192(_deepcopy_atomic) <- 1329888 0.158 0.158 .../copy.py:137(deepcopy)
</code></pre></div></div>
<p>Okay, so everything in there is a standard library call <strong>but one</strong>, so it is easy to figure out which line to look for – I already knew, but you may not have.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">84</span> <span class="c1"># Available items
</span><span class="mi">85</span> <span class="n">current</span> <span class="o">=</span> <span class="n">deepcopy</span><span class="p">(</span><span class="n">avail</span><span class="p">)</span>
</code></pre></div></div>
<p>Alright, let us de-tangle this a little bit, first here is the definition of <code class="highlighter-rouge">avail</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">63</span> <span class="n">avail</span> <span class="o">=</span> <span class="n">Counter</span><span class="p">(</span><span class="n">items</span><span class="p">)</span>
</code></pre></div></div>
<p>I use <code class="highlighter-rouge">avail</code> to count the different object sizes available to pack – remember the definition of the APTAS, $J’$ is a set of set of objects where each has the same size within one set.</p>
<p>Thus, when creating a new potential bin, I verify that I have enough “sizes” available; if not, I ditch the configuration.</p>
<h2 id="copying">Copying</h2>
<p>Okay, at this point it seems pretty clear that dictionary deep copying is an extremely inefficient operation. Maybe some of you are expert Pythonistas, but I am not. And I have a deep, ingrained distrust of how references and values are handled.<sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup></p>
<p>Thus, I tend to deep copy everything, otherwise I spend my time banging my head about why I have inconsistent data between two iterations of the same function. This time, let us give <code class="highlighter-rouge">copy</code> a chance.</p>
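<p>The reason a shallow copy is safe here: the <code class="highlighter-rouge">Counter</code> maps immutable float sizes to integer counts, so mutating a shallow copy cannot affect the original. A quick check:</p>

```python
from collections import Counter
from copy import copy

# avail maps immutable float sizes to integer counts, so a shallow
# copy is enough: the two Counters share no mutable state.
avail = Counter({0.3: 2, 0.5: 2, 0.7: 1})

shallow = copy(avail)        # same as avail.copy()
shallow.subtract({0.3: 1})   # mutate the copy only
```
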
<p>First things first: the results are the same, no corruption. Second, well, it does seem a tad more efficient:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>7020374 function calls (7018235 primitive calls) in 6.326 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
221648 1.264 0.000 2.177 0.000 .../collections/__init__.py:624(subtract)
443297 1.008 0.000 2.328 0.000 .../collections/__init__.py:515(__init__)
1 0.764 0.764 6.178 6.178 aptas.py:52(exact_bp_k)
443297 0.633 0.000 1.282 0.000 .../collections/__init__.py:584(update)
221648 0.539 0.000 1.285 0.000 .../collections/__init__.py:772(__neg__)
</code></pre></div></div>
<p>As in, by a factor of three, almost. I am not going to complain about that,<sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup> this looks far more reasonable. Now the bulk of the operations comes from subtracting, which is done by the <code class="highlighter-rouge">Counter</code>.</p>
<h1 id="final-optimisations">Final Optimisations</h1>
<h2 id="duplicates">Duplicates</h2>
<p>At this point, the algorithm works better but is still fairly slow. The curse of complexity hits doubly hard when you have inefficient operations, and even harder when those operations could be omitted altogether. The point here is: the way we store configurations allows for one more optimisation.</p>
<p>We store lists of bins coming from the Cartesian product of the previous bin configurations with the single-bin configurations. Therefore, we quite often end up with the same configuration multiple times. We can do a simple check to skip configurations that already exist.<sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup></p>
<h2 id="symmetries">Symmetries</h2>
<p>One thing that goes hand in hand with checking for duplicates is avoiding symmetric solutions, that is, configurations with the same set of bins in a different order.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>X = [[0.9], [0.8]]
Y = [[0.8], [0.9]]
</code></pre></div></div>
<p>Both these configurations are equivalent but considered different. When doing the product of possible bins for the next size, they will both appear though they do not provide actual alternatives.</p>
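<p>A cheap fix is to store each configuration under a canonical key, sorting the items within each bin and then the bins themselves, so that symmetric configurations collapse to the same key. A sketch:</p>

```python
def canonical(config):
    """Order-independent key for a configuration: sort the items
    inside each bin, then sort the bins themselves."""
    return tuple(sorted(tuple(sorted(b)) for b in config))

X = [[0.9], [0.8]]
Y = [[0.8], [0.9]]
assert canonical(X) == canonical(Y)  # symmetric configurations collapse
```

<p>Storing <code class="highlighter-rouge">canonical(config)</code> in a set makes the duplicate check a constant-time lookup.</p>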
<h2 id="results">Results</h2>
<p>The results are finally up to the standard I would expect considering the algorithm. I will omit the callstack as it is mostly library calls.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>289812 function calls (287673 primitive calls) in 0.328 seconds
ncalls tottime percall cumtime percall filename:lineno(function)
7614 0.043 0.000 0.074 0.000 .../collections/__init__.py:624(subtract)
1 0.040 0.040 0.242 0.242 aptas.py:52(exact_bp_k)
15229 0.036 0.000 0.083 0.000 .../collections/__init__.py:515(__init__)
15229 0.022 0.000 0.046 0.000 .../collections/__init__.py:584(update)
7614 0.019 0.000 0.045 0.000 .../collections/__init__.py:772(__neg__)
</code></pre></div></div>
<p>That is an improvement by a factor of almost twenty! Reducing the number of configurations to handle was the obvious way to make the algorithm efficient, but needed a bit of tweaking first.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>13 configurations with 1 bin.
57 configurations with 2 bins.
130 configurations with 3 bins.
191 configurations with 4 bins.
189 configurations with 5 bins.
126 configurations with 6 bins.
Feasible in 7 bins.
</code></pre></div></div>
<h1 id="conclusion">Conclusion</h1>
<p>As with most optimisation algorithms, the devil is in the details. The APTAS for bin-packing is still not a very efficient heuristic at the end of the day, its only advantage being to provide a guarantee on its results.</p>
<p>As a side note, I did try to implement this algorithm using numpy, but the results were atrocious. Not so much because we have to convert the sizes, but because checking for equality between two numpy arrays is a very inefficient operation – which we use when checking for duplicates. But I may be wrong.</p>
<p>One last thing I may try is to convert the list of objects to a vector of numbers (as I did in numpy) as it would reduce the input size further.</p>
<h1 id="references">References</h1>
<ol class="bibliography"><li><span id="de1981bin">De La Vega, W. F., & Lueker, G. S. (1981). Bin packing can be solved within 1+ε in linear time. <i>Combinatorica</i>, <i>1</i>(4), 349–355.</span></li></ol>
<h1 id="footnotes">Footnotes</h1>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
TeX: { equationNumbers: { autoNumber: "AMS" } },
tex2jax: {
inlineMath: [ ['$','$'] ],
processEscapes: true
}})
</script>
<script type="text/javascript" async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Be careful, <em>polynomial</em> here can mean any power of the size, e.g. the textbook FPTAS for knapsack runs in $\mathcal{O}(n^3 / \varepsilon)$. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>$k = \lceil 1 / \varepsilon^2 \rceil$ <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>The $\times$ here means <em>cartesian product</em>, so, yes, the number of feasible patterns will grow really quickly. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>I was lucky and quickly found a seed that gave awful performances. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>For readability, I will replace the Python path in library calls with “<code class="highlighter-rouge">...</code>” <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>Do not get me started on how, in my opinion, Python breaks the developer’s trust. <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>Though I will have to go dig into some of my older code and see if I cannot get the same kind of improvements. <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>Note: doing this only actually slows down the algorithm as very few are actually removed and checking takes a long time. <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Tue, 14 Aug 2018 00:00:00 +0000
http://arthur.maheo.net/aptas-for-bin-packing/
heuristics, posts
Implementing Lin-Kernighan in Python<p>In this post, I will talk about my journey implementing the infamous Lin-Kernighan heuristic to solve TSP instances efficiently. This work was done as part of a larger project, thus the code will be in Python, available <a href="https://gitlab.com/Soha/local-tsp">here</a>.</p>
<h1 id="preamble">Preamble</h1>
<p>In <a href="http://arthur.maheo.net/posts/python-local-tsp-heuristics/">a previous post</a> I introduced local search heuristics for the TSP and tried to show how they can be generalised to larger neighbourhoods. These heuristics are called 2-opt, 3-opt, up to $k$-opt as they try to find a set of 2, 3, …, $k$ edges to exchange in order to produce a better tour.</p>
<p>However, the <em>curse of complexity</em> looms. Indeed, each of these algorithms, in its naïve implementation, has a complexity of $O(n^k)$, which very quickly becomes intractable.</p>
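<p>For reference, a naïve 2-opt pass can be written in a few lines; this is an illustrative sketch, not the code from the repository linked above:</p>

```python
def two_opt_once(tour, dist):
    """One improving 2-opt move: reverse the segment tour[i:j+1] when
    swapping edges (i-1, i) and (j, j+1) shortens the tour."""
    n = len(tour)
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            a, b = tour[i - 1], tour[i]
            c, d = tour[j], tour[(j + 1) % n]
            if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d] - 1e-12:
                tour[i:j + 1] = reversed(tour[i:j + 1])
                return True
    return False
```

<p>Calling it in a loop until it returns <code class="highlighter-rouge">False</code> yields a 2-optimal tour; the $O(n^2)$ pairs per pass are exactly where the complexity comes from.</p>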
<p>To extend the approach, we can use the idea that every $k$-optimal tour is also $l$-optimal, with $l < k$. Hence, once we have a 2-optimal tour, we try to improve it using 3-opt moves, and repeat until we reach $k$. This approach has two main drawbacks:</p>
<ol>
<li>As I have shown earlier by implementing <em>fast</em> versions of local operators, we do not completely avoid the jump in complexity.</li>
<li>Finding a <em>good $k$</em> is a matter of experimenting; we cannot know in advance which value of $k$ will lead to good results in reasonable time.</li>
</ol>
<p>Enter <em>adaptive</em> algorithms: one such algorithm will start from a <em>promising</em> move and try to extend it as much as possible. A well-known case is called: “2.5-opt.”<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></p>
<p>The basic idea is that every time we find an improving 2-opt move, we look for a third edge we could exchange. This algorithm thus combines the speed of 2-opt and the quality of 3-opt. I have not had the occasion to implement it, but maybe one day.</p>
<h1 id="the-lin-kernighan-heuristic-for-the-tsp">The Lin-Kernighan Heuristic for the TSP</h1>
<p>When I say infamous, I am obviously referring to the task of implementing it;<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup> it is still the best performing TSP heuristic out there. It was first described in the 70s <a class="citation" href="#lin1973effective">(Lin & Kernighan, 1973)</a> and since then much work has been devoted to improving it further, the most notable effort coming from Helsgaun in a series of papers dealing with “efficiently implementing [it].”</p>
<p>However, I found the literature lacking in algorithmic explanation,<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup> as papers rarely give a clear overview of the implementation. The best I found is in Helsgaun’s report <a class="citation" href="#helsgaun1998writings">(Helsgaun, 1998)</a> – which I used as a basis for this implementation.</p>
<p>The current state-of-the-art heuristic is thus the Lin-Kernighan-Helsgaun (LKH), introduced in 2000 <a class="citation" href="#helsgaun2000effective">(Helsgaun, 2000)</a>. The idea of LKH is to use a 5-opt move as its basis for optimisation, where LK uses a 2-opt move with exceptions. Later, it was refined to use any $k$-opt move by extending the current move <a class="citation" href="#helsgaun2009general">(Helsgaun, 2009)</a>.</p>
<p>Here, I will be talking about the <em>basic</em> Lin-Kernighan (LK) heuristic, I may do the Helsgaun improvements later depending on my needs.</p>
<h2 id="rationale">Rationale</h2>
<p>The Lin-Kernighan heuristic answers the question: which $k$-opt move should we execute to improve the current tour? We know that a tour over $n$ cities is optimal if there exists no improving $n$-opt move. This approach is obviously not applicable, as the complexity would then be $O(n^n)$, and you do not want that.</p>
<p>Furthermore, any $k$-opt move also covers the smaller $l$-opt moves. Hence, the general $k$-opt implementation consists in selecting a maximum $k$, applying as many lower $l$-opt moves as possible, and increasing the size of the move only when no <em>lower</em> improvement is available.</p>
<p>Lin-Kernighan aims to provide a more flexible approach by increasing the $k$ as long as improving moves are found. As such, it is a $k$-opt which tries to find the highest possible improvement given a promising starting point.</p>
<h2 id="description">Description</h2>
<p>To represent the moves to execute, the algorithm maintains two sets: one with edges that will be removed from the current tour and one with edges that will be added. Removing and adding $k$ edges is equivalent to performing a $k$-opt move.</p>
<p>Lin-Kernighan starts by selecting a node on the current tour ($t_1$); from there it selects either the predecessor or the successor of that node on the tour ($t_2$), forming the first edge to remove. Then it selects the first edge to add among the neighbours of $t_2$ which do not belong to the tour ($t_3$) and yield a positive gain.</p>
<p>From $t_3$ it selects either its predecessor or successor ($t_4$). If relinking $t_4$ with $t_1$ provides a better tour, we restart the algorithm with the new tour; otherwise, if the gain is still positive, we look for another node outside of the tour ($t_5$) to have a potential edge to add. And so on and so forth.</p>
<h2 id="overview">Overview</h2>
<p>The heuristic only allows <em>sequential exchanges</em>,<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup> which means that the next candidate for insertion or deletion is chosen starting at the end of the previously chosen edge, hence they have the following form:</p>
<ol>
<li>$x_i = (t_{2i - 1}, t_{2i})$</li>
<li>$y_i = (t_{2i}, t_{2i + 1})$</li>
</ol>
<p>Therefore, we maintain two sets of edges:</p>
<ol>
<li>$X$ a set of “broken” edges, which will be removed from the tour;</li>
<li>$Y$ a set of “joined” edges, which will be added to the tour.</li>
</ol>
<p>Then, the algorithm uses a set of rules to decide which edges to break and which to join:</p>
<ol>
<li>The <em>running sum</em> $G_i = \sum_{j=1}^{i} \left( c(x_j) - c(y_j) \right)$ has to be positive.</li>
<li>The potential solution has to form a tour, which means that when we break an edge we verify that we can relink to the starting node $t_1$, with one exception: when $i = 2$. This exception allows non-sequential 2-opt moves (cf. the double bridge move).</li>
<li>$X$ and $Y$ have to be disjoint.</li>
</ol>
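<p>The running-sum rule can be sketched in a few lines; the cost function <code class="highlighter-rouge">c</code> and the edge lists here are placeholders of my own, not part of the original code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def running_gain(X, Y, c):
    """Running sum G_i: total of c(x_j) - c(y_j) over the broken
    edges X and joined edges Y chosen so far, in order."""
    return sum(c(x) - c(y) for x, y in zip(X, Y))
</code></pre></div></div>
<p>A move is only pursued while this sum stays positive: breaking long edges and joining short ones keeps $G_i$ positive, so the partial move remains promising.</p>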
<p>Based on the description in Helsgaun’s report <a class="citation" href="#helsgaun1998writings">(Helsgaun, 1998)</a>, we can identify four parts in the algorithm:</p>
<ol>
<li>The main loop, which will be used to restart the search.</li>
<li>The selection of the first two edges, that is, selecting $t_1$, $t_2$, and $t_3$.</li>
<li>The selection of the next edge to remove, which I will call <code class="highlighter-rouge">chooseX()</code>, during which we may stop the search if we have an improved tour.</li>
<li>The selection of the next edge to add, <code class="highlighter-rouge">chooseY()</code>.</li>
</ol>
<p>The last two steps are crucial in so far as they contain the recursive steps which I use to get rid of the <code class="highlighter-rouge">goto</code>. The function <code class="highlighter-rouge">chooseX()</code> calls <code class="highlighter-rouge">chooseY()</code> if it finds an eligible edge but not a better tour and <code class="highlighter-rouge">chooseY()</code> calls <code class="highlighter-rouge">chooseX()</code> if it finds an eligible edge.</p>
<p>Furthermore, their position in the algorithm makes clear what the different sets contain, without needing to track $i$ – which represents the current $k$-opt being tried.</p>
<p>At each step, both of these functions add an edge to their respective sets, <em>de facto</em> moving to the next $k$-opt. The beauty of the algorithm is its ability to give up gracefully when it figures out it will not find any improvement, while persisting under a lot of uncertainty.</p>
<p>The only level where we exhaust all the possibilities is the selection of the first two edges; every other move has to fulfill a number of criteria to be considered. As such, LK always performs at least all the 2-opt moves; the reason is pretty obvious: 2-opt is the most cost-efficient local move, so it makes sense to rely on it as much as possible.</p>
<p>From this observation, the complexity of LK has been estimated at $O(n^{2.2})$, which is extremely close to the complexity of 2-opt while remaining more efficient than 3-opt.</p>
<h1 id="implementation">Implementation</h1>
<p>The original code, and subsequent descriptions, relied on <code class="highlighter-rouge">goto</code> statements to handle the state of the computation; one of my goals was to rewrite it without them. These statements are only used to enable recursion, and they muddle both the order of actions and the state of the variables.</p>
<p>For further details about the implementation, refer to the code, of course. I will not go into all the details as, though not trivial, most of them concern preserving consistency in the data. They should still not be omitted from a complete implementation.</p>
<p>For example, I will not detail how data is handled through the recursion, even though we need to be able to backtrack at some steps. I am fairly used to recursion, so the straightforward idea was to build intermediate structures at each recursive call, alleviating the need to keep track of the index $i$, which is central to the original algorithm’s description.</p>
<h2 id="main-loop">Main Loop</h2>
<p>The main loop is pretty straightforward: it only needs to restart the core improving function as long as the latter finds a better tour.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># tour is the current tour to improve
</span><span class="k">while</span> <span class="n">improved</span><span class="p">:</span>
<span class="n">improved</span> <span class="o">=</span> <span class="n">improve</span><span class="p">(</span><span class="n">tour</span><span class="p">)</span>
</code></pre></div></div>
<h2 id="move-selection-loop">Move Selection Loop</h2>
<p>The move selection loop is the core of the algorithm: it chooses the nodes to optimise from, the first edge to remove and the first to add, then it calls <code class="highlighter-rouge">chooseX()</code>.</p>
<p>By using an unconditional loop, we enable the recursion described as “if there are untried alternatives” in the description of the algorithm. The <code class="highlighter-rouge">chooseX()</code> and <code class="highlighter-rouge">chooseY()</code> functions take care of building the edge sets needed to make the move, and discard them if they do not find a possible improvement.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># tour is the current tour to otimise
</span><span class="k">for</span> <span class="n">t1</span> <span class="ow">in</span> <span class="n">tour</span><span class="p">:</span> <span class="c1"># Step 2
</span> <span class="k">for</span> <span class="n">t2</span> <span class="ow">in</span> <span class="n">tour</span><span class="o">.</span><span class="n">around</span><span class="p">(</span><span class="n">t1</span><span class="p">):</span> <span class="c1"># Step 3
</span> <span class="n">x1</span> <span class="o">=</span> <span class="p">(</span><span class="n">t1</span><span class="p">,</span> <span class="n">t2</span><span class="p">)</span>
<span class="n">X</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">x1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">t3</span> <span class="ow">in</span> <span class="n">neighbours</span><span class="p">[</span><span class="n">t2</span><span class="p">]:</span> <span class="c1"># Step 4
</span> <span class="n">y1</span> <span class="o">=</span> <span class="p">(</span><span class="n">t2</span><span class="p">,</span> <span class="n">t3</span><span class="p">)</span>
<span class="n">Y</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">y1</span><span class="p">)</span>
<span class="n">gain</span> <span class="o">=</span> <span class="n">c</span><span class="p">(</span><span class="n">x1</span><span class="p">)</span> <span class="o">-</span> <span class="n">c</span><span class="p">(</span><span class="n">y1</span><span class="p">)</span>
<span class="k">if</span> <span class="n">gain</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">chooseX</span><span class="p">(</span><span class="n">tour</span><span class="p">,</span> <span class="n">t1</span><span class="p">,</span> <span class="n">t3</span><span class="p">,</span> <span class="n">gain</span><span class="p">,</span> <span class="n">broken</span><span class="p">,</span> <span class="n">joined</span><span class="p">):</span> <span class="c1"># Step 6, i = 2
</span> <span class="c1"># Return to Step 2, that is the initial loop
</span> <span class="k">return</span> <span class="bp">True</span>
<span class="c1"># Else try the other options, note how we retain X and Y with the
</span> <span class="c1"># right data. (Step 8-12)
# No improvement found
</span><span class="k">return</span> <span class="bp">False</span>
</code></pre></div></div>
<h2 id="choosing-edges">Choosing Edges</h2>
<p>Now, the core of LK is actually to choose which edges will be added and which will be removed from the current tour. Beyond its apparent complexity, the whole algorithm relies on one simple observation: most $k$-opt moves can be expressed as sequences of smaller moves – especially 2-opt. So, as soon as we have a couple of good edges, we should try to find further edges to “add” to the move.</p>
<p>Hence the loop to select edges refers back to itself in a recursive manner to explore larger values of $k$, stopping when no improvement can be found.</p>
<h3 id="choosing-an-edge-to-remove">Choosing an edge to remove</h3>
<p>When choosing an edge to remove, we need only look at two nodes: the predecessor and the successor of the tail of the last added edge. In LK lingo, we are looking for $x_i$ with head $t_{2i - 1}$ (the tail of the last added edge) and tail $t_{2i}$ (the node we are looking for).</p>
<p>When choosing an edge to remove from the tour, we need to perform a few checks:</p>
<ul>
<li>verify that relinking to $t_1$ forms a tour;</li>
<li>check that the new edge was not added previously.</li>
</ul>
<p>If both conditions are satisfied <strong>and</strong> we have a positive gain, we can save the newly formed tour and restart the algorithm with it. Otherwise, we will look for another edge to add. Hence, we only stop the algorithm when finding an edge to remove which forms an improving tour when relinked with the start, skewing towards lower $k$-opt moves.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># last is the tail of the last added edge, t_2i - 1
# gain is the current gain from previous moves
</span><span class="k">for</span> <span class="n">t2i</span> <span class="ow">in</span> <span class="n">T</span><span class="p">[</span><span class="n">last</span><span class="p">]:</span>
<span class="n">xi</span> <span class="o">=</span> <span class="p">(</span><span class="n">last</span><span class="p">,</span> <span class="n">t2i</span><span class="p">)</span>
<span class="c1"># Gain from removing the edge
</span> <span class="n">Gi</span> <span class="o">=</span> <span class="n">gain</span> <span class="o">+</span> <span class="n">c</span><span class="p">(</span><span class="n">xi</span><span class="p">)</span>
<span class="c1"># We will never be able to relink the tour if we omit an edge
</span> <span class="c1"># rejoining the start from the current node.
</span> <span class="k">if</span> <span class="n">t2i</span> <span class="o">!=</span> <span class="n">t1</span> <span class="ow">and</span> <span class="n">xi</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">Y</span><span class="p">:</span>
<span class="n">X</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">xi</span><span class="p">)</span>
<span class="n">xr</span> <span class="o">=</span> <span class="p">(</span><span class="n">t2i</span><span class="p">,</span> <span class="n">t1</span><span class="p">)</span> <span class="c1"># Relink edge
</span> <span class="n">X2</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">xr</span><span class="p">)</span>
<span class="c1"># Cost of relinking
</span> <span class="n">relink</span> <span class="o">=</span> <span class="n">Gi</span> <span class="o">-</span> <span class="n">c</span><span class="p">(</span><span class="n">xr</span><span class="p">)</span>
<span class="c1"># The current solution does not form a valid tour
</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">is_tour</span><span class="p">(</span><span class="n">T</span><span class="p">,</span> <span class="n">X2</span><span class="p">,</span> <span class="n">Y</span><span class="p">):</span>
<span class="k">continue</span>
<span class="c1"># Save the current solution if the tour is better
</span> <span class="k">if</span> <span class="n">relink</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="n">T2</span> <span class="o">=</span> <span class="n">make_new_tour</span><span class="p">(</span><span class="n">T</span><span class="p">,</span> <span class="n">X2</span><span class="p">,</span> <span class="n">Y</span><span class="p">)</span>
<span class="n">save_tour</span><span class="p">(</span><span class="n">T2</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">True</span>
<span class="k">else</span><span class="p">:</span>
<span class="c1"># Pass on the newly "removed" edge but not the relink
</span> <span class="k">return</span> <span class="n">chooseY</span><span class="p">(</span><span class="n">T</span><span class="p">,</span> <span class="n">t1</span><span class="p">,</span> <span class="n">t2i</span><span class="p">,</span> <span class="n">Gi</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">)</span>
<span class="c1"># No improving edge, stop
</span><span class="k">return</span> <span class="bp">False</span>
</code></pre></div></div>
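<p>The <code class="highlighter-rouge">is_tour()</code> helper is not shown above; a possible (unoptimised) sketch, assuming the tour is stored as an ordered list of nodes, is:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def is_tour(tour, X, Y):
    """Check that removing the edges in X from `tour` (a list of
    nodes) and adding the edges in Y yields a single Hamiltonian cycle."""
    n = len(tour)
    # Undirected edge set of the current tour
    edges = {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}
    edges -= {frozenset(e) for e in X}
    edges |= {frozenset(e) for e in Y}
    if len(edges) != n:
        return False
    # Every node must have degree exactly 2
    adj = {v: [] for v in tour}
    for e in edges:
        a, b = tuple(e)
        adj[a].append(b)
        adj[b].append(a)
    if any(len(nb) != 2 for nb in adj.values()):
        return False
    # Walk the cycle: it must close only after visiting all n nodes
    start, prev, cur, count = tour[0], None, tour[0], 0
    for _ in range(n):
        nxt = adj[cur][0] if adj[cur][0] != prev else adj[cur][1]
        prev, cur, count = cur, nxt, count + 1
        if cur == start:
            return count == n
    return False
</code></pre></div></div>
<p>Helsgaun discusses tour structures that answer this query much faster; this linear walk is only meant to make the check explicit.</p>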
<h3 id="choosing-an-edge-to-add">Choosing an edge to add</h3>
<p>Once we have found an edge to remove, and it does not fulfill the stopping criterion, we will try to find an edge to add. To do so, we want to find a node that will not form an edge already in use in the current tour.</p>
<p>The new edge to add must follow a few rules:</p>
<ul>
<li>it must not belong to the current tour;</li>
<li>the “following edge,” $x_{i+1}$, must exist;</li>
<li>the gain has to be positive.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># t2i is the tail of the last excluded edge
# gain is the current gain from previous moves
</span><span class="k">for</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">T</span><span class="p">[</span><span class="n">t2i</span><span class="p">]:</span> <span class="c1"># This choice can be improved
</span> <span class="n">yi</span> <span class="o">=</span> <span class="p">(</span><span class="n">t2i</span><span class="p">,</span> <span class="n">node</span><span class="p">)</span>
<span class="n">Gi</span> <span class="o">=</span> <span class="n">gain</span> <span class="o">-</span> <span class="n">c</span><span class="p">(</span><span class="n">yi</span><span class="p">)</span>
<span class="k">if</span> <span class="n">Gi</span> <span class="o">></span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">yi</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">broken</span><span class="p">:</span> <span class="c1"># Check that x_i + 1 exists
</span> <span class="n">Y</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">yi</span><span class="p">)</span>
<span class="c1"># Stop at the first improving tour
</span> <span class="k">return</span> <span class="n">chooseX</span><span class="p">(</span><span class="n">T</span><span class="p">,</span> <span class="n">t1</span><span class="p">,</span> <span class="n">node</span><span class="p">,</span> <span class="n">Gi</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">)</span>
<span class="c1"># No improving nodes, keep looking
</span><span class="k">return</span> <span class="bp">False</span>
</code></pre></div></div>
<h1 id="improvements">Improvements</h1>
<p>From this starting point, the seminal paper offers a number of optimisations, though some were later called into question. I will list those I have implemented so far. Most of these improvements are heuristics to speed up the search at the smallest possible cost in solution quality.</p>
<h2 id="solution-removal">Solution Removal</h2>
<p>We immediately stop searching if we find a tour that we already found at a previous iteration. The rationale is simple: since we are doing $k$-opt moves, reaching a solution we have already seen means the search would only cycle, so the current solution is optimal as far as we are concerned.</p>
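<p>A minimal sketch of this bookkeeping – the normalisation, which makes rotations and reversals of the same tour compare equal, is my own addition:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>seen = set()

def record_tour(tour):
    """Return False if this tour (up to rotation and reversal)
    was already found, True otherwise."""
    n = len(tour)
    i = tour.index(min(tour))  # rotate to start at the smallest node
    fwd = tuple(tour[(i + k) % n] for k in range(n))
    bwd = tuple(tour[(i - k) % n] for k in range(n))
    key = min(fwd, bwd)
    if key in seen:
        return False
    seen.add(key)
    return True
</code></pre></div></div>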
<h2 id="allow-disjoint-tour">Allow Disjoint Tour</h2>
<p>At the very beginning of the algorithm, we are looking to improve the tour as fast as possible. Hence, for 2-opt moves we relax the condition stipulating that we must be able to relink the tour.</p>
<h2 id="order-neighbours">Order Neighbours</h2>
<p>When looking for candidate edges to add to the tour, one crucial decision is which node to try when forming a new edge. One way to improve this search is to order the neighbours. My first implementation uses the standard distance between nodes; the second refines it to the gain obtained when selecting the next edge to exclude.</p>
<h3 id="limit-neighbours">Limit neighbours</h3>
<p>A heuristic proposed in the original LK paper is to only consider the five nearest neighbours. This speeds up the search considerably as, during <em>deep</em> searches, considering all neighbours becomes expensive.</p>
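<p>One possible precomputation, assuming a <code class="highlighter-rouge">dist</code> function is available (the names here are illustrative, not from my code):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def nearest_neighbours(nodes, dist, k=5):
    """Precompute, for each node, its k nearest other nodes,
    sorted by increasing distance."""
    return {a: sorted((b for b in nodes if b != a),
                      key=lambda b: dist(a, b))[:k]
            for a in nodes}
</code></pre></div></div>
<p>The candidate loop in <code class="highlighter-rouge">chooseY()</code> would then iterate over this shortlist instead of all neighbours.</p>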
<h3 id="special-case">Special case</h3>
<p>When choosing $x_4$, order the choice based on the length of the edge. (We have two choices; select the longest edge.) Also, do not check whether it forms a tour, as we want to allow double bridge moves.</p>
<h1 id="results">Results</h1>
<p>I have to say, I am really impressed with the efficiency of this heuristic. It took some time to wrap my head around it, but I definitely think it is worth the investment if you need good TSP solutions fast.</p>
<p>I will compare LK with the local TSP heuristics presented before using the same test setup: <code class="highlighter-rouge">att48</code> the 48 state capitals of continental U.S. and <code class="highlighter-rouge">a280</code> a drilling problem with 280 holes.</p>
<h2 id="us-test">U.S. test</h2>
<p><img src="/assets/images/tsp/lkus.svg" alt="img" /></p>
<p>Here, we can see a very neat tour of the country with no apparent loop, as opposed to earlier results. (Note: the gap may come from an incorrect distance function.)</p>
<table>
<thead>
<tr>
<th><strong>att48</strong></th>
<th style="text-align: right">Average</th>
<th style="text-align: right">Gap (%)</th>
<th style="text-align: right">Time (s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Greedy</td>
<td style="text-align: right">12861</td>
<td style="text-align: right">21.01</td>
<td style="text-align: right">0.80ms</td>
</tr>
<tr>
<td>2-opt*</td>
<td style="text-align: right">11755</td>
<td style="text-align: right">10.60</td>
<td style="text-align: right">0.19</td>
</tr>
<tr>
<td>2-opt</td>
<td style="text-align: right">11826</td>
<td style="text-align: right">11.28</td>
<td style="text-align: right">0.15</td>
</tr>
<tr>
<td>3-opt*</td>
<td style="text-align: right">11597</td>
<td style="text-align: right">9.11</td>
<td style="text-align: right">3.59</td>
</tr>
<tr>
<td>LK</td>
<td style="text-align: right">10787</td>
<td style="text-align: right">1.50</td>
<td style="text-align: right">1.36</td>
</tr>
</tbody>
</table>
<h2 id="drilling-test">Drilling test</h2>
<p><img src="/assets/images/tsp/lka280.svg" alt="img" /></p>
<p>Not only do the previous results look good in terms of performance, they were also obtained extremely quickly; it took LK less than 4 min to beat the best results of all the other heuristics.</p>
<p>Lin-Kernighan managed to outperform all other forms of local search in terms of solution quality while staying very competitive time-wise.</p>
<table>
<thead>
<tr>
<th><strong>a280</strong></th>
<th style="text-align: right">Average</th>
<th style="text-align: right">Gap (%)</th>
<th style="text-align: right">Time (s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Greedy</td>
<td style="text-align: right">3148.11</td>
<td style="text-align: right">22.07</td>
<td style="text-align: right">0.02</td>
</tr>
<tr>
<td>2-opt*</td>
<td style="text-align: right">3072.24</td>
<td style="text-align: right">19.13</td>
<td style="text-align: right">43.81</td>
</tr>
<tr>
<td>2-opt</td>
<td style="text-align: right">2874.38</td>
<td style="text-align: right">11.45</td>
<td style="text-align: right">26.84</td>
</tr>
<tr>
<td>3-opt*</td>
<td style="text-align: right">2794.51</td>
<td style="text-align: right">8.36</td>
<td style="text-align: right">2343.79</td>
</tr>
<tr>
<td>LK</td>
<td style="text-align: right">2642.55</td>
<td style="text-align: right">2.46</td>
<td style="text-align: right">210.05</td>
</tr>
</tbody>
</table>
<p>Interestingly, LK is supposed to be an $O(n^{2.2})$ algorithm, so I expected timings closer to 2-opt. It makes sense, though, that at this scale even a small difference in the exponent would impact the timings that much… Or my implementation can be improved, especially considering how seldom I hit optimal solutions.</p>
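<p>A quick sanity check of that intuition: at $n = 280$, the asymptotic gap between $O(n^{2.2})$ and $O(n^2)$ is a factor $n^{0.2}$:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Expected asymptotic slowdown of LK over 2-opt at n = 280
n = 280
factor = n ** 0.2
print(round(factor, 2))
</code></pre></div></div>
<p>So LK should be roughly three times slower than 2-opt asymptotically, while the measured ratio (about 210 s versus 27 s) is closer to eight; the difference presumably comes from constant factors in my implementation.</p>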
<h1 id="further-improvements">Further Improvements</h1>
<p>Although I am content with having a <em>basic</em> LK implementation right now, there are still a few improvements that I could add to the algorithm; I am just unsure how much work they will require.</p>
<h2 id="improved-neighbourhood">Improved Neighbourhood</h2>
<p>When looking for edges to add, we do a bit of work to identify interesting neighbours, but more could be done. I sort neighbours by their distance; however, this is not effective enough. It would be better to rank them by how close together they appear in good solutions for the TSP.</p>
<p>One way to do this is to start with a one-tree<sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup> and rank neighbours accordingly. Helsgaun offers an iterative algorithm to build a better neighbourhood for nodes.</p>
<h2 id="better-tour-structure">Better Tour Structure</h2>
<p>As described by Helsgaun, the tour structure is central to the performance of the algorithm: it needs to be able to efficiently verify whether adding or removing an edge still forms a tour.</p>
<p>One structure that fulfills this purpose – and was actually developed for it – is the <em>ejection chain</em> <a class="citation" href="#glover1992new">(Glover, 1992)</a>. The idea is to have a node “at the top,” linked to a circuit. Adding or removing edges has to go through that one node, making checking the correctness of the tour a simple step.</p>
<p>One of the advantages of using such a structure shows when we verify that “$x_{i+1}$ exists” while choosing $y_i$. One observation in LK is that, for a given edge to exclude, there exists only one choice of node belonging to the tour which allows closing it with $t_1$. Hence, we would not have to do this check in <code class="highlighter-rouge">chooseX()</code>.</p>
<h2 id="starting-tour">Starting Tour</h2>
<p>Helsgaun offers a construction heuristic amenable to use with LK, putting better moves in reach. As I mentioned in the previous post, the starting tour can dramatically change the performance of the heuristics.</p>
<h2 id="fast-scheme">Fast Scheme</h2>
<p>Rather, I should say a “slow” scheme. Currently, I follow the original implementation of LK: it restarts every time an improving move is identified. We could have a <em>slower</em> version that exhausts moves – at least for 2-opt.</p>
<h2 id="neighbourhood-depth">Neighbourhood Depth</h2>
<p>Similarly, LK limits the size of neighbourhoods in a pretty arbitrary fashion. It would be interesting to have variable size neighbourhoods, maybe as part of the “slow” scheme.</p>
<h1 id="conclusion">Conclusion</h1>
<p>I hope I managed to dispel some fears, misconceptions, and accessibility issues revolving around LK; I sure had some when I started looking at it. As usual, comments and corrections welcome.</p>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
TeX: { equationNumbers: { autoNumber: "AMS" } },
tex2jax: {
inlineMath: [ ['$','$'] ],
processEscapes: true
}})
</script>
<script type="text/javascript" async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
<h1 id="references">References</h1>
<ol class="bibliography"><li><span id="lin1973effective">Lin, S., & Kernighan, B. W. (1973). An effective heuristic algorithm for the traveling-salesman problem. <i>Operations Research</i>, <i>21</i>(2), 498–516.</span></li>
<li><span id="helsgaun1998writings">Helsgaun, K. (1998). An effective implementation of the Lin-Kernighan traveling salesman heuristic. <i>Writings on Computer Science</i>, (81).</span></li>
<li><span id="helsgaun2000effective">Helsgaun, K. (2000). An effective implementation of the Lin–Kernighan traveling salesman heuristic. <i>European Journal of Operational Research</i>, <i>126</i>(1), 106–130.</span></li>
<li><span id="helsgaun2009general">Helsgaun, K. (2009). General k-opt submoves for the Lin–Kernighan TSP heuristic. <i>Mathematical Programming Computation</i>, <i>1</i>(2), 119–163.</span></li>
<li><span id="glover1992new">Glover, F. (1992). New ejection chain and alternating path methods for traveling salesman problems. In <i>Computer science and operations research</i> (pp. 491–509). Elsevier.</span></li></ol>
<h1 id="footnotes">Footnotes</h1>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Although there seems to be debate on what is called 2.5-opt, it is sometimes a specific version of 2-opt or an improved version of 3-opt. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>Helsgaun lists in his report around “forty references to LK, with only three achieving the same performances.” <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>As well as an accessible API, the “best” available being calling <a href="http://www.math.uwaterloo.ca/tsp/concorde.html">Concorde</a> one way or the other. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>Not in the same sense as when you mention “sequential 2-opt,” here we only talk about the relation between edges. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>A one-tree is a relaxation used for the TSP where we build a minimum spanning tree – an $O(n^2)$ computation on a complete graph – on all nodes but the depot, then relink the depot to its two closest neighbours. <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Wed, 06 Dec 2017 00:00:00 +0000
http://arthur.maheo.net/implementing-lin-kernighan-in-python/
http://arthur.maheo.net/implementing-lin-kernighan-in-python/
Using a Modern Benders Framework<p>Since its creation in 1962, the Benders Decomposition method <a class="citation" href="#Benders1962">(Benders, 1962)</a> has generated a lot of research around how to improve it. Its simplicity attracted a lot of attention, but it is famously hard to implement efficiently. In this post, I will introduce a Branch-and-Benders-Cut framework that I am developing and trying to make as general as possible, available <a href="https://gitlab.com/Soha/branch-and-benders-cut">here</a>, using Intensity Modulated Radiation Therapy <a class="citation" href="#tacskin2013combinatorial">(Taşkın & Cevik, 2013)</a> as an example.</p>
<h1 id="modernizing-benders">Modernizing Benders</h1>
<p>The principle of the Benders Decomposition (BD) is simple: take a Mixed Integer Program (MIP) and separate it into its continuous (<em>sub-problem</em>) and integer (<em>master problem</em>) components. Because solving a MIP is $\mathcal{NP}$-hard, the bigger the problem, the harder the solving process, in an exponential fashion. Benders alleviates this difficulty by relaxing the MIP into a smaller, less constrained problem. However, it then needs to solve one MIP and one LP repeatedly to obtain the optimum.</p>
<h2 id="basic-algorithm">Basic Algorithm</h2>
<p>Benders Decomposition proceeds as follows:</p>
<ol>
<li>Project out all continuous variables and associated constraints from the master problem and replace them with an incumbent.</li>
<li>Solve the master problem to obtain a tentative solution.</li>
<li>Pass this solution as a parameter to the sub-problem and solve it.</li>
<li>Use duality to obtain multipliers from the sub-problem and generate a constraint for the master problem, called a <em>cut</em>.</li>
</ol>
<p>Repeat until the incumbent (added variable) in the master problem and the sub-problem have the same value. For a more complete explanation, refer to my previous <a href="http://arthur.maheo.net/posts/a-short-introduction-to-benders/">post</a> on this topic.</p>
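<p>To make these four steps concrete, here is a toy sketch on a one-variable problem, $\min\; f y + x$ subject to $x \geq d - u y$, $x \geq 0$, $y \in \{0, 1\}$; the master is enumerated by brute force as a stand-in for a MIP solver, and all names are illustrative:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def solve_sub(y_hat, d, u):
    """Sub-problem: min x s.t. x &gt;= d - u*y_hat, x &gt;= 0.
    Returns the optimum and the dual of the linking constraint."""
    slack = d - u * y_hat
    return (slack, 1.0) if slack > 0 else (0.0, 0.0)

def benders(f, d, u, max_iter=20):
    cuts = []  # dual multipliers; each cut reads theta &gt;= lam * (d - u*y)
    for _ in range(max_iter):
        # 2. Solve the master: enumerate y, with theta the incumbent
        obj, y_hat = min(
            (f * y + max([lam * (d - u * y) for lam in cuts] + [0.0]), y)
            for y in (0, 1)
        )
        theta = obj - f * y_hat
        # 3. Pass the tentative solution to the sub-problem
        sub_val, lam = solve_sub(y_hat, d, u)
        if sub_val > theta + 1e-9:
            # 4. Use the dual to generate a Benders optimality cut
            cuts.append(lam)
        else:
            # Incumbent and sub-problem agree: optimal
            return f * y_hat + sub_val, y_hat
    raise RuntimeError("no convergence")
</code></pre></div></div>
<p>With $f = 3$, $d = 10$, $u = 8$, the loop converges in two iterations to the optimum $5$ with $y = 1$.</p>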
<h2 id="yes-master-yes">Yes master, yes</h2>
<p>One of the main causes of slowness in this algorithm is that we need to solve a complete MIP, the master problem, at each iteration. Furthermore, at each iteration this MIP becomes harder to solve because we add a new constraint – or more in the case of <em>multicut</em> <a class="citation" href="#birge1988multicut">(Birge & Louveaux, 1988)</a>.</p>
<p>Although this is a simplified MIP, which is usually pretty easy, it is still cumbersome to solve it to optimality every time. To make the approach faster we rely on a simple observation: when solving a MIP, we encounter a number of integer feasible solutions which we discard because they are not optimal.</p>
<p>A second observation is that <em>any</em> solution to the master problem enables us to generate valid Benders cuts <a class="citation" href="#mcdaniel1977modified">(McDaniel & Devine, 1977)</a>.<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> Thus, combining the two ideas leads to what is colloquially referred to as <em>Modern Benders</em>.</p>
<p>The new algorithm is:</p>
<ol>
<li>Solve the master problem using a Branch-and-Bound (B&B) framework.</li>
<li>At each node (integer or fractional, depending on the strategy), compute a Benders cut using the sub-problem.</li>
<li>Use the solution of the sub-problem as a new bound for branching.</li>
</ol>
<p>The main quality of this approach is that it only solves the master problem once. The downside is that adding cuts may make the search harder, so it is a balancing exercise to determine which to add.<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup></p>
<p>Using a modern solver, we have access to callbacks during the B&B at integer and/or fractional nodes. From there, we can generate <em>lazy constraints</em> which the solver will add to its cut pool and check when necessary. (Cf. <code class="highlighter-rouge">Cuts</code> in <code class="highlighter-rouge">benders/master.py</code> which is a callback class called at integer nodes.)</p>
<h1 id="modern-benders-framework">Modern Benders Framework</h1>
<p>In this section, I will introduce the Branch-and-Benders-Cut framework, called “BranDec,” that I am currently developing. I will use radiation therapy as an example. I will try to provide both the mathematical and Python description as a walkthrough on how to use it.</p>
<p>My goal with this framework is to make clear the separation between the Benders master problem, which is the core of the algorithm, and the specifics of the problem to solve. The usual approach to using Benders is <em>integrated</em>: the master problem’s variables, cut generation, and sometimes sub-problem are all aggregated into one massive function.</p>
<p>In my experience this is not necessary and makes every Benders implementation single-purpose, which defeats the idea of using a generic algorithm. My framework thus only comprises a single part: the master problem.</p>
<p>Of course, this also deals with the branching and cut generation – as in adding the cuts to the master at given nodes – but does not deal with modelling, getting dual values, etc. The way to use it is the following:</p>
<ul>
<li>Create a sub-problem class deriving from <code class="highlighter-rouge">SubPb</code>, which has to implement a few methods: optimising given a master solution, and getting dual values from a solution.</li>
<li>Add variables and constraints to the master problem.</li>
<li>Solve.</li>
</ul>
<p>To run the code you will need Python 3.5 with <a href="https://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/">CPLEX</a> and numpy installed.</p>
<h2 id="case-study">Case Study</h2>
<p>Intensity Modulated Radiation Therapy (IMRT) can be decomposed into the problem of matching a set of apertures to a given <em>fluence map</em>. A fluence map defines the amount of radiation that has to be delivered to each area (cancer cell) and can be represented as an integer matrix. In the same fashion, we can represent the irradiation delivered by the machine as a set of apertures.</p>
<p>For example, the following fluence map can be decomposed into a set of apertures.</p>
<p>\[ \begin{bmatrix} 4 & 0 & 2 \\ 3 & 9 & 5 \end{bmatrix} = \\ 4 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} + 2 \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} + 3 \begin{bmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix} + 6 \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\]</p>
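<p>We can check this decomposition numerically with a few lines of Python (a standalone sketch; the matrices are copied from the equation above):</p>

```python
# Check that the weighted sum of the four apertures reproduces the fluence map.
target = [[4, 0, 2], [3, 9, 5]]
weighted_apertures = [
    (4, [[1, 0, 0], [0, 0, 0]]),
    (2, [[0, 0, 1], [0, 0, 1]]),
    (3, [[0, 0, 0], [1, 1, 1]]),
    (6, [[0, 0, 0], [0, 1, 0]]),
]
total = [[sum(w * ap[i][j] for w, ap in weighted_apertures) for j in range(3)]
         for i in range(2)]
assert total == target
```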
<p>To provide an efficient treatment, we want to minimize the amount of radiation received by the patient. This problem seems very amenable to BD: the master problem will choose a set of apertures to use while the sub-problem will decide how long they need to be used.</p>
<p>We can model the problem as follows:</p>
<ul>
<li>$T$ the target fluence map, a matrix with $n$ rows and $m$ columns.</li>
<li>$R$ the set of all apertures with:
<ul>
<li>we use $R(i,j)$ to denote the set of apertures that cover <em>bixel</em> $(i, j)$; and</li>
<li>$M_r$ the minimum required intensity among the bixels covered by aperture $r$.</li>
</ul>
</li>
<li>$b_{ij}$ the amount of radiation needed in each bixel.</li>
<li>$w$ the setup time per aperture.</li>
<li>$y_r$ a binary variable representing whether aperture $r$ is used or not.</li>
<li>$x_r$ a continuous variable representing how long each aperture will be used.</li>
</ul>
<p>\[\begin{align*} \min && \sum_{r \in R} (w \cdot y_r + x_r) \tag{MIP} \label{eq:mip} \\ s.t. && \sum_{r \in R(i,j)} x_r & = b_{ij} & \forall i, j \in T \\ && x_r & \leq M_r \cdot y_r & \forall r \in R \\ && x \geq 0, y \in {\mathbb{B}} \end{align*}\]</p>
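<p>The derived data $R(i,j)$ and $M_r$ follow directly from the aperture matrices and the fluence map. As a sketch (my own helper code, reusing the five-aperture instance of the worked example below):</p>

```python
# Five apertures and the fluence map from the worked example below.
apertures = [[[1, 0], [0, 0]], [[0, 1], [0, 0]], [[0, 0], [1, 0]],
             [[1, 0], [1, 0]], [[1, 1], [0, 0]]]
radiation = [[8, 3], [5, 0]]

# R(i, j): indices of the apertures covering bixel (i, j).
R = {(i, j): [r for r, ap in enumerate(apertures) if ap[i][j]]
     for i in range(2) for j in range(2)}

# M_r: minimum required intensity among the bixels covered by aperture r.
M = [min(radiation[i][j] for i in range(2) for j in range(2) if ap[i][j])
     for ap in apertures]

assert R[(0, 0)] == [0, 3, 4]
assert M == [8, 3, 5, 5, 3]
```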
<h3 id="master-problem">Master problem</h3>
<p>From \eqref{eq:mip} we can derive the master problem by removing all continuous $x$ variables and replacing them with an incumbent $q$.</p>
<p>\[\begin{align*} \min && \sum_{r \in R} w \cdot y_r + q \tag{Master} \label{eq:master} \\ s.t. && \text{Benders cuts} \\ && q \geq 0, y \in {\mathbb{B}} \end{align*}\]</p>
<p>Hence $q$ is a lower estimator of the irradiation time required.</p>
<h4 id="code">Code</h4>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bd</span> <span class="o">=</span> <span class="n">Benders</span><span class="p">()</span>
<span class="n">bd</span><span class="o">.</span><span class="n">setMinimise</span><span class="p">()</span>
<span class="n">bd</span><span class="o">.</span><span class="n">addVars</span><span class="p">([</span><span class="n">intensity</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">apertures</span><span class="p">),</span>
<span class="n">names</span><span class="o">=</span><span class="p">[</span><span class="s">"y_{}"</span><span class="o">.</span><span class="nb">format</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">apertures</span><span class="p">))])</span>
<span class="c1"># No constraints in the base master
# Add the subproblem
</span><span class="n">bd</span><span class="o">.</span><span class="n">addSub</span><span class="p">({</span><span class="mi">0</span><span class="p">:</span> <span class="n">SPTT</span><span class="p">(</span><span class="n">apertures</span><span class="p">,</span> <span class="n">radiation</span><span class="p">)})</span>
</code></pre></div></div>
<p>The dictionary at the end is used for multicut schemes; here we have a single cut, thus a single entry.</p>
<h3 id="sub-problem">Sub-Problem</h3>
<p>The sub-problem then becomes: given a set of apertures, find if we can cover the fluence map.</p>
<p>\[\begin{align*} q(\bar{y}) = && \min \sum_{r \in R} x_r \tag{Sub} \label{eq:sub} \\ s.t. && \sum_{r \in R(i,j)} x_r & = b_{ij} & \forall i, j \in T & \tag{$\alpha$} \\ && x_r & \leq M_r \cdot \bar{y}_r & \forall r \in R & \tag{$\beta$} \\ && x \geq 0 \nonumber \end{align*}\]</p>
<p>Given the dual multipliers $\alpha$ and $\beta$ associated with each constraint in the sub-problem, we can write a Benders cut as follows:</p>
<ul>
<li>
<p><strong>Feasibility cut:</strong> A feasibility cut only removes the current solution from the master problem and is generated when the sub-problem is infeasible given the current solution.</p>
<p>\[ \sum_{i = 0}^{n} \sum_{j = 0}^m b_{ij} \cdot \hat{\alpha}_{ij} + \sum_{r \in R} (M_r \cdot \hat{\beta}_r) y_r \leq 0 \]</p>
</li>
<li>
<p><strong>Optimality cut:</strong> An optimality cut gives us further information on the quality of the solution and involves the incumbent to provide a tighter constraint.</p>
<p>\[ \sum_{i = 0}^{n} \sum_{j = 0}^m b_{ij} \cdot \hat{\alpha}_{ij} + \sum_{r \in R} (M_r \cdot \hat{\beta}_r) y_r \leq q \]</p>
</li>
</ul>
<p>One important thing to notice is that the master variables appear alongside the dual values of precisely those constraints in which their parametric version ($\bar{y}$) occurs.</p>
<p>Therefore, a Benders cut can be expressed using only three items from the sub-problem:</p>
<ol>
<li>The dual values of constraints where the fixed master variables occur, which will form the left-hand side of the cut (in standard form).</li>
<li>The dual values of the constraints that only exist in the sub-problem; their sum, weighted by the right-hand sides $b_{ij}$, will form the right-hand side of the cut.</li>
<li>Whether it is an optimality or feasibility cut so the master knows whether to include the incumbent in the left-hand side or not.</li>
</ol>
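<p>As a sketch (a helper of my own, not part of the framework), assembling a cut from these three items amounts to two dot products. The numbers below are the Farkas duals of the first iteration of the worked example:</p>

```python
def benders_cut(alpha, b, beta, M):
    """Coefficients of the y variables (LHS) and the constant term (RHS).

    alpha, b: dual values and right-hand sides of the coupling equalities.
    beta, M:  dual values and big-M constants of the x_r <= M_r * y_r rows.
    """
    lhs = [M_r * beta_r for M_r, beta_r in zip(M, beta)]
    rhs = sum(a * b_ij for a, b_ij in zip(alpha, b))
    return lhs, rhs

# Farkas duals from iteration 1 below; the cut reads rhs + lhs . y <= 0,
# i.e. 8 - 8 y_1 - 5 y_4 - 3 y_5 <= 0.
lhs, rhs = benders_cut(alpha=[1.0, 0.0, 0.0], b=[8, 3, 5],
                       beta=[-1.0, 0.0, 0.0, -1.0, -1.0], M=[8, 3, 5, 5, 3])
assert lhs == [-8.0, 0.0, 0.0, -5.0, -3.0] and rhs == 8.0
```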
<h4 id="generating-cuts">Generating Cuts</h4>
<p>The code for setting up the sub-problem is mainly related to using CPLEX; I will let the reader peruse it, as there are no surprises there. However, I will quickly detail the functions used to “generate” the cuts.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">optiCut</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">duals</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">cpx</span><span class="o">.</span><span class="n">solution</span><span class="o">.</span><span class="n">get_dual_values</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">vs</span><span class="p">)</span>
<span class="n">rhs</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">cpx</span><span class="o">.</span><span class="n">solution</span><span class="o">.</span><span class="n">get_dual_values</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">us</span><span class="p">)</span>
<span class="k">return</span> <span class="n">duals</span><span class="p">,</span> <span class="n">rhs</span>
</code></pre></div></div>
<p>The optimality cut is straightforward: get the dual values of selected constraints.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">feasCut</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="c1"># Use a Farkas certificate as we have the primal and not the dual; the
</span> <span class="c1"># second parameter is the dual objective, should be equal to the $RHS *
</span> <span class="c1"># coefs$.
</span> <span class="n">ray</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">cpx</span><span class="o">.</span><span class="n">solution</span><span class="o">.</span><span class="n">advanced</span><span class="o">.</span><span class="n">dual_farkas</span><span class="p">()</span>
<span class="n">duals</span> <span class="o">=</span> <span class="p">[</span><span class="n">ray</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">us</span><span class="p">),</span> <span class="nb">len</span><span class="p">(</span><span class="n">ray</span><span class="p">))]</span>
<span class="n">rhs</span> <span class="o">=</span> <span class="p">[</span><span class="n">ray</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">us</span><span class="p">))]</span>
<span class="k">return</span> <span class="n">duals</span><span class="p">,</span> <span class="n">rhs</span>
</code></pre></div></div>
<p>Feasibility cuts are a tad more complex, as we have to rely on a <em>Farkas certificate</em> to identify an extreme ray<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup> in the dual polyhedron. Luckily for us, CPLEX provides a built-in function to do the job. The only thing we have to do by hand is separate the dual values based on their indices.</p>
<h3 id="solving-the-problem">Solving the Problem</h3>
<p>A few things before starting:</p>
<ol>
<li>We will use the primal problem to exemplify the process, using the dual problem would lead to the <em>exact same solution</em>.</li>
<li>Because of the nature of linear programming, or rather of the solver used, we cannot specify which value we want when the problem is <em>degenerate</em> (multiple equivalent solutions). However, this can only influence the number of iterations of the algorithm; the result will always be optimal.</li>
</ol>
<p>We will use the following numerical example to illustrate a run of the algorithm:</p>
<ul>
<li>Setup time per aperture: $w = 7$, called <code class="highlighter-rouge">intensity</code> in the code.</li>
<li>Fluence map, called <code class="highlighter-rouge">radiation</code>.</li>
</ul>
<p>\[ T = \begin{bmatrix} 8 & 3 \\ 5 & 0 \end{bmatrix} \]</p>
<ul>
<li>A set of five apertures, called <code class="highlighter-rouge">apertures</code>.</li>
</ul>
<p>\[ R = \left\{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},\begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix},\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}\right\} \]</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">apertures</span> <span class="o">=</span> <span class="p">[[[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]],</span> <span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]],</span> <span class="p">[[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]],</span>
<span class="p">[[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]],</span> <span class="p">[[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]]]</span>
<span class="n">radiation</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">8</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">5</span><span class="p">,</span> <span class="mi">0</span><span class="p">]]</span>
<span class="n">intensity</span> <span class="o">=</span> <span class="mi">7</span>
</code></pre></div></div>
<h4 id="initialisation">Initialisation</h4>
<p>First, let us express the sub-problem with the given values. We will not have any constraint corresponding to $\alpha_{22}$ as the target bixel is nil.</p>
<p>\[ \begin{align*} q(\bar{y}) = \min && x_1 + x_2 + x_3 + x_4 + x_5 \tag{SPTT} \\ s.t. && x_1 + x_4 + x_5 &= 8 \tag{$\alpha_{11}$}\\ && x_2 + x_5 &= 3 \tag{$\alpha_{12}$}\\ && x_3 + x_4 &= 5 \tag{$\alpha_{21}$}\\ && x_1 \leq 8 \bar{y}_1, x_2 \leq 3 \bar{y}_2, \\ && x_3 \leq 5 \bar{y}_3, x_4 \leq 5 \bar{y}_4, x_5 & \leq 3 \bar{y}_5 \tag{$\beta_{1-5}$}\\ && x \geq 0 \end{align*} \]</p>
<p>I will provide sample output from the code as well, mainly <code class="highlighter-rouge">debug</code> level information.</p>
<h4 id="iteration-1">Iteration 1</h4>
<p>We begin with the following relaxed master problem.</p>
<p>\[ \begin{align*} \min && 7 \times (y_1 + y_2 + y_3 + y_4 + y_5) + q \label{eq:it_1} \tag{Pb.1} \\ s.t. && y \in \mathbb{B}, q \geq 0 \end{align*}\]</p>
<p>The optimal solution to \eqref{eq:it_1} is $\bar{y} = [0, 0, 0, 0, 0], \bar{q} = 0$ which leads to an infeasible sub-problem. From its Farkas certificate, we get the following values:</p>
<p>\[ \begin{matrix} \alpha_{11} = 1, \alpha_{12} = 0, \alpha_{21} = 0 \\ \beta_{1} = -1, \beta_{2} = 0, \beta_{3} = 0, \beta_{4} = -1, \beta_{5} = -1 \end{matrix} \]</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Sol: 00000 - q: 0.000000
Feasibility Cut
Beta: [-1.0, 0.0, 0.0, -1.0, -1.0], Alpha: [1.0, 0.0, 0.0]
LHS: [-8.0, 0.0, 0.0, -5.0, -3.0], RHS: 8.0
</code></pre></div></div>
<h4 id="iteration-2">Iteration 2</h4>
<p>With the previous results we obtain the new master problem.</p>
<p>\[ \begin{align*} \min && 7 \times (y_1 + y_2 + y_3 + y_4 + y_5) + q \label{eq:it_2} \tag{Pb.2} \\ s.t. && 8 - 8 y_1 - 5 y_4 - 3 y_5 & \leq 0 \\ && y \in \mathbb{B}, q \geq 0 \end{align*}\]</p>
<p>We can easily see that the first term of the constraint is our $\alpha_{11} \times b_{11}$ and that the multiplier for each $y_r$ is $\beta_r \times M_r$.</p>
<p>The optimal solution is now: $\bar{y} = [1, 0, 0, 0, 0], \bar{q} = 0$. The sub-problem is still infeasible so we will generate a feasibility cut with the following dual values:</p>
<p>\[ \begin{matrix} \alpha_{11} = 0, \alpha_{12} = 0, \alpha_{21} = 1 \\ \beta_{1} = 0, \beta_{2} = 0, \beta_{3} = -1, \beta_{4} = -1, \beta_{5} = 0 \end{matrix} \]</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Sol: 10000 - q: 0.000000
Feasibility Cut
Beta: [0.0, 0.0, -1.0, -1.0, 0.0], Alpha: [0.0, 0.0, 1.0]
LHS: [0.0, 0.0, -5.0, -5.0, 0.0], RHS: 5.0
</code></pre></div></div>
<h4 id="iteration-3">Iteration 3</h4>
<p>We now have the new master problem with two cuts.</p>
<p>\[ \begin{align*} \min && 7 \times (y_1 + y_2 + y_3 + y_4 + y_5) + q \label{eq:it_3} \tag{Pb.3} \\ s.t. && 8 - 8 y_1 - 5 y_4 - 3 y_5 & \leq 0 \\ && 5 - 5 y_3 - 5 y_4 & \leq 0 \\ && y \in \mathbb{B}, q \geq 0 \end{align*}\]</p>
<p>The optimal solution becomes: $\bar{y} = [1, 0, 0, 1, 0], \bar{q} = 0$, which again leads to an infeasible sub-problem with the following dual values:</p>
<p>\[ \begin{matrix} \alpha_{11} = 0, \alpha_{12} = 1, \alpha_{21} = 0 \\ \beta_{1} = 0, \beta_{2} = -1, \beta_{3} = 0, \beta_{4} = 0, \beta_{5} = -1 \end{matrix} \]</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Sol: 10010 - q: 0.000000
Feasibility Cut
Beta: [0.0, -1.0, 0.0, 0.0, -1.0], Alpha: [0.0, 1.0, 0.0]
LHS: [0.0, -3.0, 0.0, 0.0, -3.0], RHS: 3.0
</code></pre></div></div>
<h4 id="iteration-4">Iteration 4</h4>
<p>We add a third feasibility cut to the master.</p>
<p>\[ \begin{align*} \min && 7 \times (y_1 + y_2 + y_3 + y_4 + y_5) + q \label{eq:it_4} \tag{Pb.4} \\ s.t. && 8 - 8 y_1 - 5 y_4 - 3 y_5 & \leq 0 \\ && 5 - 5 y_3 - 5 y_4 & \leq 0 \\ && 3 - 3 y_2 - 3 y_5 & \leq 0 \\ && y \in \mathbb{B}, q \geq 0 \nonumber \end{align*}\]</p>
<p>Optimal solution: $\bar{y} = [0, 0, 0, 1, 1], \bar{q} = 0$. Guess what? Yes, a <strong>feasible</strong> sub-problem. In this case, the sub-problem mainly checks the feasibility of increasingly <em>expensive</em> solutions; because the lower bound of the master problem is monotonically increasing, the first feasible solution thus yields the optimum.</p>
<p>However, this property does not hold in the general case, so it is not enough to conclude. Thus we add an optimality cut to the master problem and do one more iteration using the following dual values:</p>
<p>\[ \begin{matrix} \alpha_{11} = 1, \alpha_{12} = 1, \alpha_{21} = 1 \\ \beta_{1} = 0, \beta_{2} = 0, \beta_{3} = 0, \beta_{4} = -1, \beta_{5} = -1 \end{matrix} \]</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Sol: 00011 - q: 0.000000
Optimality Cut with x = [0, 0, 0, 5, 3]
Beta: [0.0, 0.0, 0.0, -1.0, -1.0], Alpha: [1.0, 1.0, 1.0]
LHS: [0.0, 0.0, 0.0, -5.0, -3.0], RHS: 16.0
</code></pre></div></div>
<h4 id="iteration-5">Iteration 5</h4>
<p>We now have the first optimality cut in the master.</p>
<p>\[ \begin{align*} \min && 7 \times (y_1 + y_2 + y_3 + y_4 + y_5) + q \label{eq:it_5} \tag{Pb.5} \\ s.t. && 8 - 8 y_1 - 5 y_4 - 3 y_5 & \leq 0 \\ && 5 - 5 y_3 - 5 y_4 & \leq 0 \\ && 3 - 3 y_2 - 3 y_5 & \leq 0 \\ && 16 - 5 y_4 - 3 y_5 & \leq q \\ && y \in \mathbb{B}, q \geq 0 \end{align*} \]</p>
<p>Notice $q$ appearing in the RHS of the optimality cut.<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup></p>
<p>The solution to \eqref{eq:it_5} is: $\bar{y} = [0, 0, 0, 1, 1], \bar{q} = 8$. We have $\bar{q}$ equal to the value of the sub-problem with solution: $\bar{x} = [0, 0, 0, 5, 3]$, thus we found the optimal solution to the problem.</p>
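<p>A quick sanity check of the final numbers ($w = 7$, apertures 4 and 5 selected):</p>

```python
w = 7
y = [0, 0, 0, 1, 1]   # apertures selected by the master
x = [0, 0, 0, 5, 3]   # exposure times from the sub-problem

q = 16 - 5 * y[3] - 3 * y[4]   # the optimality cut is binding
assert q == 8 == sum(x)

objective = w * sum(y) + sum(x)  # 7 * 2 + 8
assert objective == 22           # matches "Solution 00011 = 22.0"
```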
<h4 id="sample-output">Sample output</h4>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Added 5 master variables.
Registered 1 sub-problems bundled in 1 cuts.
Itrs | Master | q | q(z) | Opt | UB
---- + ---------- + ---------- + ---------- + --- + ----------
1 | 0.000 | 0.000 | 0.000 | | 0.000
2 | 7.000 | 0.000 | 0.000 | | 0.000
3 | 14.000 | 0.000 | 0.000 | | 0.000
4 | 14.000 | 0.000 | 8.000 | X | 0.000
Found q(z) = q.
Solution 00011 = 22.0
Stop after 0 nodes (5 integer).
</code></pre></div></div>
<h1 id="conclusion">Conclusion</h1>
<p>I hope this post served its purposes: making clear the separation between Benders’s algorithm and the models we use, giving a numerical example of a Benders run, and introducing a Python framework for using Benders in a modern way.</p>
<p>As an exercise, you can try implementing the first example matrix.</p>
<p>Feel free to poke me if you have any comment, remark, or correction.</p>
<h1 id="references">References</h1>
<ol class="bibliography"><li><span id="Benders1962">Benders, J. F. (1962). Partitioning procedures for solving mixed-variables programming problems. <i>Numerische Mathematik</i>, <i>4</i>(1), 238–252.</span></li>
<li><span id="tacskin2013combinatorial">Taşkın, Z. C., & Cevik, M. (2013). Combinatorial Benders cuts for decomposing IMRT fluence maps using rectangular apertures. <i>Computers & Operations Research</i>, <i>40</i>(9), 2178–2186.</span></li>
<li><span id="birge1988multicut">Birge, J. R., & Louveaux, F. V. (1988). A multicut algorithm for two-stage stochastic linear programs. <i>European Journal of Operational Research</i>, <i>34</i>(3), 384–392.</span></li>
<li><span id="mcdaniel1977modified">McDaniel, D., & Devine, M. (1977). A modified Benders’ partitioning algorithm for mixed integer programming. <i>Management Science</i>, <i>24</i>(3), 312–319.</span></li></ol>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
TeX: { equationNumbers: { autoNumber: "AMS" } },
tex2jax: {
inlineMath: [ ['$','$'] ],
processEscapes: true
}})
</script>
<script type="text/javascript" async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
<h1 id="footnotes">Footnotes</h1>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>“Any” as in even fractional solutions of the master, especially useful at the root node. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>It is an actual downside as cuts generated with the optimal solution to the master problem may be stronger, but in general it seems that adding Benders cuts during the B&B is better. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>An extreme ray is a direction of unlimited increase; it occurs when the primal is infeasible – the dual is unbounded. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>Technically, $q$ belongs to the LHS because, in normal form, all variables are on the LHS. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Tue, 05 Dec 2017 00:00:00 +0000
http://arthur.maheo.net/modern-benders-in-python/
Local TSP Heuristics in Python<p>As part of my current project, I needed a Python implementation of heuristics for the TSP. This post is the first part of the journey of implementing these lovely algorithms; Part II will deal with Lin-Kernighan. Code is available <a href="https://gitlab.com/Soha/local-tsp">here</a>.</p>
<h1 id="motivation">Motivation</h1>
<p>The Traveling Salesman Problem (TSP) is the most famous combinatorial optimisation problem. Its apparent simplicity – finding the shortest tour visiting a set of nodes – belies its computational complexity. Humans, by visualisation, can achieve decent solutions on small instances (within ~10% of the optimum) but will hardly ever find the optimum; computers will have trouble finding decent solutions without dedicated algorithms, the search space growing in a factorial fashion.<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></p>
<p>The TSP has received a lot of attention from the scientific community: efficient heuristic procedures, e.g. Lin-Kernighan-Helsgaun <a class="citation" href="#helsgaun2000effective">(Helsgaun, 2000)</a>; metaheuristics, e.g. Ant Colony optimisation <a class="citation" href="#dorigo1997ant">(Dorigo & Gambardella, 1997)</a>; or exact solution algorithms, e.g. Branch-and-Cut <a class="citation" href="#padberg1991branch">(Padberg & Rinaldi, 1991)</a>.</p>
<p>I decided to make this a two-part series of post about implementing TSP heuristics because LK deserves its own post.</p>
<h1 id="local-tsp-operators">Local TSP Operators</h1>
<p>Let me start by introducing the classic $\lambda$-opt TSP operator. The idea behind these operators is that a solution tour for a TSP instance can be made better by swapping some of its edges if the new edges provide a reduction on the length of the tour.</p>
<p>It is easy to see that an optimal tour for a TSP with $n$ cities has to be $n$-optimal. But the complexity of the method grows with the size of the operator: 2-opt is $O(n^2)$, 3-opt is $O(n^3)$, etc., making larger values of $\lambda$ impractical.</p>
<h2 id="methodology">Methodology</h2>
<p>I will use two examples from the classic TSPLib: <code class="highlighter-rouge">att48</code>,<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup> the TSP of the 48 state capitals of continental U.S.; and <code class="highlighter-rouge">a280</code> a drilling problem with 280 holes.</p>
<p>The descriptions in this post will use (Python) pseudo-code. A tour is a sequence of nodes representing the order of visits. For implementation details, please refer to <a href="https://gitlab.com/Soha/local-tsp">the code</a>.<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup></p>
<p>I will use the following notation:</p>
<ul>
<li>$c(\cdot)$ is the cost of an edge or a tour;</li>
<li>$G[i]$ represents the neighbours of $i$ in the graph $G$;</li>
<li>$T[i]$ represents the neighbours of $i$ in the tour $T$, so its predecessor and successor;</li>
<li>an edge is composed of a head and a tail: $(t_i, t_j)$.</li>
</ul>
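<p>To keep the pseudo-code below self-contained, here is one possible implementation of the cost function $c(\cdot)$ on a Euclidean instance (my own sketch, not taken from the repository):</p>

```python
import math

def edge_cost(coords, i, j):
    """Euclidean distance between nodes i and j."""
    (x1, y1), (x2, y2) = coords[i], coords[j]
    return math.hypot(x1 - x2, y1 - y2)

def tour_cost(coords, tour):
    """Total length of the closed tour, including the return edge."""
    return sum(edge_cost(coords, tour[k], tour[(k + 1) % len(tour)])
               for k in range(len(tour)))

# A unit square visited in order gives a tour of length 4.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
assert tour_cost(square, [0, 1, 2, 3]) == 4.0
```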
<h2 id="greedy">Greedy</h2>
<p>The basic operator would be the 1-opt: for every node, it selects its closest neighbour until all nodes have been visited, then relinks with the depot (the starting node). It is also called “nearest neighbour” (NN).</p>
<p>This algorithm is obviously not efficient, as it does not account for the final relinking step and may end up in a local solution with a very long edge back to the depot.</p>
<p><img src="/assets/images/tsp/greedtest.svg" alt="img" /></p>
<h3 id="algorithm">Algorithm</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>node = 0
visited = set()
while len(visited) &lt; len(nodes):
    tour.append(node)
    visited.add(node)
    # Find the closest, non-visited neighbour
    node = find_closest(G[node], visited)
</code></pre></div></div>
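<p>A runnable version of the pseudo-code above (my own sketch; <code class="highlighter-rouge">G</code> is taken to be a symmetric distance matrix):</p>

```python
def nearest_neighbour(G, start=0):
    """Greedy tour: repeatedly visit the closest unvisited node."""
    n = len(G)
    tour, visited = [start], {start}
    node = start
    while len(visited) < n:
        node = min((j for j in range(n) if j not in visited),
                   key=lambda j: G[node][j])
        tour.append(node)
        visited.add(node)
    return tour  # the closing edge back to the depot is implicit

# Four nodes on a line at positions 0, 1, 2 and 10.
positions = [0, 1, 2, 10]
G = [[abs(a - b) for b in positions] for a in positions]
assert nearest_neighbour(G) == [0, 1, 2, 3]
```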
<h3 id="example">Example</h3>
<p><img src="/assets/images/tsp/greedus.svg" alt="img" /></p>
<p>We can see here that the algorithm does not perform well, it creates a loop around Illinois, and the last steps are really long (Seattle to Phoenix to Tallahassee).</p>
<p><img src="/assets/images/tsp/greeda280.svg" alt="img" /></p>
<h2 id="2-opt">2-Opt</h2>
<p>The most well-known, and widely used TSP local operator is for sure 2-opt. Here, we want to select two edges that, if they were swapped, would produce a shorter tour.</p>
<p>The idea is that some edges might <em>cross-over</em>, and one property of an optimal TSP solution is that no two edges should cross. However, checking planarity is one of these hard problems for computers; it is therefore less expensive to try to swap edges in order to reduce the length of the tour.</p>
<p>A 2-opt move consists in finding a pair of nodes ($i$ and $j$) for which replacing their outgoing edges with new ones will reduce the cost of the tour. In other words, we replace: $(i, i+1)$ with $(i, j)$ and $(j, j+1)$ with $(i+1, j+1)$. The gain offered by such a move is easily calculated as the cost of the removed edges minus the cost of the added ones:</p>
<p>\[ g = c(i, i+1) + c(j, j+1) - c(i, j) - c(i+1, j+1) \]</p>
<p>If the gain is positive, we have an improving move.</p>
<p>To execute the move we simply need to keep the tour intact until node $i$, add its new neighbour (tail of the chosen edge), append the tour between $i+1$ and $j$ in reverse order, then finish with the tail of $j$ and the rest of the original tour. This is called a “swap.”</p>
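<p>The swap itself is a single slice reversal (a sketch of mine; <code class="highlighter-rouge">i</code> and <code class="highlighter-rouge">j</code> are positions in the tour):</p>

```python
def swap(tour, i, j):
    """Reverse the segment tour[i..j] (inclusive), keeping the rest in place."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

# Reversing positions 2..4 of a six-node tour:
assert swap([0, 1, 2, 3, 4, 5], 2, 4) == [0, 1, 4, 3, 2, 5]
```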
<p>The 2-opt is an amenable optimisation method because it yields decent results at a very small implementation cost: all 2-opt moves are feasible as long as we consider a complete graph; edge selection does not require any heuristic; the swap operation is straightforward on a tour; determining if a move will improve the tour is a simple cost check.</p>
<p><img src="/assets/images/tsp/twotest.svg" alt="img" /></p>
<h3 id="algorithm-1">Algorithm</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>while improved:
    best = c(tour)  # start with an initial tour
    size = len(tour)
    improved = False
    for i in range(size - 3):
        # i+2 because i+1 will be the tail of the edge
        for j in range(i + 2, size):
            # Gain: cost of the removed edges minus cost of the added ones
            gain = c(i, i+1) + c(j, j+1) - c(i, j) - c(i+1, j+1)
            if gain &gt; 0:
                best -= gain
                # i is the last element in place
                tour = swap(tour, i + 1, j)
                improved = True
                break  # return to while
</code></pre></div></div>
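<p>Putting the pieces together, a compact runnable 2-opt (my own sketch on a distance matrix <code class="highlighter-rouge">G</code>, applying every improving swap it finds during the scan):</p>

```python
def two_opt(G, tour):
    """Apply improving 2-opt swaps until none remains."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 2):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # the two edges would be adjacent
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                # Gain: removed edges minus added edges.
                gain = G[a][b] + G[c][d] - G[a][c] - G[b][d]
                if gain > 1e-9:
                    tour = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
                    improved = True
    return tour

# Un-crossing a square: tour 0-2-1-3 crosses itself, 0-1-2-3 does not.
positions = [(0, 0), (1, 0), (1, 1), (0, 1)]
G = [[abs(x1 - x2) + abs(y1 - y2) for (x2, y2) in positions]
     for (x1, y1) in positions]
assert two_opt(G, [0, 2, 1, 3]) == [0, 1, 2, 3]
```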
<h3 id="example-1">Example</h3>
<p><img src="/assets/images/tsp/twous.svg" alt="img" /></p>
<p>Here, the 2-opt manages to avoid the initial loop we saw in Greedy, but still has a fairly long relink at the end (Cheyenne to Montgomery).</p>
<p><img src="/assets/images/tsp/twoa280.svg" alt="img" /></p>
<h2 id="3-opt">3-Opt</h2>
<p>The 3-opt heuristic is a logical extension of the 2-opt: instead of relinking two nodes, we will relink three because some cases cannot be optimised by the 2-opt algorithm.<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup></p>
<p>Now that we remove three edges, we have to determine how to reconnect the resulting segments while retaining a valid tour: there are <strong>seven</strong> different reconnections. This means that for every triple of nodes in the tour, we have to evaluate seven moves.</p>
<p>The following figure presents the seven possible exchanges. The first figure is the original tour, the next three are 2-opt moves, and the second row contains the four <em>actual</em> 3-opt moves.</p>
<p>One thing to note: when we select the first improving move, it is crucial for the performance of the algorithm to exhaust all 2-opt moves before trying 3-opt moves.</p>
<p><img src="/assets/images/tsp/threetest.svg" alt="img" /></p>
<h3 id="algorithm-2">Algorithm</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">while</span> <span class="n">improved</span><span class="p">:</span>
<span class="n">best</span> <span class="o">=</span> <span class="n">c</span><span class="p">(</span><span class="n">tour</span><span class="p">)</span> <span class="c1"># start with an initial tour
</span> <span class="n">size</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">tour</span><span class="p">)</span>
<span class="n">improved</span> <span class="o">=</span> <span class="bp">False</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">tour</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">size</span><span class="o">-</span><span class="mi">5</span><span class="p">]:</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="n">tour</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">2</span><span class="p">:</span><span class="n">size</span><span class="o">-</span><span class="mi">3</span><span class="p">]:</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="n">tour</span><span class="p">[</span><span class="n">j</span><span class="o">+</span><span class="mi">2</span><span class="p">:</span><span class="n">size</span><span class="o">-</span><span class="mi">1</span><span class="p">]:</span>
<span class="c1"># Have to check 7 (sic) permutations
</span> <span class="k">for</span> <span class="n">ex</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">7</span><span class="p">):</span>
<span class="n">path</span><span class="p">,</span> <span class="n">gain</span> <span class="o">=</span> <span class="n">exchange</span><span class="p">(</span><span class="n">tour</span><span class="p">,</span> <span class="n">ex</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">k</span><span class="p">)</span>
<span class="k">if</span> <span class="n">gain</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="n">best</span> <span class="o">-=</span> <span class="n">gain</span>
<span class="n">tour</span> <span class="o">=</span> <span class="n">path</span>
<span class="n">improved</span> <span class="o">=</span> <span class="bp">True</span>
<span class="k">break</span> <span class="c1"># return to while
</span></code></pre></div></div>
<p>Here, <code class="highlighter-rouge">exchange()</code> is the function that reconnects the segments delimited by <code class="highlighter-rouge">i, j, k</code> using one of the seven combinations: the first three are 2-opt moves, the last four 3-opt moves. I will let the interested reader refer to the code, as it is but an extension of the 2-opt moves; do note that each exchange can be implemented as a sequence of <code class="highlighter-rouge">swap()</code> calls.</p>
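<p>For illustration, here is one way such a function could enumerate the candidates (the segment names and the ordering of the seven reconnections are my own assumptions, not the post's actual <code class="highlighter-rouge">exchange()</code>):</p>

```python
def three_opt_exchanges(tour, i, j, k):
    """Return the seven candidate tours obtained by removing the edges
    after positions i, j, k (with i < j < k) and reconnecting the three
    middle segments. The first three are 2-opt moves (a single block
    reversal); the last four are proper 3-opt moves."""
    A, B, C, D = tour[:i + 1], tour[i + 1:j + 1], tour[j + 1:k + 1], tour[k + 1:]
    Br, Cr = B[::-1], C[::-1]  # reversed copies of the two middle segments
    return [
        A + Br + C + D,   # 2-opt: reverse B
        A + B + Cr + D,   # 2-opt: reverse C
        A + Cr + Br + D,  # 2-opt: reverse B and C as one block
        A + Br + Cr + D,  # 3-opt: reverse B and C separately
        A + C + B + D,    # 3-opt: swap B and C
        A + C + Br + D,   # 3-opt: swap, reversing B
        A + Cr + B + D,   # 3-opt: swap, reversing C
    ]
```

<p>All seven candidates are distinct permutations of the input tour, none equal to the original, which is why each triple costs seven gain evaluations.</p>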
<h3 id="examples">Examples</h3>
<p><img src="/assets/images/tsp/threeus.svg" alt="img" /></p>
<p>In this example, 3-opt performs very similarly to 2-opt.</p>
<p><img src="/assets/images/tsp/threea280.svg" alt="img" /></p>
<h2 id="k-opt">K-Opt</h2>
<p>I hope that by now you have an intuition of how $k$-opt moves function: remove $k$ edges from a tour and find $k$ <em>better</em> edges to reconnect it into a better tour. However, once we go past 3-opt, we start encountering non-sequential exchanges, that is, moves we cannot express as a sequence of 2-opt moves.</p>
<p><img src="/assets/images/tsp/bridge.svg" alt="img" /></p>
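<p>The classic example of such a non-sequential exchange is the <em>double bridge</em>: the tour is cut into four segments that are reconnected crosswise without reversing any of them, so no sequence of 2-opt reversals can reproduce the move. A minimal sketch (the cut-point conventions are my own):</p>

```python
def double_bridge(tour, i, j, k):
    """Non-sequential 4-opt 'double bridge' move: split the tour into
    four consecutive segments A, B, C, D at cut points 0 < i < j < k
    and reconnect them as A-C-B-D, reversing nothing."""
    A, B, C, D = tour[:i], tour[i:j], tour[j:k], tour[k:]
    return A + C + B + D
```

<p>Because no segment is reversed, this move is popular as a perturbation step in iterated local search: it escapes 2-opt and 3-opt local optima while preserving most of the tour structure.</p>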
<p>Few, if any, papers reference moves beyond 5-opt.</p>
<p>Any $k$-optimal tour is also $l$-optimal (for $l < k$), which means that exploring larger values of $k$ will always lead to better, or at least equivalent, tours. The issue is that, when taking the first improving move, we would need to try the larger values of $k$ first to catch the best improvements, at the risk of losing them to a cheaper $l$-opt move.</p>
<p>For a tour of size $n$ to be provably optimal, we would have to show that it is $n$-opt – that is, that no permutation leads to a better tour. However, due to the increasing complexity of the algorithm, this is not practical.</p>
<p>The difficulty of $k$-opt is therefore to determine which $k$ to use to make the tour better. Most algorithms use an adaptive approach: starting from a given tour, make it 2-opt; then try to make this result 3-opt; etc.</p>
<p>In the next post I will introduce the Lin-Kernighan heuristic for the TSP which solves this issue using the following idea: every time we find a <em>promising</em> move using 2-opt, we try to extend it to 3-opt by finding another edge to exclude, then to 4-opt, etc.</p>
<h1 id="results">Results</h1>
<p>One of the issues with local heuristics is their high variance: the starting tour greatly influences both runtime and solution quality by providing better moves earlier, if at all. Below is a small sample of results on both <code class="highlighter-rouge">att48</code> and <code class="highlighter-rouge">a280</code>.<sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup> Names marked with a ‘*’ mean that the <em>fast</em> scheme<sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup> was used.</p>
<p>Results on <code class="highlighter-rouge">att48</code> are averaged over 100 runs. We can easily see how much longer 3-opt takes compared to 2-opt, but also that the <em>fast</em> scheme allows it to remain competitive.</p>
<p>I have to mention that, even with 100 runs, the results still fluctuate quite a lot; I will try to set up a more extensive test bench at some point. The execution times are fairly stable, though.</p>
<table>
<thead>
<tr>
<th><strong>att48</strong></th>
<th style="text-align: right">Average</th>
<th style="text-align: right">Gap (%)</th>
<th style="text-align: right">Time (s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Greedy</td>
<td style="text-align: right">12861</td>
<td style="text-align: right">21.01</td>
<td style="text-align: right">0.77ms</td>
</tr>
<tr>
<td>2-opt*</td>
<td style="text-align: right">11156</td>
<td style="text-align: right">4.97</td>
<td style="text-align: right">0.19</td>
</tr>
<tr>
<td>2-opt</td>
<td style="text-align: right">11443</td>
<td style="text-align: right">7.67</td>
<td style="text-align: right">0.16</td>
</tr>
<tr>
<td>3-opt*</td>
<td style="text-align: right">11460</td>
<td style="text-align: right">7.83</td>
<td style="text-align: right">3.17</td>
</tr>
<tr>
<td>3-opt</td>
<td style="text-align: right">10925</td>
<td style="text-align: right">2.79</td>
<td style="text-align: right">20.58</td>
</tr>
</tbody>
</table>
<p>An interesting observation from the 20 runs on <code class="highlighter-rouge">a280</code> is the confirmation that, for 2-opt, the <em>fast</em> scheme does not improve the running time. This can be explained by the fact that the move-selection phase runs in linear time: we only shortcut one “layer” of the search, whereas for 3-opt we shortcut two – in effect reducing the complexity by an order of magnitude.</p>
<p>(And, yes, fast 3-opt takes almost 40min to complete.)</p>
<table>
<thead>
<tr>
<th><strong>a280</strong></th>
<th style="text-align: right">Average</th>
<th style="text-align: right">Gap (%)</th>
<th style="text-align: right">Time (s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Greedy</td>
<td style="text-align: right">3148.11</td>
<td style="text-align: right">22.07</td>
<td style="text-align: right">0.02</td>
</tr>
<tr>
<td>2-opt*</td>
<td style="text-align: right">3072.24</td>
<td style="text-align: right">19.13</td>
<td style="text-align: right">43.81</td>
</tr>
<tr>
<td>2-opt</td>
<td style="text-align: right">2874.38</td>
<td style="text-align: right">11.45</td>
<td style="text-align: right">26.84</td>
</tr>
<tr>
<td>3-opt*</td>
<td style="text-align: right">2794.51</td>
<td style="text-align: right">8.36</td>
<td style="text-align: right">2343.79</td>
</tr>
</tbody>
</table>
<h1 id="references">References</h1>
<ol class="bibliography"><li><span id="helsgaun2000effective">Helsgaun, K. (2000). An effective implementation of the Lin–Kernighan traveling salesman heuristic. <i>European Journal of Operational Research</i>, <i>126</i>(1), 106–130.</span></li>
<li><span id="dorigo1997ant">Dorigo, M., & Gambardella, L. M. (1997). Ant colonies for the travelling salesman problem. <i>Biosystems</i>, <i>43</i>(2), 73–81.</span></li>
<li><span id="padberg1991branch">Padberg, M., & Rinaldi, G. (1991). A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. <i>SIAM Review</i>, <i>33</i>(1), 60–100.</span></li></ol>
<h1 id="footnotes">Footnotes</h1>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
TeX: { equationNumbers: { autoNumber: "AMS" } },
tex2jax: {
inlineMath: [ ['$','$'] ],
processEscapes: true
}})
</script>
<script type="text/javascript" async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>E.g., a TSP with 10 cities has 9! = 362,880 possible solutions, one with 11 cities has over three million. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>For anyone interested, pray have a look at the code to convert the results from <code class="highlighter-rouge">att48</code> to a map; the basic idea is that the entries in the file are ordered alphabetically by state name. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>All algorithms described in this post have two possible implementations: one stops at the first improving move, the other explores all possible moves for a configuration and selects the best. I chose to use the latter so far. I plan to extend the code to have the two options available. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>Do note, however, that any <em>3-opt move</em> can be expressed as a sequence of, at most, two <em>2-opt moves</em>.</p>
<p>The difference here is that the 2-opt <em>algorithm</em> finds one 2-opt <em>move</em> and executes it, while the 3-opt <em>algorithm</em>, in a way, finds up to two 2-opt <em>moves</em> at once. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>Results for <code class="highlighter-rouge">a280</code> were produced on an HPC cluster; the runs take too long for my poor computer. <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>Restart at the first improving move instead of exploring all possible moves and selecting the best. <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Mon, 20 Nov 2017 00:00:00 +0000