Implementing Lin-Kernighan in Python
In this post, I will talk about my journey to implement the infamous Lin-Kernighan heuristic to solve efficiently TSP problems. This work was done in the ambit of a larger project, thus the code will be in Python, available here.
Preamble
In a previous post I introduced local search heuristics for the TSP and tried to show how they can be generalised to larger neighbourhoods. These heuristics are called 2-opt, 3-opt, up to $k$-opt as they try to find a set of 2, 3, …, $k$ edges to exchange in order to produce a better tour.
However, the curse of complexity looms. Indeed, each of these algorithms, in their naïve implementation, has a complexity of $O(n^k)$. Which degenerates very quickly into intractable processes.
To extend the approach, we can use the idea that every $k$-optimal tour is also $l$-optimal, with $l < k$. Hence, once we have a 2-optimal tour, we try to improve it using 3-opt moves, and repeat until we reach $k$. This approach has two main drawbacks:
- As I have shown earlier by implementing fast versions of local operators, we do not completely avoid the jump in complexity.
- Finding a good $k$ is a matter of experimenting; we cannot know in advance which value of $k$ will lead to good results in reasonable time.
Enter adaptive algorithms, one such algorithm will start from a promising move and try to extend it as much as possible. A well-known case is called: “2.5-opt.”1
The basic idea is that every time we find an improving 2-opt move, we look for a third edge we could exchange. This algorithm thus combines the speed of 2-opt and the quality of 3-opt. I have not had the occasion to implement it, but maybe one day.
The Lin-Kernighan Heuristic for the TSP
When I say infamous, I am obviously referring to the task of implementing it;2 it is still the best performing TSP heuristic out there. It was first described in the 70s (Lin & Kernighan, 1973) and since then much work was devoted to improve it further. The most notable effort coming from Helsgaun in a series of papers dealing with “efficiently implementing [it].”
However, I found it was lacking in the sense of algorithmic explanation,3 and they do not give a clear overview of its implementation, the best I found being in Helsgaun’s report (Helsgaun, 1998) – which I used as a basis for this implementation.
The current state-of-the-art heuristic is thus the Lin-Kernighan-Helsgaun (LKH), introduced in 2000 (Helsgaun, 2000). The idea of LKH is to use a 5-opt move as its basis for optimisation, where LK uses a 2-opt move with exceptions. Later, it was refined to use any $k$-opt move by extending the current move (Helsgaun, 2009).
Here, I will be talking about the basic Lin-Kernighan (LK) heuristic, I may do the Helsgaun improvements later depending on my needs.
Rationale
The Lin-Kernighan heuristic answers the question: which $k$-opt to execute to improve the current tour? We know that a tour of length $n$ is optimal if there does not exist an $n$-opt improving move. This approach is obviously not applicable as the complexity would then be: $O(n^n)$, and you do not want that.
Furthermore, any $k$-opt move also covers smaller $l$-opt move. Hence, the general $k$-opt implementation consists in selecting a maximum $k$, applying as many lower $l$-opt as possible, and only increase the size of the move when no more lower improvement is available.
Lin-Kernighan aims to provide a more flexible approach by increasing the $k$ as long as improving moves are found. As such, it is a $k$-opt which tries to find the highest possible improvement given a promising starting point.
Description
To represent the moves to execute, the algorithm maintains two sets: one with edges that will be removed from the current tour and one with edges that will be added. Removing and adding $k$ edges is equivalent to performing a $k$-opt move.
Lin-Kernighan starts by selecting a node from where to start on the current tour ($t_1$); from there it selects either the predecessor or the successor of the node on the tour ($t_2$), forming the first edge to remove. Then it selects the first edge to add in the neighbours of $t_2$ which do not belong to the tour ($t_3$) and has a positive gain.
From $t_3$ it selects either its predecessor or successor ($t_4$). If relinking $t_4$ with $t_1$ provides a better tour, we restart the algorithm with the new tour; otherwise, if the gain is still positive, we look for another node outside of the tour ($t_5$) to have a potential edge to add. And so on and so forth.
Overview
The heuristic only allows sequential exchanges,4 which means that the next candidate for insertion or deletion is chosen starting at the end of the previously chosen edge, hence they have the following form:
- $x_i = (t_{2i - 1}, t_{2i})$
- $y_i = (t_{2i}, t_{2i + 1})$
Therefore, we maintain two sets of edges:
- $X$ a set of “broken” edges, which will be removed from the tour;
- $Y$ a set of “joined” edges, which will be added to the tour.
Then, the algorithm uses a set of rules to decide which edges to break and which to join:
- The running sum $G_i = \sum_i c(x_i) - c(y_i)$ has to be positive.
- The potential solution has to form a tour, which means that when we omit an edge we verify that we can relink to the depot, with one exception: when $i = 2$. This exception is to allow non-sequential 2-opt moves (cf. double bridge move).
- $X$ and $Y$ have to be disjoint.
Based on the description in Helsgaun’s report (Helsgaun, 1998), we can identify four parts in the algorithm:
- The main loop which will be use to restart the search.
- The selection of the first two edges, that is selecting: $t_1$, $t_2$, and $t_3$.
- The selection of the next edge to remove, which I will call
chooseX()
, during which we may stop the search if we have an improved tour. - The selection of the next edge to add,
chooseY()
.
The last two steps are crucial in so far as they contain the recursive steps which I use to get rid of the goto
. The function chooseX()
calls chooseY()
if it finds an eligible edge but not a better tour and chooseY()
calls chooseX()
if it finds an eligible edge.
Furthermore, their position in the algorithm make clear what the different set contains without need to use $i$ – which represents the current $k$-opt being tried.
At each step, both of these function will add an edge in their respective sets, de facto moving to the next $k$-opt. The beauty of the algorithm is its ability to give up gracefully when it figures it will not find any improvement but to persist under a lot of uncertainty.
The only level where we exhaust all the possibilities is in the selection of the first two edges, otherwise all move have to fulfill a number of criteria to be considered. As such, LK always performs at least all the 2-opt moves; the reason why is pretty obvious: 2-opt is the most cost efficient local heuristic, it makes sense to rely on it as much as possible.
From this observation, the complexity of LK has been estimated at $O(n^{2.2})$, which is extremely close to the complexity of 2-opt while remaining more efficient than 3-opt.
Implementation
The original code, and subsequent descriptions, relied on goto
statement to handle the state of the computation, one of my goal was to rewrite it without them. These statement are only used to enable recursion and muddle the order of actions as well as the state of the variables.
For further details about the implementation, refer to the code, of course. I will not go into all the details used in the implementation as, though not trivial, most of them belong to preserving consistency in the data. They should still not be omitted from a complete implementation.
For example, I will not go into detail on how to handle data through the recursion as we need to be able to backtrack at some steps. I am fairly used to recursion, so the idea looked fairly straightforward to have intermediate structures at each recursive call, alleviating the need to keep track of the index $i$ which is central to the original algorithm’s description.
Main Loop
The main loop is pretty straightforward, it only needs to restart the core improving function if the latter found a better tour.
# tour is the current tour to improve
while improved:
improved = improve(tour)
Move Selection Loop
The move selection loop is the core of the algorithm: it chooses the nodes to optimise from, the first edge to remove and the first to add, then it calls chooseX()
.
By using an unconditional loop, we enable the recursion described as “if there are untried alternatives” in the description of the algorithm and the chooseX()
and chooseY()
functions will take care of building the edge sets needed to make the move, and discard them if there do not find a possible improvement.
# tour is the current tour to otimise
for t1 in tour: # Step 2
for t2 in tour.around(t1): # Step 3
x1 = (t1, t2)
X = set(x1)
for t3 in neighbours[t2]: # Step 4
y1 = (t2, t3)
Y = set(y1)
gain = c(x1) - c(y1)
if gain > 0:
if self.chooseX(tour, t1, t3, gain, broken, joined): # Step 6, i = 2
# Return to Step 2, that is the initial loop
return True
# Else try the other options, note how we retain X and Y with the
# right data. (Step 8-12)
# No improvement found
return False
Choosing Edges
Now, the core of LK is actually to choose which edges will be added and which will be removed from the current tour. Beyond its apparent complexity, the whole algorithm relies on one simple observation: most $k$-opt moves can be expressed as sequences of smaller moves – especially 2-opt. So, as soon as we have a couple of good edges, we should try to find further edges to “add” to the move.
Hence the loop to select edges refers back to itself in a recursive manner to explore larger values of $k$, stopping when no improvement can be found.
Choosing an edge to remove
When choosing an edge to remove, we need only look at two nodes: the predecessor and the successor of the tail of the last added edge. In LK lingo, we are looking for $x_i$ with head $t_{2i - 1}$ (the tail of the last added edge) and tail $t_{2i}$ (the node we are looking for).
When choosing an edge to remove from the tour, we need to perform a few checks:
- verify that relinking to $t_1$ forms a tour;
- check that the new edge was not added previously.
If both conditions are satisfied and we have a positive gain, we can save the newly formed tour and restart the algorithm with it. Otherwise, we will look for an edge to remove. Hence, we only stop the algorithm when finding an edge to remove which forms an improving tour when relinking with the start, skewing towards lower $k$-opt moves.
# last is the tail of the last added edge, t_2i - 1
# gain is the current gain from previous moves
for t2i in T[last]:
xi = (last, t2i)
# Gain from removing the edge
Gi = gain + c(xi)
# We will never be able to relink the tour if we omit an edge
# rejoining the start from the current node.
if t2i != t1 and xi not in Y:
X.add(xi)
xr = (t2i, t1) # Relink edge
X2 = X.add(xr)
# Cost of relinking
relink = Gi - c(xr)
# The current solution does not form a valid tour
if not is_tour(T, X2, Y):
continue
# Save the current solution if the tour is better
if relink > 0:
T2 = make_new_tour(T, X2, Y)
save_tour(T2)
return True
else:
# Pass on the newly "removed" edge but not the relink
return chooseY(T, t1, t2i, Gi, X, Y)
# No improving edge, stop
return False
Choosing an edge to add
Once we found an edge to remove, and it does not fulfill the stopping criteria, we will try to find an edge to add. To do so, we want to find a node that will not form and edge in use in the current tour to improve.
The new edge to add must follow a few rules:
- it must not belong to the current tour;
- the “following edge,” $x_{i+1}$, must exist;
- the gain has to be positive.
# t2i is the tail of the last excluded edge
# gain is the current gain from previous moves
for node in T[t2i]: # This choice can be improved
yi = (t2i, node)
Gi = gain - c(yi)
if Gi > 0 and yi not in broken: # Check that x_i + 1 exists
Y.add(yi)
# Stop at the first improving tour
return chooseX(T, t1, node, Gi, X, Y)
# No improving nodes, keep looking
return False
Improvements
From this starting point, there are a number of optimisations offered in the seminal paper, though some were rebuked later on. I will list those I implemented so far. Most of these improvements are heuristics to speed-up the search at the smallest cost with regards to optimality as possible.
Solution Removal
We immediately stop searching if we find a tour that we found at a previous iteration. The rationale is simple: considering we are doing $k$-opt moves, if we find an identical solution, it means the solution is currently optimal in so far as we are concerned.
Allow Disjoint Tour
At the very beginning of the algorithm, we are looking for improving the tour as fast as possible. Hence, we relax the condition that stipulates that we can relink the tour for 2-opt moves.
Order Neighbours
We looking for candidates to add to the tour, one crucial decision is which node to try to form a new edge. One way to make this search better is to order the neighbours. My first implementation uses the standard distance between nodes, the second refines it to the gain when selecting the next edge to exclude.
Limit neighbours
A heuristic proposed in the original LK paper is to only consider the five nearest neighbours. This allows to efficiently speed-up the search as, during deep searches, considering all neighbours will be expensive.
Special case
When choosing $x_4$, order the choice based on the length of the edge. (We have two choices, select the longest edge.) Also do not check if it forms a tour as we want to allow double bridge moves.
Results
I have to say, I am really impressed with the efficiency of this heuristic. It took some time to wrap my head around it, but I definitely think it is worth the investment if you need good TSP solutions fast.
I will compare LK with the local TSP heuristics presented before using the same test setup: att48
the 48 state capitals of continental U.S. and a280
a drilling problem with 280 holes.
U.S. test
Here, we can see a very neat tour of the country with no apparent loop, as opposed to earlier results. (Note: the gap may come from an incorrect distance function.)
att48 | Average | Gap (%) | Time (s) |
---|---|---|---|
Greedy | 12861 | 21.01 | 0.80ms |
2-opt* | 11755 | 10.60 | 0.19 |
2-opt | 11826 | 11.28 | 0.15 |
3-opt* | 11597 | 9.11 | 3.59 |
LK | 10787 | 1.50 | 1.36 |
Drilling test
Not only do the previous results look good in terms of performance, they were also obtained extremely quickly; it took less than 4min for LK to find the best results of all other heuristics.
Lin-Kernighan managed to outperform all other forms of local search in terms of performances while staying very competitive time-wise.
a280 | Average | Gap (%) | Time (s) |
---|---|---|---|
Greedy | 3148.11 | 22.07 | 0.02 |
2-opt* | 3072.24 | 19.13 | 43.81 |
2-opt | 2874.38 | 11.45 | 26.84 |
3-opt* | 2794.51 | 8.36 | 2343.79 |
LK | 2642.55 | 2.46 | 210.05 |
Interestingly, LK is supposed to be an $O(n^{2.2})$ algorithm, so I expected timings closer to 2-opt. It makes sense though when considering an exponential scale that this small factor would impact the timings that much… Or my implementation can be improved, especially considering how seldom I hit optimal solutions.
Further Improvements
Although I am content with having a basic LK implementation right now, there are still a few improvements that I can add to the algorithm, I am just unsure how much work they will require.
Improved Neighbourhood
When looking for edges to add, we do a bit of work to identify interesting neighbours. Some more work could be done. I sort neighbours by their distance, however this is not efficient enough, it would be better to know their distance in terms of position in good solutions for the TSP.
One way to do this is to start with a one-tree5 and rank neighbours accordingly. Helsgaun offers an iterative algorithm to build a better neighbourhood for nodes.
Better Tour Structure
As described by Helsgaun, the tour structure in central to the performance of the algorithm, it needs to be able to perform the following operation efficiently: verify if adding/removing an edge still forms a tour.
One structure that fulfills this purpose – and was actually developed for it – is called ejection chain (Glover, 1992). The idea is to have a node “at the top,” linked to a circuit. Adding or removing edge has to go through that one node, making checking the correctness of the tour a simple step.
One of the advantages of using such a structure should come in when we verify that “$x_{i+1}$ exists” when choosing $y_i$. One observation in LK is that, for a given edge to exclude, there exists only one choice of node that belongs to the tour which allows to close it with $t_1$. Hence, we would not have to do this check in chooseX()
.
Starting Tour
Helsgaun offers a construction heuristic amenable to be used with LK, putting better moves in reach. As I mention in the previous post, the starting tour can dramatically change the performance of the heuristics.
Fast Scheme
Rather, I should say “slow” scheme. Currently, I follow the implementation of LK and it resets every time an improving move is identified. We could have a slower version that exhausts moves – at least for the 2-opt.
Neighbourhood Depth
Similarly, LK limits the size of neighbourhoods in a pretty arbitrary fashion. It would be interesting to have variable size neighbourhoods, maybe as part of the “slow” scheme.
Conclusion
I hope I managed to dispel some fears, misconceptions, and accessibility issues revolving around LK; I sure had some when I started looking at it. As usual, comments and corrections welcome.
References
- Lin, S., & Kernighan, B. W. (1973). An effective heuristic algorithm for the traveling-salesman problem. Operations Research, 21(2), 498–516.
- Helsgaun, K. (1998). An effective implementation of the Lin-Kernighan traveling salesman heuristic. Writings on Computer Science, 81.
- Helsgaun, K. (2000). An effective implementation of the Lin–Kernighan traveling salesman heuristic. European Journal of Operational Research, 126(1), 106–130.
- Helsgaun, K. (2009). General k-opt submoves for the Lin–Kernighan TSP heuristic. Mathematical Programming Computation, 1(2), 119–163.
- Glover, F. (1992). New ejection chain and alternating path methods for traveling salesman problems. In Computer science and operations research (pp. 491–509). Elsevier.
Footnotes
Although there seems to be debate on what is called 2.5-opt, it is sometimes a specific version of 2-opt or an improved version of 3-opt. ↩
Helsgaun lists in his report around “forty references to LK, with only three achieving the same performances.” ↩
As well as an accessible API, the “best” available being calling Concorde one way or the other. ↩
Not in the same sense as when you mention “sequential 2-opt,” here we only talk about the relation between edges. ↩
: A one-tree is an approximation algorithm for the TSP where we build a minimum spanning tree, which is an $O(n)$ algorithm, on all nodes but the depot, then relink the depot to its two closest neighbours. ↩