As part of my current project, I needed a Python implementation of heuristics for the TSP. This post will be the first part about the journey of implementing these lovely algorithms. Part II will deal with Lin-Kernighan. Code is available here.


The Traveling Salesman Problem (TSP) is the most famous combinatorial optimisation problem. It apparent simplicity, finding the shourtest tour visiting a set of nodes, belies its computational complexity. Humans, by visualisation, can achieve decent solution on small instances (~10% of the optimum) but will hardly ever find the optimum; computers will have trouble finding decent solutions without dedicated algorithms, the search space growing in a factorial fashion.1

The TSP has received a lot of attention from the scientific community: efficient heuristic procedures, e.g. Lin-Kernighan-Helsgaun (Helsgaun, 2000); metaheuristics, e.g. Ant Colony optimisation (Dorigo & Gambardella, 1997); or exact solution algorithms, e.g. Branch-and-Cut (Padberg & Rinaldi, 1991).

I decided to make this a two-part series of post about implementing TSP heuristics because LK deserves its own post.

Local TSP Operators

Let me start by introducing the classic $\lambda$-opt TSP operator. The idea behind these operators is that a solution tour for a TSP instance can be made better by swapping some of its edges if the new edges provide a reduction on the length of the tour.

It is easy to see that an optimal tour for a TSP with $n$ cities has to be $n$-optimal. But the complexity of the method grows with the size of the operator: 2-opt is an $O(n^2)$, 3-opt is an $O(n^3)$, etc.; thus making larger values of $\lambda$ unusable.


I will use two examples from the classic TSPLib: att48,2 the TSP of the 48 state capitals of continental U.S.; and a280 a drilling problem with 280 holes.

The descriptions in this post will use (Python) pseudo-code. A tour is a sequence of nodes representing the order of visits. For implementation details, please refer to the code.3

I will use the following notation:

  • $c(\cdot)$ is the cost of an edge or a tour;
  • $G[i]$ represents the neighbours of $i$ in the graph $G$;
  • $T[i]$ represents the neighbours of $i$ in the tour $T$, so its predecessor and successor;
  • an edge is composed of a head and a tail: $(t_i, t_j)$.


The basic operator would be the 1-opt; for every node, it will select its closest neighbour until all nodes have been visited, then relink with the depot (the starting node). It is also called “nearest neighbour (NN).”

This algorithm is obviously not efficient as it does not value the last relinking step at all and may end up in a local solution with a very long edge to go back to the depot.



node = 0
visited = set()

while len(visited) < len(nodes):

    # Find the closest, non-visited neighbour
    next = find_closest(G[i], visited)
    node = i



We can see here that the algorithm does not perform well, it creates a loop around Illinois, and the last steps are really long (Seattle to Phoenix to Tallahassee).



The most well-known, and widely used TSP local operator is for sure 2-opt. Here, we want to select two edges that, if they were swapped, would produce a shorter tour.

The idea is that some edges might cross-over and one property of an optimal TSP solution is that no two edges should cross. However, graph planarity is one of these hard problems for computers, it is therefore less expensive to try to swap edges in order to reduce the length of tour.

A 2-opt move consists in finding a pair of nodes ($i$ and $j$) for which changing their outgoing edge with an new one will reduce the cost of the tour. In other words, we replace: $(i, i+1)$ with $(i, j)$ and $(j, j+1)$ with $(i+1, j+1)$. The gain offered by such a move is easily calculated as:

\[ g = c(i, j) + c(i+1, j+1) - c(i, i+1) - c(j, j+1) \]

If the gain is positive, we have an improving move.

To execute the move we simply need to keep the tour intact until node $i$, add its new neighbour (tail of the chosen edge), append the tour between $i+1$ and $j$ in reverse order, then finish with the tail of $j$ and the rest of the original tour. This is called a “swap.”

The 2-opt is an amenable optimisation method because it yields decent results at a very small implementation cost: all 2-opt moves are feasible as long as we consider a complete graph; edge selection does not require any heuristic; the swap operation is straightforward on a tour; determining if a move will improve the tour is a simple cost check.



while improved:
    best = c(tour) # start with an initial tour
    size = len(tour)
    improved = False

    for i in tour[0:size-3]:
        #  i+2 because i+1 will be the tail of the edge
        for j in tour[i+2:size]:
            # Calculate gain
            gain = c(i, j) + c(i+1, j+1) - c(i, i+1) - c(j, j+1)

            if gain > 0: best -= gain
                # i is the last element in place
                tour = swap(tour, i + 1, j)
                improved = True
                break # return to while



Here, the 2-opt manages to avoid the initial loop we saw in Greedy, but still has a fairly long relink at the end (Cheyenne to Montgomery).



The 3-opt heuristic is a logical extension of the 2-opt: instead of relinking two nodes, we will relink three because some cases cannot be optimised by the 2-opt algorithm.4

Now that we have three edges, we have to determine which way we want to re-arrange them while retaining a tour: we obtain seven different permutations. Which means that for every three nodes in the tour, we have to do seven operations.

The following figure presents the seven possible exchanges. The first figure is the original tour, the next three are 2-opt moves, and the second row contains the four actual 3-opt moves.

On thing to note: it is crucial for the performance of the algorithm to exhaust 2-opt moves before trying on 3-opt in the case where we select the first improving move.



while improved:
    best = c(tour) # start with an initial tour
    size = len(tour)
    improved = False

    for i in tour[0:size-5]:
        for j in tour[i+2:size-3]:
            for k in tour[j+2:size-1]:
                # Have to check 7 (sic) permutations
                for ex in range(7):
                    path, gain = exchange(tour, ex, i, j, k)

                    if gain > 0:
                        best -= gain
                        tour = path
                        improved = True
                        break # return to while

Where exchange() is the function to swap i, j, k using one of the seven combinations: the first three are 2-opt moves, the last four 3-opt moves. I will let the interested reader refer to the code as it is but an extension of 2-opt moves. Do note that it can be implemented as a sequence of swap().



In this example, 3-opt performs very similarly to 2-opt.



I hope that by now you have an intuition on how $k$-opt moves function: select up to $k$ nodes in a tour and find $k$ better edges to form a better tour starting from them. However, once we increase past 3-opt we start having possible non-sequential exchanges, that is moves we cannot express in terms of 2-opt moves.


Few, if any, papers reference moves beyond 5-opt.

Any $k$-opt algorithm also leads to a $l$-optimal solution (with $l < k$), which means that exploring larger values of $k$ will always lead to better, or at least equivalent, tours. The issue is that we need to try out larger values of $k$ to find the best improvements first for fear of losing them to an $l$-opt.

For a tour of size $n$ to be optimal, we have to prove that it is $n$-opt – that is, there does not exist a permutation that leads to a better tour. However, due to the incresing complexity of the algorithm, this is not practical.

The difficulty of $k$-opt is therefore to determine which $k$ to use to make the tour better. Most algorithms use an adaptive approach: starting from a given tour, make it 2-opt; then try to make this result 3-opt; etc.

In the next post I will introduce the Lin-Kernighan heuristic for the TSP which solves this issue using the following idea: every time we find a promising move using 2-opt, we try to extend it to 3-opt by finding another edge to exclude, then to 4-opt, etc.


One of the issue with local heuristics is their high variance. The starting tour influence greatly their runtime and performance by providing better moves earlier, if at all. Below is a small sample of results on both att48 and a280.5 Names with a ‘*’ mean that the fast scheme6 was used.

Results on att48 were run 100 times and the average reported. We can easily see how much longer 3-opt takes compared to 2-opt, but that having a fast scheme allows it to be competitive.

I have to mention here that, even with 100 runs, the results are still fluctuating quite a lot, I will try to have a more extensive test setup at some point. The execution times are fairly stable though.

att48AverageGap (%)Time (s)

An interesting observation with 20 runs on a280 is the confirmation that, for 2-opt, the running time is not improved when having a fast scheme. This can be explained as the move selection phase runs in linear time, which means that we only shut down one “layer” of the optimisation, as opposed to 3-opt where we shut down two – in effect, reducing the complexity by one order of magnitude.

(And, yes, fast 3-opt takes almost 40min to complete.)

a280AverageGap (%)Time (s)


  1. Helsgaun, K. (2000). An effective implementation of the Lin–Kernighan traveling salesman heuristic. European Journal of Operational Research, 126(1), 106–130.
  2. Dorigo, M., & Gambardella, L. M. (1997). Ant colonies for the travelling salesman problem. Biosystems, 43(2), 73–81.
  3. Padberg, M., & Rinaldi, G. (1991). A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Review, 33(1), 60–100.


  1. E.g., a TSP with 10 cities has 9! = 362,880 possible solutions, one with 11 cities has over three million. 

  2. For anyone interested, pray have a look at the code to convert the results from att48 to a map, the basic idea is that the entries in the file are ordered alphabetically by state name. 

  3. All algorithms described in this post have two possible implementations: one stops at the first improving move, the other explores all possible moves for a configuration and selects the best. I chose to use the latter so far. I plan to extend the code to have the two options available. 

  4. Do note, however, that any 3-opt move can be expressed as a sequence of, at most, two 2-opt moves.

    The difference here is that the 2-opt algorithm finds one 2-opt move and execute it, while the 3-opt algorithm, in a way, finds up to two 2-opt move

  5. Results for a280 were produced on an HPC cluster, too long for my poor computer. 

  6. Restart at the first improving move instead of exploring all possible moves and selecting the best.