The Bin Packing problem (BP) is a classic optimisation problem. It has a lot of similarities with the Knapsack problem as the goal is to pack items into containers. However, the BP’s objective is to minimise the number of bins to open given a set of items to allocate.

What Is It?

First, I will present the mixed-integer programming formulation of the problem, the first line is the objective: minimise the number of opened bins; the first constraint means that every item needs to be allocated to a bin; the second constraint means that a bin cannot accept more than a give size. maxiNyis.t.iNxij=1jMjMajxijVyiiNx,yB\begin{align} \max && \sum_{i \in N} y_i \tag{BP} \\ s.t. && \sum_{i \in N} x_{ij} & = 1 & \forall j \in M \\ && \sum_{j \in M} a_j \cdot x_{ij} & \leq V \cdot y_i & \forall i \in N \\ && x, y \in \mathbb{B} \nonumber \end{align}

It was the subject of heuristic research for long years because it admits some simple algorithms to solve it.


The first-fit was the first heuristic found for BP, it simply consists in allocating an item in the first bin available. It runs in linear time based on the number of items, and has for property to never be worse than twice the optimal solution. Thus we call it an Approximation Algorithm (AA) with a ratio of: 2.

If we consider $\textbf{opt}$ the optimal solution to a BP, then if $ff(\cdot)$ is the function giving us the value returned by first-fit, we have, for an instance $X$: $ff(X) \leq 2 \times \textbf{opt}$.

First-Fit Decreasing

A simple optimisation of first-fit is to first sort the items by decreasing sizes, as such the approximation ratio is now: $11 / 9 \textbf{ opt} + 1$.


To go further in the realm of AAs, we can look at algorithms with guarantees on their runtime. There are two main class of such algorithms:

  1. Fully polynomial time (FPTAS), which means that the algorithm will always executes in time polynomial in the size of the instance (plus $\varepsilon$ as factor).1
  2. Asymptotically polynomial time (APTAS), in this case the guarantees on the runtime only hold if the size of the instance is infinity.

In this post I am looking at the second category, the APTAS for BP (De La Vega & Lueker, 1981); let us hope it works well quickly.

The algorithm

The idea behind this algorithm is to isolate certain properties of the problem (here, bundle items by size range and discard the smallest ones). Then use dynamic programming to compute an exact solution to the relaxed problem. Finally, return to the original problem.

The APTAS for BP works as follows:

  1. Remove small items – of size < $\varepsilon$. This gives set $I$ for small items and set $J$ for the rest.
  2. From $J$ we will create a new set containing $k$2 item sets. Then set the size of items within each set to the maximum in this set, obtaining set $J’$.
  3. Find optimal packing of $J’$ using dynamic programming.
  4. Replace items in the packing with their actual sizes, taken from $J$.
  5. Pack small items using first-fit, “on top” of the packing.

The important step here is #3, so I will talk about it a bit longer. The dynamic programming approach is based on searching for patterns that fit into increasing number of bins.

  1. Create all possible packing that fit into a single bin, this gives pattern: $P_1$.
  2. To compute $P_{n+1}$, merge the previous pattern with the original pattern: Pn+1=Pn×P1P_{n+1} = P_n \times P_13
  3. Finish if we can fit all items in $n$ bins.

The idea is that, at each step you increase the number of bins by one by generating new patterns that fit into a bin anyway.


Given a bin of size 1, and J={{.2,.3},{.4,.5},{.7}}J={{.3,.3},{.5,.5},{.7}}\begin{align} J & = \{\{.2, .3\}, \{.4, .5\}, \{.7\}\} \\ J' & = \{\{.3, .3\}, \{.5, .5\}, \{.7\}\} \end{align}

this is how the DP will find the optimal packing. We first determine the set of patterns that fit into a single bin. Then, to create additional patterns, we simply extend the number of bins available to fit patterns, which would go as follows.

1[{.3}], [{.3, .3}], [{.3, .5}], [{.3, .7}], [{.5}], [{.5}, {.5}], [{.7}]
2[{.3}, {.3}], [{.3}, {.3, .3}], [{.3}, {.3, .5}], …
3[{.3}, {.3}, {.3}], …, [{.3, .5}, {.5}, {.3, .7}]

At iteration 3 we actually find patterns that has as many items as $J’$, thus we can stop.


Okay, so now we have a working implementation, but there is a “but.” It is slow. As in, badly lagging for anything larger than ten items. So let us start debugging the code to find the bottlenecks.

To do so, I will use the Python profiler from the command line. It produces either results in the terminal or in a file format. The file format seems the most convenient, however it is a binary file, so unfit to be read directly within an editor.

Because we are using Python, there is a package for that: cprofilev. This tool generates a web page given a cProfile output file. Perfect.

I will focus on few statistics which I consider the most important:

  • cumtime: the running time of a given function, in a way its actual runtime.
  • tottime: the total running time of a function, that is the sum of the cumtime of the functions called within this one.
  • ncalls: the number of calls to a function, useful for finding extra object creation and the like.


To allow reproducibility because variance seems quite large I will fix the random seed, so every run will get the exact same set of bins to optimise.4 I randomly generate 10 items with numpy, with weights between 1 and 10, then normalise before running the APTAS.

  • Python 3.5
  • seed: 100
  • $\varepsilon = 0.1$
  • $N = 10$

The results of running required quite a bit of computation:

13 configurations with 1 bins.
111 configurations with 2 bins.
687 configurations with 3 bins.
3330 configurations with 4 bins.
12810 configurations with 5 bins.
37710 configurations with 6 bins.
Feasible in 7 bins.

First Run

And this is the header of the profile:

35169689 function calls (32064478 primitive calls) in 22.659 seconds

Yep, packing ten objects in six bins has a runtime of over twenty seconds. But then again, the runtime of this algorithm is in polynomial time as it does not depend on the size on the input, only on the number of different sizes in $J’$ and $\epsilon$ which are fixed parameter. However, once these constants are fixed we still have a huge increase in the solving time.

Here are the first few lines of the output, sorted by tottime:5

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 221648    6.208    0.000   17.682    0.000 .../
1329888    2.103    0.000    2.103    0.000 {method '__deepcopy__' of 'numpy.generic' objects}
1994832    1.636    0.000    2.107    0.000 .../
 221648    1.343    0.000    2.391    0.000 .../collections/
8496613    1.239    0.000    1.239    0.000 {method 'get' of 'dict' objects}
 221648    1.235    0.000   10.988    0.000 .../
 443297    1.169    0.000    2.573    0.000 .../collections/
      1    1.015    1.015   22.572   22.572

You have to go to line 8 for the first mention of a function I implemented, and the grand winner of this race is: deepcopy, which was called over two-hundred thousand times and accounts for about 27% of the total time. If you look at the cumulative time, which is the time taken by functions called by it, it skyrockets to 78%!


The good thing is: I do not have to look for the culprit; the bad thing is: I hope I do not have to deal with performances in the standard library.

Let us explore further, where do we call deepcopy so much? We can click on the function to get a breakdown of its callstack (sort of). We are mainly interested in the “Called By” section.

Function                           was called by...
                                       ncalls  tottime  cumtime
.../          <-  221648    0.357   11.587  .../<listcomp>)
                                      2659776    4.634    9.712  .../
                                       221648    0.413   13.100  .../
                                       221648    0.804   17.682
{numpy __deepcopy__}               <- 1329888    2.103    2.103  .../
.../    <-  221648    1.235   10.988  .../
.../   <-  221648    0.651   12.419  .../
.../  <- 1329888    0.158    0.158  .../

Okay, so everything in there are standard library calls but one, easy to figure out which line to look for – I already knew, but you may not.

84  # Available items
85  current = deepcopy(avail)

Alright, let us de-tangle this a little bit, first here is the definition of avail:

63  avail = Counter(items)

I use avail to count the different object sizes available to pack – remember the definition of the APTAS, $J’$ is a set of set of objects where each has the same size within one set.

Thus, when creating a new potential bin, I verify that I have enough “sizes” available; if not, I ditch the configuration.


Okay, at this point it seems pretty clear that dictionary deep copying is an extremely inefficient operation. Maybe some of you are expert Pythonist, but I am not. And I have a deep, ingrained distrust for how references and values are handled.6

Thus, I tend to deep copy everything, otherwise I spend my time banging my head about why I have inconsistent data between two iterations of the same function. This time, let us give copy a chance.

First thing first: the results are the same, no corruption. Second, well, it does seem a tad more efficient:

7020374 function calls (7018235 primitive calls) in 6.326 seconds

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
221648    1.264    0.000    2.177    0.000 .../collections/
443297    1.008    0.000    2.328    0.000 .../collections/
     1    0.764    0.764    6.178    6.178
443297    0.633    0.000    1.282    0.000 .../collections/
221648    0.539    0.000    1.285    0.000 .../collections/

As in, by a factor three—almost. I am not going to complain about that,7 this looks far more reasonable. Now the bulk of my operations seems to come from subtracting, which should be the counter.

Final Optimisations


At this point, the algorithm does work better but is still fairly slow. The curse of complexity is doubly efficient when you have inefficient operations, and even worse if they could be omitted. The point here is: the way we store configurations allows for one more optimisation.

We store lists of bins coming from the cartesian product of previous bins configuration with the single-bin configurations. Therefore, we quite often end-up with similar configurations multiple times. We can do a simple check to skip configurations that already exist.8


One thing that goes with checking for duplicates is to find a way to avoid symmetric solutions. That is, configurations with the same set of bins but in different order.

X = [[0.9], [0.8]]
Y = [[0.8], [0.9]]

Both these configurations are equivalent but considered different. When doing the product of possible bins for the next size, they will both appear though they do not provide actual alternatives.


The results are finally up to the standard I would expect considering the algorithm. I will omit the callstack as it is mostly library calls.

289812 function calls (287673 primitive calls) in 0.328 seconds

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  7614    0.043    0.000    0.074    0.000 .../collections/
     1    0.040    0.040    0.242    0.242
 15229    0.036    0.000    0.083    0.000 .../collections/
 15229    0.022    0.000    0.046    0.000 .../collections/
  7614    0.019    0.000    0.045    0.000 .../collections/

That is an improvement by a factor of almost twenty! Reducing the number of configurations to handle was the obvious way to make the algorithm efficient, but needed a bit of tweaking first.

13 configurations with 1 bin.
57 configurations with 2 bins.
130 configurations with 3 bins.
191 configurations with 4 bins.
189 configurations with 5 bins.
126 configurations with 6 bins
Feasible in 7 bins.


As with most optimisation algorithms, the devil is in the details. The APTAS for bin-packing is still not a very efficient heuristic at the end of the day, its only advantage being to provide a guarantee on its results.

As a side note, I did try to implement this algorithm using numpy but the results were atrocious. Not so much that we have to convert the sizes, but that checking for equality between two numpy arrays is a very inefficient operation – used when checking for duplicates. But I may be wrong.

One last thing I may try is to convert the list of objects to a vector of numbers (as I did in numpy) as it would reduce the input size further.


  1. De La Vega, W. F., & Lueker, G. S. (1981). Bin packing can be solved within 1+ε in linear time. Combinatorica, 1(4), 349–355.


  1. Be careful, polynomial here can mean any power of the size, e.g. the textbook FPTAS for knapsack is in $\mathcal{O}(n^3)$. 

  2. $k = \lceil 1 / \varepsilon^2 \rceil$ 

  3. The $\times$ here means cartesian product, so, yes, the number of feasible patterns will grow really quickly. 

  4. I was lucky and quickly found a seed that gave awful performances. 

  5. For readability, I will replace the Python path in library calls with “...” 

  6. Do not get me started on how, in my opinion, Python breaks the developer’s trust. 

  7. Though I will have to go dig into some of my older code and see if I cannot get the same kind of improvements. 

  8. Note: doing this only actually slows down the algorithm as very few are actually removed and checking takes a long time.