Tuesday, February 16, 2016

The xheap Benchmark

[Image: the inlets of a steam engine.] That means it's time to perform some serious measurements!

We are going to compare xheap with heapq. The benchmark suite can be found right next to the source.

The Competitors
  • heapq - collection of heap functions in the Python stdlib, written in C
  • xheap - object-oriented wrappers for heapq (see the short usage sketch below)
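
To give a feel for what the two competitors look like in code, here is a minimal usage sketch. The heapq calls are the standard library ones; for xheap, the Heap constructor and its push/pop methods mirror the operations benchmarked below, though the exact API details are my assumption here, not a quote from the docs.

# heapq: module-level functions operating on a plain list
import heapq

items = [5, 3, 8, 1]
heap = list(items)
heapq.heapify(heap)
heapq.heappush(heap, 0)
smallest = heapq.heappop(heap)   # -> 0

# xheap: the same operations as methods on a Heap object
from xheap import Heap

oo_heap = Heap(items)            # heapifies on construction
oo_heap.push(0)
smallest = oo_heap.pop()         # -> 0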

The Benchmark of Runtime

In order to make things comparable, I split the benchmark up into two cases. They differ in that customizing the ordering of a heap always requires extra work; the second case takes that into account.
Both cases run heapify, pop and push 10,000 times and take the minimum of these runtimes (in milliseconds), which is what you see presented below. Since runtimes vary depending on which item is popped or pushed, each run starts with a heap of size X and then pops or pushes X/32 items.
Each case also has a baseline, which is obviously heapq. I used timeit as suggested here; the benchmark ran on Python 3.4.
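
For readers who want to reproduce a single figure by hand, a timing loop along these lines should come close to the procedure described above; the seed, repeat count and size are illustrative assumptions, the real suite lives in test_xheap_time.py.

# Rough reproduction sketch for one cell of the tables below (pop on heapq).
# The setup rebuilds a fresh heap before every repetition; only the pops are
# timed, and min() over the repetitions filters out scheduling noise.
import timeit

size = 10000
setup = (
    'import heapq, random\n'
    'random.seed(0)\n'
    'data = random.sample(range({n} * 10), {n})\n'
    'heap = list(data)\n'
    'heapq.heapify(heap)\n'
).format(n=size)
stmt = 'for _ in range({n}): heapq.heappop(heap)'.format(n=size // 32)

best = min(timeit.repeat(stmt, setup=setup, repeat=100, number=1))
print('pop heapq, {} items: {:.2f} ms'.format(size, best * 1000))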

1) heapq vs Heap vs RemovalHeap


operation         1,000 items    10,000 items   100,000 items   1,000,000 items

init heapq        0.03 ( 1.00x)  0.41 ( 1.00x)   4.50 ( 1.00x)   64.83 ( 1.00x)
     Heap         0.03 ( 1.02x)  0.42 ( 1.02x)   4.47 ( 0.99x)   64.70 ( 1.00x)
     RemovalHeap  0.11 ( 3.35x)  1.32 ( 3.23x)  17.88 ( 3.98x)  222.94 ( 3.44x)

pop  heapq        0.01 ( 1.00x)  0.07 ( 1.00x)   0.86 ( 1.00x)   10.05 ( 1.00x)
     Heap         0.01 ( 1.45x)  0.09 ( 1.33x)   1.08 ( 1.26x)   12.73 ( 1.27x)
     RemovalHeap  0.27 (51.73x)  3.68 (52.28x)  44.35 (51.80x)  517.98 (51.54x)

push heapq        0.00 ( 1.00x)  0.03 ( 1.00x)   0.34 ( 1.00x)    3.65 ( 1.00x)
     Heap         0.01 ( 1.81x)  0.06 ( 1.83x)   0.61 ( 1.82x)    6.46 ( 1.77x)
     RemovalHeap  0.08 (23.66x)  0.66 (19.65x)   6.48 (19.20x)   67.46 (18.46x)

As expected, the bare C implementation outperforms any wrapper library written in Python. However, the performance penalty is quite small compared to what we gain in cleaner code and better maintainability when using Heap.

The current implementation of RemovalHeap definitely runs slower, as it needs to keep track of changing indexes. Perhaps that is a good reason to rethink its implementation and switch to a mark-and-sweep approach, so this benchmark is a good starting point for the future. It also shows that if you don't really need removal, you'd better stick to Heap.
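
For the curious, the mark-and-sweep idea boils down to lazy deletion: mark removed items in O(1) and sweep them out only when they surface at the top. The following is a bare-bones illustration of that idea, not xheap's actual implementation; it also assumes items are unique and hashable.

import heapq


class LazyRemovalHeap:
    def __init__(self, items=()):
        self._heap = list(items)
        heapq.heapify(self._heap)
        self._removed = set()          # "marked" items

    def push(self, item):
        heapq.heappush(self._heap, item)

    def remove(self, item):
        self._removed.add(item)        # O(1): just mark it

    def pop(self):
        while True:                    # sweep stale entries when they surface
            item = heapq.heappop(self._heap)
            if item in self._removed:
                self._removed.discard(item)
                continue
            return item


heap = LazyRemovalHeap([5, 3, 8, 1])
heap.remove(1)
print(heap.pop())   # -> 3: the marked 1 is swept instead of returned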

2) heapq + tuples vs OrderHeap vs XHeap


operation       1,000 items    10,000 items   100,000 items   1,000,000 items

init heapq      0.14 ( 1.00x)  1.60 ( 1.00x)  24.16 ( 1.00x)  272.64 ( 1.00x)
     OrderHeap  0.17 ( 1.26x)  1.90 ( 1.19x)  27.20 ( 1.13x)  302.80 ( 1.11x)
     XHeap      0.29 ( 2.16x)  3.18 ( 1.99x)  50.32 ( 2.08x)  553.61 ( 2.03x)

pop  heapq      0.01 ( 1.00x)  0.15 ( 1.00x)   1.82 ( 1.00x)   21.85 ( 1.00x)
     OrderHeap  0.02 ( 1.75x)  0.23 ( 1.58x)   2.72 ( 1.49x)   30.59 ( 1.40x)
     XHeap      0.28 (25.25x)  3.84 (25.98x)  46.32 (25.45x)  540.32 (24.73x)

push heapq      0.00 ( 1.00x)  0.04 ( 1.00x)   0.54 ( 1.00x)    5.69 ( 1.00x)
     OrderHeap  0.01 ( 4.00x)  0.15 ( 3.68x)   1.65 ( 3.08x)   16.50 ( 2.90x)
     XHeap      0.04 ( 9.55x)  0.35 ( 8.79x)   3.85 ( 7.16x)   39.49 ( 6.94x)

Generally speaking, the overhead of OrderHeap and XHeap for providing custom orders diminishes as the heaps grow bigger. This makes sense: the overhead basically consists of creating and unpacking tuple values on the fly plus some super calls, which happen only once per operation and are therefore constant.
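
The tuple machinery responsible for that constant overhead is essentially the classic "heapq + tuples" pattern that serves as the baseline in this table; the key function and items below are made up for illustration.

# Custom order with plain heapq: push (key(item), item) pairs and unwrap on pop.
import heapq

key = len                                       # example order: shortest string first
items = ['banana', 'fig', 'apple']

heap = [(key(item), item) for item in items]    # wrap each item on the fly
heapq.heapify(heap)

heapq.heappush(heap, (key('kiwi'), 'kiwi'))
_, smallest = heapq.heappop(heap)               # unwrap the tuple again
print(smallest)                                 # -> 'fig'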

The Benchmark of Removal

This particular feature cannot be compared against heapq, Heap and OrderHeap since they simply don't support removal. For future reference and to provide some guidance, I still present the benchmark results comparing RemovalHeap and XHeap with each other. To equalize the cost of item comparisons, RemovalHeap is fed with tuples this time. RemovalHeap is the baseline.


operation           1,000 items    10,000 items   100,000 items   1,000,000 items

remove RemovalHeap  0.12 ( 1.00x)  1.26 ( 1.00x)  13.28 ( 1.00x)  149.50 ( 1.00x)
       XHeap        0.12 ( 0.94x)  1.19 ( 0.95x)  12.16 ( 0.92x)  124.02 ( 0.83x)


The Benchmark of Comparisons

Regarding heapify, pop and push, xheap does not perform even a single item comparison more than heapq. As you can infer from the source, Heap and OrderHeap leave these operations completely to heapq for efficiency reasons.
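
"Leaving it to heapq" amounts to a thin delegation layer, roughly like the following sketch; this is a simplified illustration of the pattern, not a copy of xheap's actual class layout.

# Thin wrapper: all the comparison work happens inside heapq itself.
import heapq


class Heap(list):
    def __init__(self, iterable=()):
        super().__init__(iterable)
        heapq.heapify(self)          # comparisons done by heapq, in C

    def push(self, item):
        heapq.heappush(self, item)

    def pop(self):
        return heapq.heappop(self)


heap = Heap([5, 3, 8, 1])
heap.push(0)
print(heap.pop())   # -> 0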

RemovalHeap and XHeap, on the other hand, have a peculiarity that requires them to perform at least some of the item comparisons on their own. However, the number of comparisons stays the same, as the relevant code is a plain copy of heapq's. Another post will cover why this is necessary.

Conclusion

As suspected in the previous post, both convenience and features come at a price. So, in order to preserve the main reason for using heaps in the first place (= speed), your best choice is the variant with the fewest features that still covers your needs. Most of the overhead is constant per operation, since it consists of wrapper, super and descriptor calls. Given the current optimization efforts around CPython, this kind of overhead may be reduced even further without changing xheap itself.

I would like to express my gratitude to the Python community for providing heapq. Without this amazing library, the development of xheap would not have been possible.

As usual, I am open to suggestions on how to improve the benchmark. To see how xheap performs on your machine, simply execute test_xheap_time.py from the xheap repository.
[Image: a beautiful steam engine running at full speed. You can read more about it here.]
Best,
Sven
