Thursday, February 18, 2016

Let's go down the rabbit hole!

Things can be topsy-turvy when considered upside down.
As mentioned in the previous post, there is an interesting and at the same time weird piece of code duplication in RemovalHeap and XHeap that is necessary to make them work properly. This post will cover this oddity in depth.

Imagine you want to count the number of items being set in a list. So, instead of providing a native list object, you write your own class like this:

class MyList(list):
    count = 0
    def __setitem__(self, key, value):
        self.count += 1
        super(MyList, self).__setitem__(key, value)

ml = MyList([0])
for i in range(10):
    ml[0] = -i

print(ml.count) # print 10

Sounds good, right? Now, RemovalHeap and XHeap do exactly this when they keep track of the index of an item. Instead of counting, though, they store the new index in a dictionary. This is necessary for fast item removal, i.e. runtime of O(log n).

So far so good and this could be the end of the story if that were the only necessity for fast item removal. However, it turned out it isn't.

When we now apply heappop on our counting class, it turns out it doesn't work anymore:

from heapq import heappop

ml = MyList(range(10))
heappop(ml)
print(ml.count) # print 0

No changes at all? That cannot be right.

It seems we don't jump back into Python code. You might say, this is due to the underlying C implementation of heappop but have a look. Let's copy the Python source from heappop and call it my_heappop:

def my_heappop(heap):
    from heapq import _siftup
    lastelt = heap.pop()
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        _siftup(heap, 0)
        return returnitem
    return lastelt

As you can see, it just delegates work to _siftup and that will do the trick:

ml = MyList(range(10))
my_heappop(ml)
print(ml.count) # print 6

Now, there have been 6 changes to our heap when popping off the first item.

What's wrong here? As one can see, Python does in fact hop back and forth intertwined Python and C code. Why the public API of heapq doesn't work as expected whereas the private one does, is unclear to me. This peculiar behavior of heappop also holds for heappush, heapreplace and heappushpop. As usual I am open for suggestions and explanations.

In any case, that is the reason why RemovalHeap and XHeap duplicate some parts of heapq and now you know.

Best,
Sven

NOTE: I could reproduce this behavior for Python 2.7.6 and Python 3.4.3.

No comments:

Post a Comment