Tuesday, March 22, 2016

Safe Cache Invalidation

Caches - as fragile as bubbles.
There are only two hard things in Computer Science: cache invalidation and naming things.

– Phil Karlton
And right he is. Both are true for the package I would like to present in this post. Based on functools.lru_cache, it allows you to specify when caches should be invalidated. In the absence of a proper name for this kind of functionality, I called it xcache, analogous to xheap and xfork.

You can find the source on GitHub and the pre-built package on PyPI.

What's in the package?

The purpose of xcache can be explained best using an example. Imagine you have a function like the following:

from functools import lru_cache

@lru_cache(maxsize=None)
def math_func(a, b, c):
    return ...

Let's assume this is one of those proper mathematical functions you know from school, or something built upon those. That means that, besides all parameters being single letters, the same input yields the same output, now and forever. It further means that an LRU cache can be used to speed up this function enormously without compromising readability.
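To make this concrete, here is a self-contained toy example using plain functools.lru_cache (binomial is made up for illustration):

from functools import lru_cache

@lru_cache(maxsize=None)
def binomial(n, k):
    # same input always yields the same output, so caching is safe forever
    if k == 0 or k == n:
        return 1
    return binomial(n - 1, k - 1) + binomial(n - 1, k)

print(binomial(30, 15))       # first call fills the cache recursively
print(binomial.cache_info())  # hits and misses show the cache at work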

But since there is an eternal battle going on between mathematicians and computer scientists, you don't write all your functions in this manner. Almost certainly, most of the functions you've written and are about to write have side effects, which will inevitably lead to wrong results the longer you cache them.

In short, the associated LRU caches should be invalidated once in a while to maintain correct output for not-so-mathematical functions. This is where xcache comes in. It allows you to invalidate caches in two ways:
  1. using automatic memory management (aka garbage collection)
  2. using context managers (aka with blocks or decorators)
The following examples illustrate these use-cases by attaching LRU caches to the lifespan of a Web request. Normally, each request is handled within its own transaction, so most of the data used can be considered constant while handling the request. As soon as the request is finished, the transaction ends (committed or rolled back); thus the caches should be invalidated, since another concurrently executed request might have changed the underlying data.

Invalidation via Memory Management

some preparation

from xcache import cached_gen

request_cache = cached_gen(lambda: request) # create new cache wrapper

@request_cache()
def check_permission(user, obj):
    return ...

invalidation happens magically

request = ...  # wherever you get your request from

objs = ...  # list of some objects
if any(check_permission(request.user, obj) for obj in objs):
    print(result_success)
else:
    print(result_deny)

request = ...  # another request; all request caches are invalidated

NOTE: we generally attach the request object to some thread-local object, so the lambda passed to cached_gen can access it regardless of context.
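For illustration, here is a minimal sketch of that pattern, assuming a hypothetical set_current_request hook that some middleware calls at the start of each request (the hook name and the thread-local holder are made up):

import threading

from xcache import cached_gen

_local = threading.local()  # one request slot per thread

def set_current_request(request):
    # hypothetical hook, e.g. called by a middleware at request start
    _local.request = request

# the lambda now finds the current request regardless of context
request_cache = cached_gen(lambda: getattr(_local, "request", None))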

Invalidation via Context Manager

preparation again

from xcache import cached

@cached()
def check_permission(user, obj):
    return ...

explicit invalidation

from xcache import clean_caches

for request in request_list:
    with clean_caches():        # start with empty caches
        objs = ...  # list of some objects
        if any(check_permission(request.user, obj) for obj in objs):
            print(result_success)
        else:
            print(result_deny)  # after this line caches are empty as well

NOTE: using clean_caches, you can even specify which object the caches should be attached to.
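A hypothetical sketch of that, assuming clean_caches accepts the object as an argument (check the docs for the exact signature):

with clean_caches(request):  # assumed signature: caches attached to this request
    ...  # handle the request with per-request caches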

Conclusion

LRU caches are very useful, and so is cache invalidation. Thus, you might find xcache to be a low-overhead addition to your caching libs. Check out the docs for more options and use-cases. You can plug any lru_cache-compatible cache implementation into xcache, cf. cachetools.
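For instance, cachetools ships lru_cache-compatible decorators; a small standalone sketch (how to plug them into xcache is covered in the docs):

import cachetools.func

@cachetools.func.lru_cache(maxsize=128)
def square(x):
    # behaves like functools.lru_cache, but backed by cachetools
    return x * x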

Best,
Sven

2 comments:

  1. This could be used in the duplicate "object fetch from db" thing in has_perm and view...

    1. It seems to be reasonable to attach the caches to either the request or the view object. We need to think about that possibility.
