summaryrefslogtreecommitdiff
path: root/artima
diff options
context:
space:
mode:
authormichele.simionato <devnull@localhost>2009-11-19 08:38:45 +0000
committermichele.simionato <devnull@localhost>2009-11-19 08:38:45 +0000
commitfd9a3fe72088d4f5dcd4a9348e12a57a91ed3e5f (patch)
tree16301362bdbeb49ac20f02c3d6320d71bcf2893f /artima
parentc8ea2b3eda09778a6250d4ef6c368fa08620b189 (diff)
downloadmicheles-fd9a3fe72088d4f5dcd4a9348e12a57a91ed3e5f.tar.gz
Added a little paper about caches
Diffstat (limited to 'artima')
-rw-r--r--artima/python/Makefile3
-rw-r--r--artima/python/caches.py148
2 files changed, 151 insertions, 0 deletions
diff --git a/artima/python/Makefile b/artima/python/Makefile
index dfcfcba..afada1e 100644
--- a/artima/python/Makefile
+++ b/artima/python/Makefile
@@ -56,3 +56,6 @@ mixins4: mixins4.py
traits: traits.txt
$(POST) traits.txt 246488
+
+caches: caches.py
+ $(MINIDOC) -d caches; $(POST) /tmp/caches.rst 274438
diff --git a/artima/python/caches.py b/artima/python/caches.py
new file mode 100644
index 0000000..38b3244
--- /dev/null
+++ b/artima/python/caches.py
@@ -0,0 +1,148 @@
+"""
+I work as an enterprise programmer and every day I have to face
+problems typical of the enterprise context. Here is an example
+from a ticket I got assigned yesterday. As everybody does, we employ
+caches in various places in our code base for performance reasons. As
+everybody does, we have problems with them. Let me explain the specific
+issue in the ticket. We have literally a dozen or more of different kind
+of caches: in somes place we use simple dictionaries, in other places
+we use slightly smarter data structures and in some places we use a wrapper
+over memcached. The caches have been implemented by different
+programmers during a period of years; there are caches at the Python
+level and caches at the C++ level. The caches are cleared in a few
+spots, and in particular there is code like this::
+
+ if number_objects_allocated > MAX_OBJECTS:
+ cpp_cache.reset()
+ python_cache.clear()
+ the_other_python_cache.reset()
+ the_other_other_python_cache.flush()
+ ... etc etc
+
+Actually it is more complicated than that. Notice that the caches do
+not share a common interface: sometimes ``.clear`` is called
+``.reset``, sometimes ``.flush``, sometimes ``.flush_all``, etc. The
+usual things that happens when you have different programmers working
+in different groups in different times, deadlines and lack of code
+review. The problem is that nobody remembers what are all the caches
+involved exactly. It has already happened twice that somebody forgot
+to clear a cache that should have been cleared, resulting in errors in
+far way portions of the code.
+
+I got assigned the task of finding which were the relevant caches (and
+this is the difficult part, since it involves spelunking through
+hundreds of thousands of lines of code and asking people who do not
+remember who wrote what) and to unify the caching mechanism, making it
+more robusts. This is the easier part, and I will talk about this part
+here. My ticket suggested to solve the problem with a metaclass
+trick, to make it sure that all cache instances were registered as
+soon as created, and to make it possible to clear all caches in a
+single step.
+The requirements made me remember an old recipe of mine, published on
+the `Python Cookbook`_ a couple of years ago, the AutoClose_ recipe.
+It was rather easy to convert the recipe to provide the needed
+functionality. The code is given below, here I will discuss how it
+works.
+
+The idea is to track down in our codebase the places where our caches
+are *defined*, which is much easier than finding the places were the
+cache are *used*. Having found a pre-existing cache class, I can add
+the tracking capability to it simply by adding a line ``__metaclass__
+= AutoClearMeta`` to it, which is very little invasive. In some places
+I also need to add an alias method: for instance if the class has a
+``reset`` method instead of a ``clear`` method I can just add the
+alias ``clear = reset``. In many places the caches are simple Python
+dictionaries: in such situations I just need to replace statements
+like ``_cache = {}`` with ``_cache = DictCache()`` where ``DictCache``
+is just an AutoClearMeta-augmented dictionary, defined in the obvious
+way:
+
+$$DictCache
+
+The advantage of doing so is that now I can reduce the clearing code to
+the following three lines::
+
+ if number_objects_allocated > MAX_OBJECTS:
+ cpp_cache.clear() # taking care of the C++ caches is not my job
+ python_cache_class.clear_all()
+
+When in the future people will add more caches, this code will stay
+unchanged, since the metaclass takes care of registering all the
+caches for us. Some care is required of course, since the programmers
+are expected to use a cache class augmented by ``AutoClearMeta``.
+However, even if they forget to do so and they use their own class,
+the fix is just one line of code.
+
+The design here is quite elegant. If you have a set of caches that should be
+cleared at the same time, the solution is to make them all instances of
+the same subclass ``S``: calling ``S.clear_all()`` will clear all of them
+in a row. Morever, if ``S1`` is a subclass of ``S``, ``S.clear_all()``
+will clear all instances of ``S1`` too, whereas ``S1.clear_all()``
+will clear only the instances of ``S1`` and its subclasses (if any).
+On the other hand, if you have classes which should *not* be cleared
+at the same time, just make them instances of a different cache class,
+since caches in different hierarchies are cleared out independently.
+
+I thought I would share this recipe, since it may be of use to somebody, and
+also because real life usages of metaclasses are so rare that
+they are worth mention. Here is the code for the AutoClearMeta metaclass:
+
+$$AutoClearMeta
+
+.. _Python Cookbook: http://code.activestate.com/recipes/langs/python/
+.. _AutoClose: http://code.activestate.com/recipes/523007
+"""
+
+class AutoClearMeta(type):
+ """
+ Metaclass that tracks the instances of its instances and clears them
+ in reverse instantiation order. Requires classes with a .clear method.
+ It is possible to clear absolutely everything with
+
+ for cls in AutoClearMeta.autoclearclasses:
+ cls.clear_all()
+ """
+ autoclearclasses = []
+
+ def __new__(mcl, name, bases, dic):
+ cls = super(AutoClearMeta, mcl).__new__(mcl, name, bases, dic)
+ cls.clear # assert the method clear exists
+ cls._instances = []
+ mcl.autoclearclasses.append(cls)
+ return cls
+
+ def __call__(cls, *args, **kw):
+ # tracks the instances of the instances
+ self = super(AutoClearMeta, cls).__call__(*args, **kw)
+ cls._instances.append(self)
+ return self
+
+ def clear_all(cls):
+ "Recursively clear all instances of cls and its subclasses"
+ for subc in cls.__subclasses__(): # direct proper subclasses
+ subc.clear_all()
+ for obj in reversed(cls._instances):
+ obj.clear()
+
+class DictCache(dict):
+ "An AutoClearMeta-augmented dictionary class"
+ __metaclass__ = AutoClearMeta
+
+if __name__ == '__main__': # test
+ import logging
+
+ class S(object):
+ __metaclass__ = AutoClearMeta
+ def clear(self):
+ logging.warn('clearing cache %s' % self)
+
+ class S1(S):
+ pass
+ class S2(S1):
+ pass
+
+ c1 = S()
+ c2 = S()
+ c3 = S1()
+ c4 = S2()
+ S.clear_all()