ADVANCED METAPROGRAMMING TECHNIQUES
========================================================================

In elementary OOP, the programmer works with objects; in advanced OOP,
the programmer works with classes, taking full advantage of
(multiple) inheritance and metaclasses. Metaprogramming is the activity of
building, composing and modifying classes.

I will give various examples of metaprogramming techniques
using run-time class modifications, multiple inheritance, metaclasses,
attribute descriptors and even simple functions.

Moreover, I will show how metaclasses can change the
semantics of Python programs: hence their reputation as *black* magic.
That is to say that the techniques explained here are dangerous!

On code processing
--------------------------------------------------------------------

It is a good programming practice to avoid the direct modification
of source code. Nevertheless, there are situations where the ability of
modifying the source code *dynamically* is invaluable. Python has the
capability of

1) generating new code from scratch;
2) modifying pre-existing source code;
3) executing the newly created/modified code at run-time.

The capability of creating source code and executing it *immediately* has
no equivalent in static languages such as C/C++/Java and it is perhaps the
most powerful feature of dynamic languages such as Python/Perl.
This feature has been exploited to its ultimate consequences in the languages
of the Lisp family, in which one can use incredibly powerful macros, which,
in a broad sense, are programs that write themselves.

In this chapter I will discuss how to implement macros in Python and I will
present some of the miracles you may perform with this technique. To this
aim, I will discuss various ways of manipulating Python source code, by
using regular expressions and state machines.

Regular expressions
-------------------------------------------------------------------------

.. line-block::

   *Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.*
   -- Jamie Zawinski

Python source code is a kind of text and can be manipulated with the same
techniques that are used to manipulate text:

1. the trivial search and replace;
2. regular expressions;
3. state machines;
4. parsers.

There is not very much to say about the search and replace method: it
is fast, efficient and it works. It should always be used whenever
possible. However, in this chapter I will only be interested in cases
where one has to resort to something more sophisticated than a plain
search and replace: something that can be managed with regular
expressions or with something even more sophisticated than them, such as
a state machine or even a full featured parser.

I will *not* give a primer on regular expressions here, since they are
already well documented in the standard documentation (see Andrew
Kuchling's 'Howto') as well as in many books (for instance 'Mastering
Regular Expressions' and 'Python in a Nutshell'). Instead, I will give
various practical examples of usage.

  >>> import re
  >>> reobj=re.compile(r'x')

More on metaclasses and subclassing built-in types
-------------------------------------------------------------------------

Subclassing ``list`` is easy since there are no methods returning lists
except the methods corresponding to the '+' and '*' operators.
Subclassing ``str`` is more complicated, since one has many methods
that return strings. Nevertheless, it can be done with the ``AutoWrapped``
metaclass, simply by specifying the list of the builtins to be wrapped.

 ::

  #

  class Str(str):
      __metaclass__=AutoWrapped
      builtinlist="""__add__ __mod__ __mul__ __rmod__ __rmul__ capitalize
          center expandtabs join ljust lower lstrip replace rjust rstrip strip
          swapcase title translate upper zfill""".split()

  #

Here I show various tests.

.. 
doctest

  >>> from oopp import Str
  >>> sum=Str('a')+Str('b') # check the sum
  >>> print sum, type(sum)
  ab
  >>> rprod=Str('a')*2 # check the right product
  >>> print rprod,type(rprod)
  aa
  >>> lprod=2*Str('a') # check the left product
  >>> print lprod,type(lprod)
  aa
  >>> r=Str('a').replace('a','b') # check replace
  >>> print r,type(r)
  b
  >>> r=Str('a').capitalize() # check capitalize
  >>> print r,type(r)
  A

``Str`` acts as a nice base class to build abstractions based on strings.
In particular, regular expressions can be built on top of strings describing
their representation (recall that if ``x`` is a regular expression object,
``x.pattern`` is its string representation). Then, the sum of two regular
expressions ``x`` and ``y`` can be defined as the sum of their string
representations, ``(x+y).pattern=x.pattern+y.pattern``. Moreover, it is
convenient to define the ``__or__`` method of two regular expressions in
such a way that ``(x | y).pattern=x.pattern+'|'+y.pattern``.
All this can be achieved through the following class:

 ::

  #

  class BaseRegexp(Str):

      builtinlist=['__radd__', '__ror__']
      wraplist=['__add__','__or__']

      __add__ = lambda self,other: self.pattern + other
      __or__ = lambda self,other: self.pattern+'|'+other

      def __init__ (self,regexp):
          "Adds to str methods the regexp methods"
          if '#' in regexp:
              reobj=re.compile(regexp,re.VERBOSE)
          else:
              reobj=re.compile(regexp) # non-verbose
          for attr in dir(reobj)+['pattern']:
              setattr(self,attr,getattr(reobj,attr))

  #

  >>> from oopp import *
  >>> aob=BaseRegexp('a')|BaseRegexp('b'); print aob
  a|b
  >>> print pretty(attributes(aob))
  encode =
  endswith =
  expandtabs =
  find =
  findall =
  finditer =
  index =
  isalnum =
  isalpha =
  isdigit =
  islower =
  isspace =
  istitle =
  isupper =
  join =
  ljust =
  lower =
  lstrip =
  match =
  pattern = ba
  replace =
  rfind =
  rindex =
  rjust =
  rstrip =
  scanner =
  search =
  split =
  splitlines =
  startswith =
  strip =
  sub =
  subn =
  swapcase =
  title =
  translate =
  upper =
  wraplist = ['__add__', '__radd__', '__or__', '__ror__']
  zfill =

  #

  class Regexp(BaseRegexp):
      class __metaclass__(BaseRegexp.__metaclass__):
          def __setattr__(cls,name,value):
              if name==name.upper(): # all caps means regexp constant
                  if not isinstance(value,cls): value=cls(value)
                  value.name=name # set regexp name
              BaseRegexp.__metaclass__.__setattr__(cls,name,value)
              # basic setattr

      def __call__(self,name=None):
          "Convert the regexp to a named group with the right name"
          name=getattr(self,'name',name)
          if name is None: raise TypeError('Unnamed regular expression')
          return self.__class__('(?P<%s>%s)' % (name,self.pattern))

      generateblocks=generateblocks

  #

The magic of ``Regexp.__metaclass__`` makes it possible to generate a
library of regular expressions in an elegant way:

 ::

  #

  r=Regexp

  customize(r,
      DOTALL =r'(?s)' ,      # starts the DOTALL mode; must be at the beginning
      NAME   =r'\b[a-zA-Z_]\w*', # plain Python name
      EXTNAME=r'\b[a-zA-Z_][\w\.]*', # Python name with or without dots
      DOTNAME=r'\b[a-zA-Z_]\w*\.[\w\.]*',# Python name with dots
      COMMENT=r"#.*?(?=\n)", # Python comment
      QUOTED1=r"'.+?'",      # single quoted string '
      QUOTED2=r'".+?"',      # double quoted string "
      TRIPLEQ1=r"'''.+?'''", # triple quoted string '
      TRIPLEQ2=r'""".+?"""'  # triple quoted string "
      )

  r.STRING=r.TRIPLEQ1|r.TRIPLEQ2|r.QUOTED1|r.QUOTED2
  r.CODESEP=r.DOTALL+r.COMMENT()|r.STRING()

  #

The trick is in the redefinition of ``__setattr__``, which magically converts
all-caps attributes into ``Regexp`` objects.

The features of ``Regexp`` can be tested with the following code:

 ::

  #

  """This script looks at its own source code and extracts dotted names, i.e.
  names containing at least one dot, such as object.attribute or more general
  ones, such as obj.attr.subattr."""

  # Notice that dotted.names in comments and literal strings are ignored

  from oopp import *
  import __main__

  text=inspect.getsource(__main__)

  regexp=Regexp.CODESEP| Regexp.DOTNAME()

  print 'Using the regular expression',regexp

  print "I have found the following dotted names:\n%s" % [
      MO.group() for MO in regexp.finditer(text) if MO.lastgroup=='DOTNAME']

  #

with output:

 ::

  Using the regular expression (?s)(?P<COMMENT>#.*?(?=\n))|(?P<STRING>'''.+?'''|""".+?"""|'.+?'|".+?")|(?P<DOTNAME>[a-zA-Z_]\w*\.[\w\.]*)
  I have found the following dotted names:
  ['inspect.getsource', 'Regexp.CODESEP', 'Regexp.DOTNAME.__call__', 'MO.group',
   'dotname.finditer', 'MO.lastgroup']

Now one can define a good ``CodeStr`` class with replacing features.

Let me consider for instance the solution to the problem discussed in
chapter 4, i.e. the definition of a ``TextStr`` class able to indent and
dedent blocks of text.

 ::

  #

  def codeprocess(code,TPO): # TPO=text processing operator
      code=code.replace("\\'","\x01").replace('\\"','\x02')
      genblock,out = Regexp.CODESEP.generateblocks(code),[]
      for block in genblock:
          out.append(TPO(block))
          out.append(genblock.next())
      return ''.join(out).replace("\x01","\\'").replace('\x02','\\"')

  def quotencode(text):
      return text.replace("\\'","\x01").replace('\\"','\x02')

  def quotdecode(text):
      return text.replace("\x01","\\'").replace('\x02','\\"')

  #

Here is an example of usage: replacing 'Print' with 'print' except in
comments and literal strings.

 ::

  #

  from oopp import codeprocess

  wrongcode=r'''
  """Code processing example: replaces 'Print' with 'print' except in
  comments and literal strings"""
  Print "This program prints \"Hello World!\"" # look at this line!
  '''

  fixPrint=lambda s: s.replace('Print','print')
  validcode=codeprocess(wrongcode,fixPrint)

  print 'Source code:\n',validcode
  print 'Output:\n'; exec validcode

  #

with output

 ::

  Source code:

  """Code processing example: replaces 'Print' with 'print' except in
  comments and literal strings"""
  print "This program prints \"Hello World!\"" # look at this line!

  Output:

  This program prints "Hello World!"

A simple state machine
---------------------------------------------------------------------------

Regular expressions, however powerful, are limited in scope since they
cannot recognize recursive structures. For instance, they cannot parse
parenthesized expressions.

The simplest way to parse a parenthesized expression is to use a state
machine.

 ::

  (?:...) non-grouping
  (?P...)
  (?=...) look-ahead
  (?!...) negative
  (?<=...) look-behind
  (? Regexp(r''' .. ''')
  reobj2 : R" .. (? Regexp(r""" .. """)
  string1 : (? ''' .. '''
  string2 : (? """ .. """
  """

  beg=0; end=1

  string1[beg]=r"(? Regexp(r''' .. ''')
  reobj2 : R" .. (? Regexp(r""" .. """)
  string1 : (? ''' .. '''
  string2 : (? """ .. """
  """

  beg={}; end={}; ls=[]
  for line in decl.splitlines():
      mode,rest=line.split(' : ')
      s,r=rest.split(' -> ')
      beg[mode],end[mode]=s.split(' .. ')
      ls.append('(?P%s)' % (mode,beg[mode]))
      ls.append('(?P%s)' % (mode,end[mode]))
      beg2[mode],end2[mode]=r.split(' .. ')
      ls.append(beg2[mode])
      ls.append(end2[mode])

  delimiters='(%s)' % re.compile('|'.join(ls))
  splitlist=['']+delimiters.split(source)
  for delim,text in splitlist:
      delimiters.match(delim).lastgroup

Creating classes
----------------------------------------------------------------------

TODO

Modifying modules
-----------------------------------------------------------

Metaclasses are extremely useful since they allow one to change the
behaviour of the code without changing the sources. For instance, suppose
you have a large library written by others that you want to enhance in
some way.
It is usually a bad idea to modify the sources, for many reasons:

+ touching code written by others, you may introduce new bugs;
+ you may have many scripts that require the original version
  of the library, not the modified one;
+ if you change the sources and then you buy the new version of the
  library, you have to change the sources again!

The solution is to enhance the properties of the library at run
time, when the module is imported, by using metaclasses.

To show a concrete example, let me consider the case of the module
*commands* in the Standard Library. This module is Unix-specific,
and cannot be used under Windows. It would be nice to have a
metaclass able to enhance the module in such a way that
when it is invoked on a Windows platform, Windows-specific replacements
of the Unix functions provided in the module are used. However,
for the sake of brevity, I will only give a metaclass that displays
a nice message when we are on a Windows platform, without
raising an error (one could easily implement such a behaviour,
however).

 ::

  #

  import oopp,sys,commands

  class WindowsAware(type):
      def __init__(cls,*args):
          if sys.platform=='win32':
              for key,val in vars(cls).iteritems():
                  if isinstance(val,staticmethod):
                      setattr(cls,key,staticmethod(lambda *args:
                         "Sorry, you are (or I pretend you are) on Windows,"
                         " you cannot use the %s.module" % cls.__name__))

  sys.platform="win32" #just in case you are not on Windows

  commands=oopp.ClsFactory[WindowsAware](commands)

  print commands.getoutput('date') #cannot be executed on Windows

  #

The output of this script is

 ::

  Sorry, you are (or I pretend you are) on Windows, you cannot use the commands.module

However, if you are on Linux and you comment out the line

 ::

  sys.platform="win32"

you will see that the script works.

Notice that the line ``commands=oopp.ClsFactory[WindowsAware](commands)``
actually converts the 'commands' module into a 'commands' class, but since
the usage is the same, this will fool all programs using the commands module.
In this case the class factory 'WindowsAware' can also be thought of as
a module modifier. In this sense, it is very useful to denote the metaclass
with an *adjective*.

Metaclasses and attribute descriptors
----------------------------------------------------------------------

Descriptors are especially useful in conjunction with metaclasses, since
a custom metaclass can use them as low level tools to modify the methods
and the attributes of its instances. This makes it possible to implement
very sophisticated features with a few lines of code.

Notice, anyway, that even plain old functions can be thought of as
descriptors.

Descriptors share at least two features with metaclasses:

1. like metaclasses, descriptors are best used as adjectives, since they
   are intended to modify and enhance standard methods and attributes, in the
   same sense metaclasses modify and enhance standard classes;

2. like metaclasses, descriptors can change the *semantics* of Python, i.e.
   what you see is not necessarily what you get. As such, they are a
   dangerous feature. Use them with judgement!

Now I will show a possible application of properties.

 ::

  #

  "Making read-only attributes"

  class Readonly(type):
      def __init__(cls,name,bases,dic):
          super(Readonly,cls).__init__(name,bases,dic) # cooperative __init__
          readonly=dic.get('__readonly__',[])
          for attr in readonly:
              get=lambda self,value=getattr(cls,attr): value
              setattr(cls,attr,property(get))

  class C:
      __metaclass__=Readonly
      __readonly__='x','y'
      x=1
      y=2

  c=C()

  print c.x,c.y

  try:
      c.x=3
  except Exception,e:
      print e

  #

Suppose one has a given class with various kinds of
attributes (plain methods, regular methods, static methods,
class methods, properties and data attributes) and she wants
to trace the access to the data attributes (notice that the motivation
for the following problem comes from a real question asked in
comp.lang.python). Then one needs to retrieve data
attributes from the class and convert them into properties
controlling their access syntax.
The first problem is solved by a simple function

 ::

  #

  def isplaindata(a):
      """A data attribute has no __get__ or __set__ attributes, is not
      a built-in function, nor a built-in method."""
      return not(hasattr(a,'__get__') or hasattr(a,'__set__')
                 or isinstance(a,BuiltinMethodType) or
                 isinstance(a,BuiltinFunctionType))

  #

whereas the second problem is elegantly solved by a custom metaclass:

 ::

  #

  from oopp import isplaindata,inspect

  class TracedAccess(type):
      "Metaclass converting data attributes to properties"
      def __init__(cls,name,bases,dic):
          cls.datadic={}
          for a in dic:
              if isplaindata(dic[a]): # test the value, not its name
                  cls.datadic[a]=dic[a]
                  def get(self,a=a):
                      v=cls.datadic[a]
                      print "Accessing %s, value=%s" % (a,v)
                      return v
                  def set(self,v,a=a):
                      print "Setting %s, value=%s" % (a,v)
                      cls.datadic[a]=v
                  setattr(cls,a,property(get,set))

  class C(object):
      __metaclass__ = TracedAccess
      a1='x'

  class D(C): # shows that the approach works well with inheritance
      a2='y'

  i=D()
  i.a1 # => Accessing a1, value=x
  i.a2 # => Accessing a2, value=y
  i.a1='z' # => Setting a1, value=z
  i.a1 # => Accessing a1, value=z

  #

In this example the metaclass looks at the plain data attributes (recognized
thanks to the ``isplaindata`` function) of its instances and puts them
in the dictionary ``cls.datadic``. Then the original attributes are replaced
with property objects tracing the access to them.
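
For readers on Python 3, the same idea can be sketched as follows. This is
only an illustration under stated assumptions, not the code contained in
``oopp``: the ``metaclass=`` keyword replaces ``__metaclass__``, the
simplified ``isplaindata`` below (which also rejects anything callable) is
my own stand-in, and names starting with a double underscore are skipped so
that ``__module__`` and ``__qualname__`` are left alone.

```python
def isplaindata(value):
    "Assumed heuristic: plain data is neither a descriptor nor a callable"
    return not (hasattr(value, '__get__') or hasattr(value, '__set__')
                or callable(value))

class TracedAccess(type):
    "Metaclass converting plain data attributes into tracing properties"
    def __init__(cls, name, bases, dic):
        super().__init__(name, bases, dic)
        cls.datadic = {}  # one storage dictionary per class
        for attr, value in dic.items():
            if attr.startswith('__') or not isplaindata(value):
                continue  # skip __module__, __qualname__, methods, ...
            cls.datadic[attr] = value
            def getter(self, attr=attr):
                v = cls.datadic[attr]
                print("Accessing %s, value=%s" % (attr, v))
                return v
            def setter(self, v, attr=attr):
                print("Setting %s, value=%s" % (attr, v))
                cls.datadic[attr] = v
            setattr(cls, attr, property(getter, setter))

class C(metaclass=TracedAccess):
    a1 = 'x'

class D(C):  # the metaclass is inherited
    a2 = 'y'

i = D()
i.a1         # prints: Accessing a1, value=x
i.a2         # prints: Accessing a2, value=y
i.a1 = 'z'   # prints: Setting a1, value=z
i.a1         # prints: Accessing a1, value=z
```

Notice that the values live in ``cls.datadic``, i.e. they are shared by all
the instances of the class, exactly as in the version above.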
The solution is a 4-line custom metaclass doing the boring job for me:

 ::

  #

  class Wrapped(Customizable,type):
      """A customizable metaclass to wrap methods with a given wrapper and
      a given condition"""
      __metaclass__=Reflective
      wrapper=wrappedmethod
      condition=lambda k,v: True # wrap all
      def __init__(cls,*args):
          super(cls.__this,cls).__init__(*args)
          wrap(cls,cls.wrapper,cls.condition.im_func)

  Traced=Wrapped.With(wrapper=tracedmethod,__name__='Traced')
  Timed=Wrapped.With(wrapper=timedmethod,__name__='Timed')

  #

Here is an example of usage:

  >>> from oopp import *
  >>> time_=ClsFactory[Traced](time)
  >>> print time_.asctime()
  [time_] Calling 'asctime' with arguments (){} ...
  -> 'time_.asctime' called with result: Sun May 4 07:30:51 2003
  Sun May 4 07:30:51 2003

Another is

 ::

  #

  from oopp import ClsFactory,Traced,Reflective

  def f1(x): return x # nested functions
  def f2(x): return f1(x) # we want to trace

  f1orf2=lambda k,v : v is f1 or v is f2
  make=ClsFactory[Reflective,Traced.With(condition=f1orf2)]
  traced=make('traced',globals())

  traced.f2('hello!') # call traced.f2

  #

with output

 ::

  [__main__] Calling 'f2' with arguments ('hello!',){} ...
  [__main__] Calling 'f1' with arguments ('hello!',){} ...
  -> '__main__.f1' called with result: hello!
  -> '__main__.f2' called with result: hello!

Modifying hierarchies
---------------------------------------------------------

Suppose one wants to enhance a pre-existing class, for instance
by adding tracing capabilities to it. The problem is non-trivial
since it is not enough to derive a new class from the original
class using the 'Traced' metaclass. For instance, we could imagine
tracing the 'Pizza' class introduced in chapter 4 by defining

  >>> from oopp import *
  >>> class TracedTomatoPizza(GenericPizza,WithLogger):
  ...     __metaclass__=ClsFactory[Traced]
  ...     toppinglist=['tomato']

However, this would only trace the methods of the newly defined class,
not those of the original one.
Since the new class does not introduce any
non-trivial method, the addition of 'Traced' is practically without
any effect:

  >>> marinara=TracedTomatoPizza('small') # nothing happens
  *****************************************************************************
  Tue Apr 15 11:00:17 2003
  1. Created small pizza with tomato, cost $ 1.5

Tracing hierarchies
------------------------------------------------------------------------------

 ::

  #

  from oopp import *

  def wrapMRO(cls,wrapped):
      for c in cls.__mro__[:-1]:
          wrap(c,wrapped)

  tracing=tracedmethod.With(logfile=file('trace.txt','w'))
  wrapMRO(HomoSapiensSapiens,tracing)
  HomoSapiensSapiens().can()

  #

with output in trace.txt

 ::

  [HomoSapiensSapiens] Calling 'can' with arguments (,){} ...
  [HomoSapiens] Calling 'can' with arguments (,){} ...
  [HomoHabilis] Calling 'can' with arguments (,){} ...
  [Homo] Calling 'can' with arguments (,){} ...
  [PrettyPrinted] Calling '__str__' with arguments (,){} ...
  [PrettyPrinted.__str__] called with result:
  [Homo.can] called with result: None
  [HomoHabilis.can] called with result: None
  [HomoSapiens.can] called with result: None
  [HomoSapiensSapiens.can] called with result: None

Modifying source code
------------------------------------------------------------------------

The real solution would be to derive the original class 'GenericPizza'
from 'Traced' and not from 'object'. One could imagine creating
a new class inheriting from 'Traced' and with all the methods of the
original 'GenericPizza' class; then one should create copies of
all the classes in the whole multiple inheritance hierarchy.
This would be a little annoying, but feasible; the real problem is
that this approach would not work with cooperative methods, since
cooperative calls in the derived classes would invoke methods in
the original classes, which are not traced.
This is a case where the modification of the original source code is
much more appealing and simpler than any other method: it is enough
to perform a search and replace in the original source code, by adding
the metaclass 'Traced', to enhance the whole multiple inheritance hierarchy.
Let me assume that the hierarchy is contained in a module (which is
the typical case). The idea is to generate *dynamically* a new module from
the modified source code, with a suitable name to avoid conflicts with the
original module. Incredibly enough, this can be done in a few lines:

 ::

  #

  def modulesub(s,r,module):
      "Requires 2.3"
      name=module.__name__
      source=inspect.getsource(module).replace(s,r)
      dic={name: module}; exec source in dic # exec the modified module
      module2=ModuleType(name+'2') # creates an empty module
      customize(module2,**dic) # populates it with dic
      return module2

  #

Notice that the ``modulesub`` function, which modifies the source code of
a given module and returns a modified module, requires Python 2.3
to work. This is due to a subtle bug in ``exec`` in Python 2.2.
Anyway, the restriction to Python 2.3 allows me to take advantage
of one of the most elegant conveniences of Python 2.3: the names in
the ``types`` module act as type factories and in particular
``ModuleType(s)`` returns an (empty) module named ``s``.
Here is an example of usage:

  >>> import oopp
  >>> s='GenericPizza(object):'
  >>> oopp2=oopp.modulesub(s,s+'\n    __metaclass__=oopp.Traced',oopp)

Name clashes are avoided, since 'oopp2' is a different module from
'oopp'; we have simultaneous access to both the original hierarchy
in 'oopp' (non-traced) and the modified one in 'oopp2' (traced).
In particular 'oopp2.CustomizablePizza' is traced and therefore

  >>> class PizzaLog(oopp2.CustomizablePizza,oopp2.WithLogger):
  ...     __metaclass__=makecls()
  >>> marinara=PizzaLog.With(toppinglist=['tomato'])('small')

gives the output

 ::

  [PizzaLog] Calling '__init__' with arguments (, 'small'){} ...
  -> 'PizzaLog.__init__' called with result: None

  *****************************************************************************
  Thu Mar 27 09:18:28 2003
  [PizzaLog] Calling '__str__' with arguments (,){} ...
  [PizzaLog] Calling 'price' with arguments (,){} ...
  [PizzaLog] Calling 'toppings_price' with arguments (,){} ...
  -> 'PizzaLog.toppings_price' called with result: 0.5
  -> 'PizzaLog.price' called with result: 1.5
  -> 'PizzaLog.__str__' called with result: small pizza with tomato, cost $ 1.5
  1. Created small pizza with tomato, cost $ 1.5

From that we understand what is happening:

- ``PizzaLog.__init__`` calls ``GenericPizza.__init__``, which defines the
  size and cooperatively calls ``WithLogger.__init__``;

- ``WithLogger.__init__`` cooperatively calls ``WithCounter.__init__``,
  which increments the count attribute;

- at this point, the instruction 'print self' in ``WithLogger.__init__``
  calls ``PizzaLog.__str__`` (inherited from ``GenericPizza.__str__``);

- ``GenericPizza.__str__`` calls 'price', which in turn calls
  'toppings_price'.

On top of that, notice that the metaclass of 'PizzaLog' is
``_TracedReflective``, which has been automagically generated by
``makecls`` from the metaclasses of 'CustomizablePizza' (i.e. 'Traced')
and of 'WithLogger' (i.e. 'Reflective'); the leading underscore helps
to understand the dynamical origin of '_TracedReflective'.
It turns out that '_TracedReflective' has a dynamically
generated (meta-meta)class:

  >>> print type(type(PizzaLog)) #meta-metaclass

Therefore this example has a non-trivial class hierarchy

  >>> print oopp.MRO(PizzaLog)
  MRO of PizzaLog:
    0 - PizzaLog(CustomizablePizza,WithLogger)[Traced]
    1 - CustomizablePizza(GenericPizza,Customizable)[Traced]
    2 - GenericPizza(object)[Traced]
    3 - WithLogger(WithCounter,Customizable,PrettyPrinted)
    4 - WithCounter(object)
    5 - Customizable(object)
    6 - PrettyPrinted(object)
    7 - object()

a non-trivial metaclass hierarchy,

  >>> print oopp.MRO(type(PizzaLog)) # the metaclass hierarchy
  MRO of Traced:
    0 - Traced(Reflective)[WithWrappingCapabilities]
    1 - Reflective(type)
    2 - type(object)
    3 - object()

and a non-trivial meta-metaclass hierarchy:

  >>> print oopp.MRO(type(type(PizzaLog))) # the meta-metaclass hierarchy
  MRO of WithWrappingCapabilities:
    0 - WithWrappingCapabilities(BracketCallable)
    1 - CallableWithBrackets(type)
    2 - type(object)
    3 - object()

Pretty complicated, isn't it? ;)

This example is there to show what kind of maintenance one can expect
with programs making heavy use of metaclasses, particularly when
they have to be understood by somebody other than the author ...

Metaclass regenerated hierarchies
--------------------------------------------------------------------------

 ::

  import types

  def hierarchy(self,cls):
      d=dict([(t.__name__,t) for t in vars(types).itervalues()
              if isinstance(t,type)])
      def new(c):
          bases=tuple([d[b.__name__] for b in c.__bases__])
          return self(c.__name__, bases, c.__dict__.copy())
      mro=list(cls.__mro__[:-1])
      mro.reverse()
      for c in mro:
          if not c.__name__ in d:
              d[c.__name__]=new(c)
      customize(self,**d)

  ClsFactory.hierarchy=hierarchy
  traced=ClsFactory[Traced,Reflective]

Unfortunately, this approach does not work if the original hierarchy makes
named cooperative super calls.

Therefore the source-code run-time modification has its advantages.
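
To make the idea concrete without the ``oopp`` machinery, here is a minimal
Python 3 sketch of regenerating a hierarchy under a new metaclass. The names
``Marked``, ``regenerate``, ``Base`` and ``Child`` are hypothetical,
introduced only for this illustration; ``Marked`` stands in for a metaclass
such as 'Traced'. The last lines illustrate the limitation mentioned above:
in Python 3 even the zero-argument ``super()`` is bound to the original
class, so a cooperative call fails on the regenerated copy.

```python
class Marked(type):
    """Hypothetical stand-in for a metaclass such as Traced:
    it just records the names of the classes it creates."""
    created = []
    def __init__(cls, name, bases, dic):
        super().__init__(name, bases, dic)
        Marked.created.append(name)

def regenerate(meta, cls):
    "Rebuild cls and all its ancestors (except object) as instances of meta"
    new = {}  # maps each original class to its regenerated copy
    for c in reversed(cls.__mro__[:-1]):  # ancestors first, object skipped
        bases = tuple(new.get(b, b) for b in c.__bases__)
        # the per-class __dict__ and __weakref__ descriptors cannot be reused
        dic = {k: v for k, v in vars(c).items()
               if k not in ('__dict__', '__weakref__')}
        new[c] = meta(c.__name__, bases, dic)
    return new[cls]

class Base(object):
    def hello(self):
        return 'hello from Base'
    def coop(self):
        return super().__str__()  # cooperative call bound to the original Base

class Child(Base):
    pass

Child2 = regenerate(Marked, Child)

print(type(Child), type(Child2))  # the original hierarchy is untouched
print(Marked.created)             # ['Base', 'Child']
print(Child2().hello())           # plain methods keep working
try:
    Child2().coop()
except TypeError as e:            # super() still refers to the original Base
    print('cooperative call failed:', e)
```

The copies share the function objects of the originals, which is exactly why
the cooperative call breaks: the functions still point to the old classes.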