.. -*- mode: rst -*-
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
OBJECT ORIENTED PROGRAMMING IN PYTHON
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:Version: 0.5
:Author: Michele Simionato
:E-mail: mis6@pitt.edu
:Home-page: http://www.phyast.pitt.edu/~micheles/
:Disclaimer: I release this book to the general public.
It can be freely distributed if unchanged.
As usual, I don't give any warranty: while I have tried hard to ensure the
correctness of what follows, I disclaim any responsibility in case of
errors. Use it at your own risk and peril!
.. contents::
.. raw:: latex
\setcounter{chapter}{-1}
Preface
============
.. line-block::
*There is only one way to learn: through examples*
The philosophy of this book
---------------------------
This book is written with the intent to help the programmer go through
the fascinating concepts of Object Oriented Programming (OOP), in their
Python incarnation. Notice that I say to help, not to teach. Actually,
I do not think that a book can teach OOP or any other non-trivial matter
in Computer Science or other disciplines. Only practice
can teach: practice, then practice, and practice again.
You must learn by yourself, from your own experiments, not from books.
Nevertheless, books are useful. They cannot teach, but they can help.
They should give you new ideas that you were not thinking about, they should
show you tricks you will not find in the manual, and in general they should
be of some guidance in the uphill road to knowledge. That is the philosophy
of this book. For this reason
1. It is not comprehensive, nor systematic;
it is intended to give ideas and a basis: from
there the reader is expected to cover the missing parts on his own,
browsing the documentation, other sources and other books, and finally
the definitive authority, the source itself.
2. It will not even try to teach the *best* practices. I will show what you can
do with Python, not what you "should" do. Often I will show solutions that are
not recommended. I am not a nanny saying this is
good, this is bad, do this, do that.
3. You can only learn from your failures. If you think "it should work, if I do
X and Y" and it works, then you have learned nothing new.
You have merely verified
that your previous knowledge was correct, but you haven't created any new
knowledge. On the other hand, when you think "it should work, if I do
X and Y" and it doesn't, then you have learned that your previous knowledge
was wrong or incomplete, and you are forced to learn something new to
overcome the difficulty. For this reason, I think it is useful to report
not only how to do something, but also how not to do something,
showing the pitfalls of wrong approaches.
That, in my opinion, is the goal of a good book. I don't know if I have
reached this goal (the decision is up to the reader), but at least
I have tried to follow these guidelines.
Moreover, this is not a book on OOP,
it is a book on OOP *in Python*.
In other words, the point of view of this book is not
to emphasize general topics of OOP that are exportable to other languages,
but exactly the opposite: I want to emphasize specific techniques that one
can only use in Python, or that are difficult to translate to other
languages. Moreover, I will not provide comparisons with other
languages (except for the section "Why Python?" in this introduction and
in few selected other places),
in order to keep the discussion focused.
This choice comes from the initial motivation for this book, which was
to fill a gap in the (otherwise excellent) Python documentation.
The problem is that the available documentation still lacks an accessible
reference for the new Python 2.2+ object-oriented features.
Since I have learned Python and OOP from scratch myself,
I have decided to write this book in order to fill that gap and
help others.
The emphasis in this book is not on giving
solutions to specific problems (even if most of the recipes of this book
can easily be tailored to solve concrete real-life problems); it is on
teaching how things work, why they work in some cases and why they
do not work in other cases. Avoiding overly specific problems has an
additional bonus, since it allows me to use *short* examples (the majority
of the scripts presented here are under 20-30 lines), which I think are
best suited to teach a new subject [#]_ . Notice, however, that whereas
the majority of the scripts in this book are short, it is also true
that they are pretty *dense*. The density is due to various reasons:
1. I define a lot of helper functions and classes, which are
reused and enhanced throughout the book.
2. I make heavy use of inheritance, therefore a script at the
end of the book may inherit from classes defined throughout
the book;
3. A ten-line script involving metaclasses can easily do the equivalent
of hundreds of lines of code in a language without metaclasses,
such as Java or C++.
To my knowledge, there are no other books covering the same topics with
the same focus (be warned, however, that I haven't read that many Python
books ;-). The two references that come closest to the present book are
the ``Python Cookbook`` by Alex Martelli and David Ascher, and
Alex Martelli's ``Python in a Nutshell``. They are quite recent books and
therefore they cover (in much less detail) some of the 2.2 features that are
the central topics of this book.
However, the Cookbook devotes only one chapter to OOP and has a quite
different philosophy from the present book, therefore there is
practically no overlap. Also, ``Python in a Nutshell`` covers
metaclasses in a few pages, whereas half of this book is essentially
dedicated to them. This means that you can read both ;-)
.. [#] Readers who prefer the opposite philosophy of using longer,
real-life examples already have the excellent "Dive into
Python" book http://diveintopython.org/ at their disposal. This is
a very good book that I certainly recommend to any (experienced)
Python programmer; it is also freely available (just like this one ;-).
However, the choice of topics is quite different and there is
essentially no overlap between my book and "Dive into Python"
(therefore you can read both ;-).
Who this book is intended for
-----------------------------
I have tried to make this tutorial useful to a large audience of Pythonistas,
i.e. both people with no previous experience of Object Oriented Programming
and people with experience of OOP, but unfamiliar with the most
recent Python 2.2-2.3 features (such as attribute descriptors,
metaclasses, the change of the MRO in multiple inheritance, etc.).
However, this is not a book for beginners: the inexperienced reader should
check (at least) the Internet sites www.python.org/newbies.com and
www.awaretek.com, which provide a nice collection of resources for Python
newbies.
These are my recommendations for the reader, according to her/his level:
1. If you are an absolute beginner, with no experience of programming,
this book is *not* for you (yet ;-). Go to
http://www.python.org/doc/Newbies.html and read one of the introductory
texts listed there, then come back here. I recommend "How to Think Like
a Computer Scientist", available for free on the net (see
http://www.ibiblio.org/obp/thinkCSpy/); I found it useful myself when
I started learning Python; be warned, however, that it refers to the rather
old Python version 1.5.2. There are also excellent books
on the market (see http://www.awaretek.com/plf.html).
http://www.uselesspython.com/ is a good resource for finding reviews
of available Python books. For free books, look at
http://www.tcfb.com/freetechbooks/bookphyton.html .
This is *not* another Python tutorial.
2. If you already know (at least) one other programming language, but you
don't know Python, then this book is *not* for you (again ;-). Read the FAQ,
the Python Tutorial and play a little with the Standard Library (all this
material can be downloaded for free from http://www.python.org), then
come back here.
3. If you have passed steps 1 and 2, and you are comfortable with Python
at the level of simple procedural programming, but have no clue about
objects and classes, *then* this book is for you. Read this book till
the end and your knowledge of OOP will pass from zero to a quite advanced
level (hopefully). Of course, you will have to play with the code in
this book and write a lot of code on your own, first ;-)
4. If you are comfortable with Python and you also know OOP from other
languages or from earlier versions of Python, then this book is for
you, too: you are ready to read the more advanced chapters.
5. If you are a Python guru, then you should read the book, too. I expect
you will find errors and send me feedback, helping me to improve
this tutorial.
About the scripts in this book
-----------------------------------------------------------------------------
All the scripts in this book are free. You are expected to play
with them, to modify them and to improve them.
In order to facilitate the extraction of the scripts from the main text, both
visually for the reader and automatically for Python, I use the
convention of sandwiching the body of the example scripts in blocks like this

::

  #<myfirstscript.py>

  print "Here Starts the Python Way to Object Oriented Programming !"

  #</myfirstscript.py>

You may extract the source of this script with a Python program
called "test.py", provided in the distribution. Simply give the
following command:

::

  $ python test.py myfirstscript.py
This will create a file called "myfirstscript.py", containing the
source of the script; moreover it will execute the script
and write its output in a file called "output.txt". I have tested
all the scripts in this tutorial under Red Hat Linux 7.x and
Windows 98SE. You should not have any problems running them,
but if there is a problem, "test.py" will probably discover it,
even if, unfortunately, it will not provide the solution :-(.
Notice that test.py requires Python 2.3+ to work, since most of
the examples in this book depend heavily on the new features
introduced in Python 2.2-2.3. Since the installation of Python
2.3 is simple, quick and free, I think I am asking very little
effort of those readers who haven't upgraded yet. This is well worth
the pain, since Python 2.3 fixes a few bugs of 2.2 (notably in the subject of
attribute descriptors and the ``super`` built-in) that make a difference
for some of the scripts in this book.
You may give more arguments to test.py, as in this example:

::

  $ python test.py myfirstscript.py mysecondscript.py

The output of both scripts will still be placed in the file "output.txt".
Notice that if you give an argument which is not the name of a script in the
book, it will simply be ignored. Moreover, if you do not give any argument,
"test.py" will automatically execute all the tutorial scripts, writing their
output in "output.txt" [#]_ . You may want to have a look at this file, once
you have finished the tutorial. It also contains the source code of
the scripts, for better readability.
Many examples in this tutorial depend on utility functions defined
in an external module called ``oopp`` (``oopp`` is an obvious abbreviation
for the title of the tutorial). The module ``oopp`` is automatically generated
by "test.py", which works by extracting from the tutorial
text blocks of code of the form ``# something #``
and saving them in a file called "oopp.py".
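The extraction mechanism can be sketched as follows. This is a hypothetical
reconstruction, not the real "test.py": the ``#<name>``/``#</name>``
delimiters, the regular expression and the function name are all my own
assumptions, made only to illustrate the idea:

```python
import re

# Hypothetical sketch: find blocks delimited by "#<name>" ... "#</name>"
# markers and collect their bodies, keyed by file name. Repeated blocks
# with the same name are concatenated, as the tutorial text suggests.
BLOCK = re.compile(r"#<(?P<name>[\w.]+)>\n(?P<body>.*?)#</(?P=name)>",
                   re.DOTALL)

def extract_scripts(text):
    """Return a dict mapping script names to their extracted source code."""
    scripts = {}
    for match in BLOCK.finditer(text):
        scripts.setdefault(match.group("name"), "")
        scripts[match.group("name")] += match.group("body")
    return scripts

sample = """#<oopp.py>
True_ = 1
#</oopp.py>
Some explanatory text.
#<oopp.py>
False_ = 0
#</oopp.py>
"""
print(extract_scripts(sample)["oopp.py"])
```

Running the sketch on ``sample`` collects the two ``oopp.py`` blocks into a
single string, mimicking how the utility module could be assembled from
scattered fragments of the text.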
Let me give an example. A very recent enhancement to Python (in
Python 2.3) has been the addition of a built-in boolean type with
values True and False:
::
$ python
Python 2.3a1 (#1, Jan 6 2003, 10:31:14)
[GCC 2.96 20000731 (Red Hat Linux 7.2 2.96-108.7.2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 1+1==2
True
>>> 1+1==3
False
>>> type(True)
<type 'bool'>
>>> type(False)
<type 'bool'>
However, previous versions of Python used the integers 1 and 0 for
True and False respectively.
::
$ python
Python 2.2 (#1, Apr 12 2002, 15:29:57)
[GCC 2.96 20000731 (Red Hat Linux 7.2 2.96-109)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 1+1==2
1
>>> 1+1==3
0
Following the 2.3 convention, in this tutorial I will use the names
``True`` and ``False`` to denote the numbers 1 and 0 respectively.
This is automatic in Python 2.2.1+, but not in Python 2.2. Therefore,
for the sake of compatibility, it is convenient to set the values ``True``
and ``False`` in our utility module:
::

  #<oopp.py>

  import __builtin__
  try:
      __builtin__.True           # look if True is already defined
  except AttributeError:         # if not, add True and False to the builtins
      __builtin__.True  = 1
      __builtin__.False = 0

  #</oopp.py>
Here is an example of usage:

::

  #

  import oopp
  print "True =", True,
  print "False =", False

  #
The output is "True = 1 False = 0" under Python 2.2 and
"True = True False = False" under Python 2.3+.
.. [#] "test.py", invoked without arguments, does not create '.py' files,
since I don't want to clutter the distribution with dozens of ten-line
scripts. I expect you may want to save only a few scripts as standalone
programs, and cut and paste the others.
Conventions used in this book
----------------------------------------------------------------------
Python expressions are denoted with monospaced fonts when in the text.
Sections marked with an asterisk can be skipped in a first reading.
Typically they have the purpose of clarifying some subtle point and
are not needed for the rest of the book. These sections are intended
for the advanced reader, but could confuse the beginner.
An example is the section about the difference between methods and
functions, or the difference between the inheritance constraint and
the metaclass constraint.
Introduction
===========================================================================
.. line-block::
*A language that doesn't affect the way you think about programming,
is not worth knowing.* -- Alan Perlis
Why OOP ?
----------------------------
I guess some of my readers, like me, started programming in the mid-80's,
when traditional (i.e. non-object-oriented) Basic and Pascal were popular as
first languages. At the time OOP was not as pervasive in software development
as it is now; most of the mainstream languages were non-object-oriented and
C++ was just being released. That was a time when the transition from
spaghetti-code to structured code was already well accomplished, but
the transition from structured programming to (the first phase of)
OOP was only beginning.
Nowadays, we live in a similar time of transition. Today, the transition
to (the first phase of) OOP is well accomplished and essentially all
mainstream
languages support some elementary form of OOP. To be clear, when I say
mainstream languages, I have in mind Java and C++: C is a remarkable
exception to the rule, since it is mainstream but not object-oriented.
However, both Java and C++ (I mean standard Java and C++, not special
extensions like DTS C++, which have quite powerful object-oriented features)
are quite poor object-oriented languages: they provide only the most
elementary aspects of OOP, the features of the *first phase* of OOP.
Hence, today the transition to the *second phase* of OOP is only at the
beginning, i.e. mainstream languages are not yet really OO, but they will
become OO in the near future.
By the second phase of OOP I mean the phase in which the primary
objects of concern for the programmer are no longer objects, but
metaobjects. In elementary OOP one works on objects, which have attributes
and methods (the evolution of old-fashioned data and functions) defined
by their classes; in the second phase of OOP one works on classes,
whose behavior is described by metaclasses. We no longer modify objects
through classes: nowadays we modify classes and class hierarchies
through metaclasses and multiple inheritance.
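To make the idea of "classes described by metaclasses" concrete, here is a
minimal sketch. The metaclass name ``Tracer`` and its behavior are invented
for illustration; I use the modern spelling (a subclass of ``type``, called
directly), which also works in old versions of Python:

```python
class Tracer(type):
    """A toy metaclass: it records the name of every class it creates."""
    created = []
    def __init__(cls, name, bases, dic):
        # ``cls`` here is the class being created, not an instance of it
        super(Tracer, cls).__init__(name, bases, dic)
        Tracer.created.append(name)

# In modern Python one would write ``class C(object, metaclass=Tracer)``;
# calling the metaclass directly works everywhere:
C = Tracer("C", (object,), {})
D = Tracer("D", (C,), {})
print(Tracer.created)  # ['C', 'D']
```

The point is that class creation itself has become programmable: ``C`` and
``D`` are ordinary classes, but they are also *instances* of ``Tracer``,
which got a chance to act on them at the moment they were defined.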
It would be tempting to represent the history of programming in the last
quarter of a century with an evolutionary table like this:
======================== ==================== ====================== =======
~1975 ~1985 ~1995 ~2005
======================== ==================== ====================== =======
procedural programming OOP1 OOP2 ?
data,functions objects,classes classes,metaclasses ?
======================== ==================== ====================== =======
The problem is that such a table would simply be wrong, since in truth
Smalltalk had metaclasses already 25 years ago! And Lisp, too,
had *in nuce* everything a long, *long* time ago.
The truth is that certain languages were too far ahead of their
time ;-)
Therefore, today we already have all the ideas
and the conceptual tools to go beyond the first phase of OOP
(they were invented 20-30 years ago); nevertheless those ideas are
not yet universally known, nor implemented in mainstream languages.
Fortunately, there are good languages
in which you can access the bonus of the second phase of OOP (Smalltalk, CLOS,
Dylan, ...): unfortunately
most of them are academic and/or little known in the real world
(often for purely commercial reasons, since typically languages are not
chosen according to their merits, alas!). Python is an exception to this
rule, in the sense that it is an eminently practical language (it started
as a scripting language to do Operating System administrative jobs),
which is relatively well known and used in that application niche (even if
some people *wrongly* think that it should not be used for 'serious' things).
There are various reasons why most mainstream languages are rather
poor languages, i.e. underfeatured languages (such as Java) or powerful, but
too tricky to use, like C++. Some are good reasons (for instance
*efficiency*: if efficiency is the first concern, then poor languages can be
much better suited to the goal, for instance Fortran for number crunching
and C for system programming), some are less good (economic
monopoly). There is nothing one can do about these reasons: if you
need efficiency, or if you are forced to use a proprietary language
because it is the language used by your employer, you have no choice.
However, if you
are free from these restrictions, there is another reason why you
might not choose to use a powerful language. The reason is that,
till now, programmers working in the industrial world have mostly had simple
problems (I mean conceptually simple problems). In order to solve
simple problems one does not need a powerful language, and the effort
spent in learning it is not worth it.
However, nowadays the situation has changed. Now, with Internet and graphics
programming everywhere, and object-oriented languages so widespread,
it is the time when people actually *need* metaprogramming, the
ability to change classes and programs. Now everybody is programming
in the large.
In this situation, it is justified to spend some time learning better
ways of programming. And of course, it is convenient to start from
the language with the flattest learning curve of all.
Why Python ?
-----------------------------------------------------------------------
.. line-block::
*In many ways, it's a dull language, borrowing solid old concepts from
many other languages & styles: boring syntax, unsurprising semantics,
few automatic coercions, etc etc. But that's one of the things I like
about it.* --Tim Peters on Python, 16 Sep 93
If you are reading this book, I assume you already have some experience
with Python. If this is the case, you already know the obvious advantages
of Python such as readability, ease of use and short development time.
Nevertheless, you may have used Python only as a fast and simple
scripting language. If you are in this situation, then you risk having
an incorrect opinion of the language, like "it is a nice little
language, but too simple to be useful in 'real' applications". The
truth is that Python is designed to be *simple*, and actually it
is; but by no means is it a "shallow" language. Actually, it goes
quite *deep*, but it takes some time to appreciate this fact.
Let me contrast Python with Lisp, for instance. From the beginning,
Lisp was intended to be a language for experts, for people with difficult
problems to solve. The first
users of Lisp were academics, professors of CS and scientists.
On the contrary, from the beginning Python
was intended to be a language for everybody (Python's predecessor was ABC,
a language invented to teach CS to children). Python makes a great first
language for everybody, whereas Lisp would require especially
clever and motivated students (and we all know that there is a lack
of them ;-)
From this difference of origins, Python inherits an easy-to-learn syntax,
whereas Lisp syntax is horrible for the beginner (even if not as
horrible as C++ syntax ;-)
.. line-block::
*Macros are a powerful extension to weak languages.
Powerful languages don't need macros by definition.*
-- Christian Tismer on c.l.p. (referring to C)
Despite the differences, Python borrows quite a lot from Lisp and it
is nearly as expressive as Lisp (I say nearly since Python is
not as powerful as Lisp: by tradition, Lisp has always been at the top of
the hierarchy of programming languages with respect to power of abstraction).
It is true that Python lacks some powerful Lisp features: for instance
Python's object model lacks multiple dispatching (for the time being ;-)
and the language lacks Lisp macros (but this is unlikely to change in the
near future, since Pythonistas see the lack of macros as a Good Thing [#]_):
nevertheless, the point is that Python is much, *much* easier to learn.
You have (nearly) all the power, but without the complexity.
One of the reasons is that Python
tries to be as *little* innovative as
possible: it takes the proven good things from other, more innovative
languages, and avoids their pitfalls. If you are an experienced
programmer, it will be even easier for you to learn Python, since
there is more or less nothing which is really original to Python [#]_ .
For instance:
1. the object model is taken from languages that are good at it, such
as Smalltalk;
2. multiple inheritance has been modeled on languages good at it, such
as CLOS and Dylan;
3. regular expressions follow the road opened by Perl;
4. functional features are borrowed from functional languages;
5. the idea of documentation strings comes from Lisp;
6. list comprehensions come from Haskell;
7. iterators and generators come from Icon;
8. etc. etc. (many other points here)
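Two of the borrowings above can be shown in a few lines. The following is
only a quick illustration of list comprehensions (in the spirit of Haskell)
and generators (in the spirit of Icon); the function name ``counter`` is
mine:

```python
# List comprehension: the squares of the even numbers below 10
squares = [x * x for x in range(10) if x % 2 == 0]
print(squares)  # [0, 4, 16, 36, 64]

# Generator: values are produced lazily, one at a time, on demand
def counter(n):
    """Yield the integers 0, 1, ..., n-1 one at a time."""
    i = 0
    while i < n:
        yield i
        i += 1

print(list(counter(5)))  # [0, 1, 2, 3, 4]
```

Both constructs replace explicit loop-and-append boilerplate with a direct
statement of intent, which is typical of the features Python has imported
from more innovative languages.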
I think the really distinctive feature of Python with respect to
any other serious language I know is that Python is *easy*. You have the
power (I mean power in the conceptual sense, not computational power: in
the sense of computational power the best languages are
non-object-oriented ones)
of the most powerful languages with a very small investment.
In addition to that, Python has a relatively large user base
(as compared to Smalltalk or Ruby, or the various fragmented Lisp
communities). Of course,
there is quite a difference between the user base of Python and
the user base of, let us say, VisualBasic or Perl. But
I would never take VisualBasic into consideration for anything serious,
whereas Perl is too ugly for my taste ;-).
Finally, Python is *practical*. By this I mean that
Python has libraries that
allow the user to do nearly everything, since you can access all the C/C++
libraries with little or no effort, and all the Java libraries, through the
Python implementation known as Jython. In particular, one has the choice
between many excellent GUIs through PyQt, wxPython, Tkinter, etc.
Python has been an Object Oriented Programming
Language from the beginning; nevertheless it was never intended to be
a *pure* OOPL like Smalltalk or, more recently, Ruby. Python is a
*multiparadigm*
language like Lisp, in which you choose your programming style according
to your problem: spaghetti-code, structured programming, functional
programming and object-oriented programming are all supported. You can
even write bad code in Python, even if it is less easy than in other
languages ;-). Python is a language which has evolved quite a lot in its
twelve years of life (the first public version was released in February 1991)
and many new features have been integrated into the language over time.
In particular, Python 2.2 (released in 2002) was a major breakthrough
in the history of the language
as far as support for Object Oriented Programming (OOP) is concerned.
Before the 2.2 revolution, Python's Object
Orientation was good; now it is *excellent*. All the fundamental features
of OOP, including pretty sophisticated ones, such as metaclasses and multiple
inheritance, now have very good support (the only missing thing is
multiple dispatching).
.. [#]
Python lacks macros as an intentional design choice: many people
in the community (including Guido himself) feel that macros are
"too powerful". If you give the user the freedom to create her
own language, you must face at least three problems: i) the risk
of splitting the original language into dozens of different dialects;
ii) in collaborative projects, a huge amount of time and effort
would be spent by the individual programmer in learning
macro systems written by others; iii) not all users are good
language designers: the programmer will have to fight with badly
designed macro systems. Due to these problems, it seems unlikely
that macros will be added to Python in the future.
.. [#]
For a good comparison between Python and Lisp I refer the reader to
Peter Norvig's excellent article at
http://www.norvig.com/python-lisp.html
Further thoughts
---------------------------------------------------------------------------
Actually, the principal reasons why I began studying
Python were the documentation and the newsgroup: Python has outstanding
freely available documentation and an incredibly helpful newsgroup, which
make it extremely easy to learn the language. If I had found comparable
free documentation/newsgroups for C++ or Lisp, I would have studied those
languages instead.
Unfortunately, the enormous development at the software level has not
been matched by an appropriate development of the documentation.
As a consequence, the many beautiful, powerful and extremely *useful*
new features of Python 2.2+ object orientation have mostly remained
confined to developers and power users: the average Python programmer
has remained a little apart from the rapid development and she
*wrongly* thinks she has no use for the new features. There have
also been *protests* from users against the developers, of the
kind "please, stop adding thousands of complicated new extensions
to the language for which we have no use"!
Extending a language is always a delicate thing to do, for a whole
bunch of reasons:
1. once one extension is done, it is there *forever*.
My experience has been the following.
When I first read about metaclasses, in Guido's essay
"Unifying types and classes in Python 2.2", I thought "Wow,
classes of classes, cool concept, but how useful is it?
Are metaclasses really providing some new functionality?
What can I do with metaclasses that I cannot do without?"
Clearly, in these terms, the question is rather rhetorical, since in principle
any Turing-complete programming language contains all the features provided
by metaclasses. Python metaclasses themselves are implemented in C, which has
no metaclasses. Therefore, my real question was not "What can I do
with metaclasses that I cannot do without?" but "How big is the convenience
provided by metaclasses, with respect to my typical applications?".
The answer depends on the kind of problem you are considering. For certain
classes of problems it can be *very* large, as I will show in this and in
the next chapters.
I think the biggest advantage of metaclasses is *elegance*. Although it
is true that most of what you can do with metaclasses can be done without
metaclasses, not using metaclasses can result in a much *uglier* solution.
One needs difficult problems in order to appreciate the advantage
of powerful methods.
If all you need is to write a few scripts for copying two or three files,
there is no point in learning OOP. On the other hand, if you only
write simple programs where you define only one or two classes, there
is no point in using metaclasses. Metaclasses become relevant only
when you have many classes, whole classes of classes with similar
features that you want to modify.
In this sense, metaprogramming is for experts only, i.e. for people
with difficult problems. The point, however, is that nowadays
many people have difficult problems.
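As a sketch of this kind of situation, a single metaclass can retrofit a
whole hierarchy of classes with a common feature. The names (``WithRepr``,
``Point``, ``Segment``) and the chosen feature, a uniform ``__repr__``, are
invented for illustration only:

```python
class WithRepr(type):
    """Metaclass giving every class in a hierarchy an informative repr."""
    def __init__(cls, name, bases, dic):
        super(WithRepr, cls).__init__(name, bases, dic)
        if "__repr__" not in dic:  # do not override an explicit __repr__
            cls.__repr__ = lambda self: "<%s instance>" % type(self).__name__

# One line of "magic" fixes the base class and all its (future) subclasses:
Base = WithRepr("Base", (object,), {})

class Point(Base):
    pass

class Segment(Point):
    pass

print(repr(Point()), repr(Segment()))
```

With many classes, the alternative would be repeating the same ``__repr__``
(or the same boilerplate) in every class by hand; the metaclass does it once,
for the whole hierarchy, including classes written later.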
Finally, let me conclude this preface by recalling the
gist of Python wisdom.
>>> import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
FIRST THINGS, FIRST
==============================================================================
This is an introductory chapter, with the main purpose of fixing the
terminology used in the sequel. In particular, I give the definitions
of objects, classes, attributes and methods. I discuss a few examples
and I show some of the most elementary Python introspection features.
What's an object?
----------------------------------------------------------------------------
.. line-block::
*So Everything Is An object.
I'm sure the Smalltalkers are very happy :)*
-- Michael Hudson on comp.lang.python
"What's an object?" is the obvious question raised by anybody starting
to learn Object Oriented Programming. The answer is simple: in Python,
everything is an object!
An operative definition is the following: an *object*
is anything that can be labelled with an *object reference*.
In practical terms, the object reference is implemented as
the object's memory address, an integer number which uniquely
specifies the object. There is a simple way to retrieve the object reference:
use the built-in ``id`` function. Information on ``id`` can be retrieved
via the ``help`` function [#]_:
>>> help(id)
Help on built-in function id:
id(...)
id(object) -> integer
Return the identity of an object. This is guaranteed to be unique among
simultaneously existing objects. (Hint: it's the object's memory address.)
The reader is strongly encouraged to try the help function on everything
(including help(help) ;-). This is the best way to learn how Python works,
even *better* than reading the standard documentation, since the on-line
help is often more up to date.
Suppose for instance we wonder if the number ``1`` is an object:
it is easy enough to ask Python for the answer:
>>> id(1)
135383880
Therefore the number 1 is a Python object and it is stored at the memory
address 135383880, at least on my computer and during the current session.
Notice that the object reference is a dynamic thing; nevertheless it
is guaranteed to be unique and constant for a given object during its
lifetime (two objects whose lifetimes are disjoint may have the same id()
value, though).
Here are other examples of built-in objects:
>>> id(1L) # long
1074483312
>>> id(1.0) #float
135682468
>>> id(1j) # complex
135623440
>>> id('1') #string
1074398272
>>> id([1]) #list
1074376588
>>> id((1,)) #tuple
1074348844
>>> id({1:1}) # dict
1074338100
Even functions are objects:
>>> def f(x): return x #user-defined function
>>> id(f)
1074292020
>>> g=lambda x: x #another way to define functions
>>> id(g)
1074292468
>>> id(id) #id itself is a built-in function
1074278668
Modules are objects, too:
>>> import math
>>> id(math) #module of the standard library
1074239068
>>> id(math.sqrt) #function of the standard library
1074469420
``help`` itself is an object:
>>> id(help)
1074373452
Finally, we may notice that the reserved keywords are not objects:
>>> id(print) #error
  File "<stdin>", line 1
    id(print)
           ^
SyntaxError: invalid syntax
The operative definition is convenient since it gives a practical way
to check if something is an object and, more importantly, if two
objects are the same or not:
.. doctest
>>> s1='spam'
>>> s2='spam'
>>> s1==s2
True
>>> id(s1)==id(s2)
True
A more elegant way of spelling ``id(obj1)==id(obj2)`` is to use the
keyword ``is``:
>>> s1 is s2
True
However, I should warn the reader that sometimes ``is`` can be surprising:
>>> id([]) == id([])
True
>>> [] is []
False
This is happening because writing ``id([])`` dynamically creates a unique
object (a list) which goes away when you're finished with it. So when an
expression needs both at the same time (``[] is []``), two unique objects
are created, but when an expression doesn't need both at the same time
(``id([]) == id([])``), an object gets created with an ID, is destroyed,
and then a second object is created with the same ID (since the last one
just got reclaimed) and their IDs compare equal. In other words, "the
ID is guaranteed to be unique *only* among simultaneously existing objects".
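The guaranteed part of this behavior can be checked directly; the recycling of addresses is a CPython implementation detail, so the sketch below only asserts what the language promises:

```python
a = []
b = []                    # both lists are alive at the same time here...
assert a is not b         # ...so they must be distinct objects
assert id(a) != id(b)     # ...and their ids must differ
# id([]) == id([]) *may* be True: each temporary list dies immediately,
# freeing its address for the next one (an implementation detail)
```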
Another surprise is the following:
>>> a=1
>>> b=1
>>> a is b
True
>>> a=556
>>> b=556
>>> a is b
False
The reason is that small integers are cached, i.e. pre-instantiated by the
interpreter (the exact range is an implementation detail), whereas larger
integers are created anew each time.
Notice the difference between ``==`` and ``is``:
>>> 1L==1
True
but
>>> 1L is 1
False
since they are different objects:
>>> id(1L) # long 1
135625536
>>> id(1) # int 1
135286080
The disadvantage of the operative definition is that it gives little
understanding of what an object can be used for. To this end, I must
introduce the concept of *class*.
.. [#] Actually ``help`` is not a function but a callable object. The
difference will be discussed in a following chapter.
Objects and classes
---------------------------------------------------------------------------
It is convenient to think of an object as an element of a set.
If you think about it, this is the most general definition that actually
captures what we mean by object in common language.
For instance, consider this book, "Object Oriented Programming in Python":
this book is an object, in the sense that it is a specific representative
of the *class* of all possible books.
According to this definition, objects are strictly related to classes, and
actually we say that objects are *instances* of classes.
Classes are nested: for
instance this book belongs to the class of books about programming
languages, which is a subset of the class of all possible books;
moreover we may further specify this book as a Python book; moreover
we may specify it as a Python 2.2+ book. There is no limit
to the restrictions we may impose on our classes.
On the other hand, it is convenient to have a "mother" class,
such that any object belongs to it. All strongly Object Oriented
Languages have such a class [#]_; in Python it is called *object*.
The relation between objects and classes in Python can be investigated
through the built-in function ``type`` [#]_ that gives the class of any
Python object.
Let me give some examples:
1. Integer numbers are instances of the class ``int`` or ``long``:
>>> type(1)
<type 'int'>
>>> type(1L)
<type 'long'>
2. Floating point numbers are instances of the class ``float``:
>>> type(1.0)
<type 'float'>
3. Complex numbers are instances of the class ``complex``:
>>> type(1.0+1.0j)
<type 'complex'>
4. Strings are instances of the class ``str``:
>>> type('1')
<type 'str'>
5. Lists, tuples and dictionaries are instances of ``list``, ``tuple`` and
``dict`` respectively:
>>> type([1])
<type 'list'>
>>> type((1,))
<type 'tuple'>
>>> type({1:1})
<type 'dict'>
6. User defined functions are instances of the ``function`` built-in type:
>>> type(f)
<type 'function'>
>>> type(g)
<type 'function'>
All the previous types are subclasses of object:
>>> for cl in int,long,float,str,list,tuple,dict: issubclass(cl,object)
True
True
True
True
True
True
True
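The same facts can be restated with ``type``, ``isinstance`` and ``issubclass``, written here in a form that also runs on modern Pythons, where ``int`` and ``long`` are unified:

```python
# every built-in type descends from object
for cls in (int, float, complex, str, list, tuple, dict):
    assert issubclass(cls, object)

assert type(1) is int            # the class of an instance
assert isinstance(1, object)     # hence every value is an object
assert issubclass(bool, int)     # bool is itself a subclass of int
```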
However, Python is not a 100% pure Object
Oriented Programming language and its object model has still some minor
warts, due to historical accidents.
Paraphrasing George Orwell, we may say that in Python 2.2-2.3,
all objects are equal, but some objects are more equal than others.
Actually, we may distinguish Python objects into new style objects,
or rich man objects, and old style objects, or poor man objects.
New style objects are instances of new style classes whereas old
style objects are instances of old style classes.
The difference is that new style classes are subclasses of object whereas
old style classes are not.
Old style classes are there for the sake of compatibility with previous
releases of Python, but starting from Python 2.2 practically all built-in
classes are new style classes.
Instances of old style classes are called old style objects. I will give
a few examples of old style objects later on.
In this tutorial the term
object *tout court* will mean a new style object, unless the contrary
is explicitly stated.
.. [#] one may notice that C++ does not have such a class, but C++
is *not* a strongly object oriented language ;-)
.. [#] Actually ``type`` is not a function, but a metaclass; nevertheless,
since this is an advanced concept, discussed in the fourth chapter,
for the time being it is better to think of ``type`` as a built-in
function analogous to ``id``.
Objects have attributes
----------------------------------------------------------------------------
All objects have attributes describing their characteristics, that may
be accessed via the dot notation
::
objectname.objectattribute
The dot notation is common to most Object Oriented programming languages,
therefore the reader with a little experience should find it not surprising
at all (Python strongly believes in the Principle of Least Surprise). However,
Python objects also have special attributes denoted by the double-double
underscore notation
::
objectname.__specialattribute__
with the aim of supporting the wonderful Python introspection features,
which have no counterpart in many other OOP languages.
Consider for example the string literal "spam". We may discover its
class by looking at its special attribute *__class__*:
>>> 'spam'.__class__
<type 'str'>
Using the ``__class__`` attribute is not always equivalent to using the
``type`` function, but it works for all built-in types. Consider for instance
the number *1*: we may extract its class as follows:
>>> (1).__class__
<type 'int'>
Notice that the parentheses are needed to avoid confusion between the integer
``1`` and the float ``1.``.
The non-equivalence of type and class is the key to distinguishing new style
objects from old style ones, since for old style objects
``type(obj)<>obj.__class__``.
We may use this knowledge to write a utility function that discovers
if an object is a "real" object (i.e. new style) or a poor man object:
::
#
def isnewstyle(obj):
    try: # some objects may lack a __class__ attribute
        obj.__class__
    except AttributeError:
        return False
    else: # look if there is unification type/class
        return type(obj) is obj.__class__
#
Let us check this with various examples:
>>> from oopp import isnewstyle
>>> isnewstyle(1)
True
>>> isnewstyle(lambda x:x)
True
>>> isnewstyle(id)
True
>>> isnewstyle(type)
True
>>> isnewstyle(isnewstyle)
True
>>> import math
>>> isnewstyle(math)
True
>>> isnewstyle(math.sqrt)
True
>>> isnewstyle('hello')
True
It is not obvious to find something which is not a real object
among the built-in objects; however, it is possible. For instance,
the ``help`` "function" is an old style object:
>>> isnewstyle(help)
False
since
>>> help.__class__
is different from
>>> type(help)
Regular expression objects are even poorer objects with no ``__class__``
attribute:
>>> import re
>>> reobj=re.compile('somestring')
>>> isnewstyle(reobj)
False
>>> type(reobj)
>>> reobj.__class__ #error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: __class__
There are other special attributes besides ``__class__``; a particularly
useful one is ``__doc__``, which contains information on the class it
refers to. Consider for instance the ``str`` class: by looking at its
``__doc__`` attribute we can get information on the usage of this class:
>>> print str.__doc__
str(object) -> string
Return a nice string representation of the object.
If the argument is a string, the return value is the same object.
From that docstring we learn how to convert generic objects to strings;
for instance we may convert numbers, lists, tuples and dictionaries:
>>> str(1)
'1'
>>> str([1])
'[1]'
>>> str((1,))
'(1,)'
>>> str({1:1})
'{1: 1}'
``str`` is implicitly called each time we use the ``print`` statement, since
``print obj`` is actually syntactic sugar for ``print str(obj)``.
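The same mechanism makes user-defined classes printable: by defining a ``__str__`` method we tell ``str`` (and therefore ``print``) what to produce. Here is a minimal sketch with a hypothetical ``Point`` class:

```python
class Point(object):
    "A hypothetical class, just to illustrate the str() hook"
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __str__(self):
        # called by str(p), and therefore by print
        return '(%s, %s)' % (self.x, self.y)

p = Point(1, 2)
assert str(p) == '(1, 2)'
assert str(1) == '1'        # the built-in cases seen above
assert str([1]) == '[1]'
```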
Classes and modules have another interesting special attribute, the
``__dict__`` attribute that gives the content of the class/module.
For instance, the contents of the standard ``math`` module can be retrieved
as follows:
>>> import math
>>> for key in math.__dict__: print key,
...
fmod atan pow __file__ cosh ldexp hypot sinh __name__ tan ceil asin cos
e log fabs floor tanh sqrt __doc__ frexp atan2 modf exp acos pi log10 sin
Alternatively, one can use the built-in function ``vars``:
>>> vars(math) is math.__dict__
True
This identity is true for any object with a ``__dict__`` attribute.
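For instance, for the ``math`` module:

```python
import math

assert vars(math) is math.__dict__       # the very same dictionary object
assert 'sqrt' in vars(math)              # functions live in the module dict
assert math.__dict__['pi'] is math.pi    # attribute access reads this dict
```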
Two other interesting special attributes are ``__doc__``
>>> print math.__doc__
This module is always available. It provides access to the
mathematical functions defined by the C standard.
and ``__file__``:
>>> math.__file__ #gives the file associated with the module
'/usr/lib/python2.2/lib-dynload/mathmodule.so'
Objects have methods
----------------------------------------------------------------------------
In addition to attributes, objects also have *methods*, i.e.
functions attached to their classes [#]_.
Methods are also invoked with the dot notation, but
they can be distinguished from attributes because they are typically
called with parentheses (this is a little simplistic, but it is enough for
an introductory chapter). As a simple example, let me show the
invocation of the ``split`` method of a string object:
>>> s='hello world!'
>>> s.split()
['hello', 'world!']
In this example ``s.split`` is called a *bound method*, since it is
applied to the string object ``s``:
>>> s.split
An *unbound method*, instead, is applied to the class: in this case the
unbound version of ``split`` is applied to the ``str`` class:
>>> str.split
A bound method is obtained from its corresponding unbound
method by providing the object to the unbound method: for instance
by providing ``s`` to ``str.split`` we obtain the same effect of `s.split()`:
>>> str.split(s)
['hello', 'world!']
This operation is called *binding* in the Python literature: when we write
``str.split(s)`` we bind the unbound method ``str.split`` to the object ``s``.
It is interesting to recognize that the bound and unbound methods are
*different* objects:
>>> id(str.split) # unbound method reference
135414364
>>> id(s.split) # this is a different object!
135611408
The unbound method (and therefore the bound method) has a ``__doc__``
attribute explaining how it works:
>>> print str.split.__doc__
S.split([sep [,maxsplit]]) -> list of strings
Return a list of the words in the string S, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator.
.. [#] A precise definition will be given in chapter 5 that introduces the
concept of attribute descriptors. There are subtle
differences between functions and methods.
Summing objects
--------------------------------------------------------------------------
In a pure object-oriented world, there are no functions and everything is
done through methods. Python is not a pure OOP language; however, quite a
lot is done through methods. For instance, it is quite interesting to analyze
what happens when an apparently trivial statement such as
>>> 1+1
2
is executed in an object-oriented world.
The key to understanding is to notice that the number 1 is an object,
specifically an instance of the class ``int``: this means that 1 inherits
all the methods of the ``int`` class. In particular it inherits a special
method called ``__add__``: this means that 1+1 is actually syntactic sugar for
>>> (1).__add__(1)
2
which in turn is syntactic sugar for
>>> int.__add__(1,1)
2
The same is true for subtraction, multiplication, division and other
binary operations.
>>> 'hello'*2
'hellohello'
>>> (2).__mul__('hello')
'hellohello'
>>> str.__mul__('hello',2)
'hellohello'
However, notice that
>>> str.__mul__(2,'hello') #error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: descriptor '__mul__' requires a 'str' object but received a 'int'
The fact that operators are implemented as methods is the key to
*operator overloading*: in Python (as well as in other OOP languages)
the user can redefine the operators. This is already done by default
for some operators: for instance the operator ``+`` is overloaded
and works for integers, floats, complex numbers and strings.
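User-defined classes can join the game by defining the special methods themselves. A minimal sketch with a hypothetical ``Vector`` class:

```python
class Vector(object):
    "A hypothetical 2D vector, just to illustrate operator overloading"
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __add__(self, other):
        # v1 + v2 is syntactic sugar for v1.__add__(v2)
        return Vector(self.x + other.x, self.y + other.y)

v = Vector(1, 2) + Vector(3, 4)
assert (v.x, v.y) == (4, 6)
assert (1).__add__(1) == 2      # the built-in case seen above
```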
Inspecting objects
---------------------------------------------------------------------------
In Python it is possible to retrieve most of the attributes and methods
of an object by using the built-in function ``dir()``
(try ``help(dir)`` for more information).
Let me consider the simplest case of a generic object:
>>> obj=object()
>>> dir(obj)
['__class__', '__delattr__', '__doc__', '__getattribute__',
'__hash__', '__init__', '__new__', '__reduce__', '__repr__',
'__setattr__', '__str__']
As we see, there are plenty of attributes available
even to a do-nothing object; many of them are special attributes
providing introspection capabilities which are not
common to all programming languages. We have already discussed the
meaning of some of the more obvious special attributes.
The meaning of some of the others is quite non-obvious, however.
The docstring is invaluable in providing some clues.
Notice that there are special *hidden* attributes that cannot be retrieved
with ``dir()``. For instance the ``__name__`` attribute, returning the
name of the object (defined for classes, modules and functions)
and the ``__subclasses__`` method, defined for classes and returning the
list of immediate subclasses of a class:
>>> str.__name__
'str'
>>> str.__subclasses__.__doc__
'__subclasses__() -> list of immediate subclasses'
>>> str.__subclasses__() # no subclasses of 'str' are currently defined
[]
For instance, by doing
>>> obj.__getattribute__.__doc__
"x.__getattribute__('name') <==> x.name"
we discover that the expression ``x.name`` is syntactic sugar for
``x.__getattribute__('name')``.
Another equivalent form, which is more often used, is
``getattr(x,'name')``.
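The spellings can be checked to be equivalent on a hypothetical little class; notice that ``getattr`` also accepts a default value for missing attributes:

```python
class C(object):
    "A hypothetical class, just for the attribute-access demonstration"

c = C()
c.name = 'spam'

assert c.name == 'spam'
assert c.__getattribute__('name') == 'spam'   # what the dot actually calls
assert getattr(c, 'name') == 'spam'           # the usual spelling
assert getattr(c, 'missing', None) is None    # default for absent attributes
```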
We may use this trick to make a function that retrieves all the
attributes of an object except the special ones:
::
#
def special(name): return name.startswith('__') and name.endswith('__')

def attributes(obj, condition=lambda n,v: not special(n)):
    """Returns a dictionary containing the accessible attributes of
    an object. By default, returns the non-special attributes only."""
    dic = {}
    for attr in dir(obj):
        try: v = getattr(obj,attr)
        except: continue # attr is not accessible
        if condition(attr,v): dic[attr] = v
    return dic

getall = lambda n,v: True
#
Notice that certain attributes may be unaccessible (we will see how
to make attributes unaccessible in a following chapter)
and in this case they are simply ignored.
For instance you may retrieve the regular (i.e. non-special)
attributes of a user-defined function:
>>> from oopp import attributes
>>> attributes(f).keys()
['func_closure', 'func_dict', 'func_defaults', 'func_name',
'func_code', 'func_doc', 'func_globals']
In the same vein as the ``getattr`` function, there is a built-in
``setattr`` function (that actually calls the ``__setattr__`` built-in
method), which allows the user to change the attributes and methods of
an object. Information on ``setattr`` can be retrieved from the help
function:
::
>>> help(setattr)
Help on built-in function setattr:
setattr(...)
setattr(object, name, value)
Set a named attribute on an object; setattr(x, 'y', v) is equivalent to
``x.y = v''.
``setattr`` can be used to add attributes to an object:
::
#
import sys

def customize(obj, errfile=None, **kw):
    """Adds attributes to an object, if possible. If not, writes an error
    message on 'errfile'. If errfile is None, skips the exception."""
    for k in kw:
        try:
            setattr(obj,k,kw[k])
        except: # setting error
            if errfile:
                print >> errfile, "Error: %s cannot be set" % k
#
The attributes of built-in objects cannot be set, however:
>>> from oopp import customize,sys
>>> customize(object(),errfile=sys.stdout,newattr='hello!') #error
Error: newattr cannot be set
On the other hand, the attributes of modules can be set:
>>> import time
>>> customize(time,newattr='hello!')
>>> time.newattr
'hello!'
Notice that this means we may enhance modules at run time, by adding
new routines, not only new data attributes.
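To experiment without touching a real module of the standard library, one may build a scratch module with ``types.ModuleType`` and customize that instead:

```python
import types

mod = types.ModuleType('scratch')    # a throwaway module object
setattr(mod, 'newattr', 'hello!')    # same effect as mod.newattr = 'hello!'
assert mod.newattr == 'hello!'

# routines can be added too, enhancing the module at run time
setattr(mod, 'double', lambda x: 2 * x)
assert mod.double(21) == 42
```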
The ``attributes`` and ``customize`` functions work for any kind of objects;
in particular, since classes are a special kind of objects, they work
for classes, too. Here are the attributes of the ``str``, ``list`` and
``dict`` built-in types:
>>> from oopp import attributes
>>> attributes(str).keys()
['startswith', 'rjust', 'lstrip', 'swapcase', 'replace','encode',
'endswith', 'splitlines', 'rfind', 'strip', 'isdigit', 'ljust',
'capitalize', 'find', 'count', 'index', 'lower', 'translate','join',
'center', 'isalnum','title', 'rindex', 'expandtabs', 'isspace',
'decode', 'isalpha', 'split', 'rstrip', 'islower', 'isupper',
'istitle', 'upper']
>>> attributes(list).keys()
['append', 'count', 'extend', 'index', 'insert', 'pop',
'remove', 'reverse', 'sort']
>>> attributes(dict).keys()
['clear','copy','fromkeys', 'get', 'has_key', 'items','iteritems',
'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault',
'update', 'values']
Classes and modules have a special attribute ``__dict__`` giving the
dictionary of their attributes. Since it is often a quite large dictionary,
it is convenient to define a utility function printing this dictionary in a
nice form:
::
#
def pretty(dic):
    "Returns a nice string representation for the dictionary"
    keys = dic.keys(); keys.sort() # sorts the keys
    return '\n'.join(['%s = %s' % (k,dic[k]) for k in keys])
#
I encourage the use of this function in order to retrieve more
information about the modules of the standard library:
>>> from oopp import pretty
>>> import time #look at the 'time' standard library module
>>> print pretty(vars(time))
__doc__ = This module provides various functions to manipulate time values.
There are two standard representations of time. One is the number
of seconds since the Epoch, in UTC (a.k.a. GMT). It may be an integer
or a floating point number (to represent fractions of seconds).
The Epoch is system-defined; on Unix, it is generally January 1st, 1970.
The actual value can be retrieved by calling gmtime(0).
The other representation is a tuple of 9 integers giving local time.
The tuple items are:
year (four digits, e.g. 1998)
month (1-12)
day (1-31)
hours (0-23)
minutes (0-59)
seconds (0-59)
weekday (0-6, Monday is 0)
Julian day (day in the year, 1-366)
DST (Daylight Savings Time) flag (-1, 0 or 1)
If the DST flag is 0, the time is given in the regular time zone;
if it is 1, the time is given in the DST time zone;
if it is -1, mktime() should guess based on the date and time.
Variables:
timezone -- difference in seconds between UTC and local standard time
altzone -- difference in seconds between UTC and local DST time
daylight -- whether local time should reflect DST
tzname -- tuple of (standard time zone name, DST time zone name)
Functions:
time() -- return current time in seconds since the Epoch as a float
clock() -- return CPU time since process start as a float
sleep() -- delay for a number of seconds given as a float
gmtime() -- convert seconds since Epoch to UTC tuple
localtime() -- convert seconds since Epoch to local time tuple
asctime() -- convert time tuple to string
ctime() -- convert time in seconds to string
mktime() -- convert local time tuple to seconds since Epoch
strftime() -- convert time tuple to string according to format specification
strptime() -- parse string to time tuple according to format specification
__file__ = /usr/local/lib/python2.3/lib-dynload/time.so
__name__ = time
accept2dyear = 1
altzone = 14400
asctime = <built-in function asctime>
clock = <built-in function clock>
ctime = <built-in function ctime>
daylight = 1
gmtime = <built-in function gmtime>
localtime = <built-in function localtime>
mktime = <built-in function mktime>
newattr = hello!
sleep = <built-in function sleep>
strftime = <built-in function strftime>
strptime = <built-in function strptime>
struct_time = <type 'time.struct_time'>
time = <built-in function time>
timezone = 18000
tzname = ('EST', 'EDT')
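A word of warning for readers on modern Pythons: there ``dic.keys()`` returns a view without a ``sort`` method, so ``pretty`` is better written in terms of the ``sorted`` built-in:

```python
def pretty(dic):
    "Returns a nice string representation for the dictionary"
    return '\n'.join('%s = %s' % (k, dic[k]) for k in sorted(dic))

assert pretty({'b': 2, 'a': 1}) == 'a = 1\nb = 2'
```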
The list of the built-in Python types can be found in the ``types`` module:
>>> import types
>>> t_dict=dict([(k,v) for (k,v) in vars(types).iteritems()
... if k.endswith('Type')])
>>> for t in t_dict: print t,
...
DictType IntType TypeType FileType CodeType XRangeType EllipsisType
SliceType BooleanType ListType MethodType TupleType ModuleType FrameType
StringType LongType BuiltinMethodType BufferType FloatType ClassType
DictionaryType BuiltinFunctionType UnboundMethodType UnicodeType
LambdaType DictProxyType ComplexType GeneratorType ObjectType
FunctionType InstanceType NoneType TracebackType
For a pedagogical account of the most elementary
Python introspection features, see this article by
Patrick O'Brien:
http://www-106.ibm.com/developerworks/linux/library/l-pyint.html
Built-in objects: iterators and generators
---------------------------------------------------------------------------
At the end of the last section, I used the ``iteritems`` method
of dictionaries, which returns an iterator:
>>> dict.iteritems.__doc__
'D.iteritems() -> an iterator over the (key, value) items of D'
Iterators (and generators) are new features of Python 2.2 and may not be
familiar to all readers. However, since they are not strictly related to
OOP, they are outside the scope of this book and will not be discussed
here in detail.
Nevertheless, I will give a typical example of use of a generator, since
this construct will be used in future chapters.
At the syntactical level, a generator is a "function" with (at least) one
``yield`` statement (notice that in Python 2.2 the ``yield`` statement is
enabled through the ``from __future__ import generators`` syntax):
::
#
import re

def generateblocks(regexp, text):
    "Generator splitting text in blocks according to regexp"
    start = 0
    for MO in regexp.finditer(text):
        beg, end = MO.span()
        yield text[start:beg] # actual text
        yield text[beg:end] # separator
        start = end
    lastblock = text[start:]
    if lastblock: yield lastblock; yield ''
#
In order to understand this example, the reader may want to refresh his/her
understanding of regular expressions; since this is not a subject for
this book, I simply recall the meaning of ``finditer``:
>>> import re
>>> help(re.finditer)
finditer(pattern, string)
Return an iterator over all non-overlapping matches in the
string. For each match, the iterator returns a match object.
Empty matches are included in the result.
Generators can be thought of as resumable functions that stop at the
``yield`` statement and resume from the point where they left.
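This stop-and-resume behavior is easily seen on a tiny generator (the sketch uses the ``next`` built-in of later Python versions; in Python 2.2 one would call the ``g.next()`` method instead):

```python
def count_up_to(n):
    "Yields 0, 1, ..., n-1, pausing at each yield"
    i = 0
    while i < n:
        yield i        # execution freezes here...
        i += 1         # ...and resumes here on the next request

g = count_up_to(3)
assert next(g) == 0    # runs until the first yield
assert next(g) == 1    # resumes right after the yield
assert list(g) == [2]  # a for loop (or list()) drains the rest
```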
>>> from oopp import generateblocks
>>> text='Python_Rules!'
>>> g=generateblocks(re.compile('_'),text)
>>> g
>>> dir(g)
['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__',
'__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__str__', 'gi_frame', 'gi_running', 'next']
Generator objects can be used as iterators in a ``for`` loop.
In this example the generator takes a text and a regular expression
describing a fixed delimiter; then it splits the text in blocks
according to the delimiter. For instance, if the delimiter is
'_', the text 'Python_Rules!' is split as 'Python', '_' and 'Rules!':
>>> for n, block in enumerate(g): print n, block
...
0 Python
1 _
2 Rules!
3
This example also shows the usage of the new Python 2.3 built-in ``enumerate``.
Under the hood the ``for`` loop is calling the generator via its
``next`` method, until the ``StopIteration`` exception is raised.
For this reason, looping over ``g`` a second time has no effect:
>>> for n, block in enumerate(g): print n, block
...
The point is that the generator has already yielded its last element:
>>> g.next() # error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
``generateblocks`` always returns an even number of blocks; odd blocks
are delimiters whereas even blocks are the intertwining text; there may be
empty blocks, corresponding to the null string ''.
Notice the difference from the ``str.split`` method
>>> 'Python_Rules!'.split('_')
['Python', 'Rules!']
and the regular expression split method:
>>> re.compile('_').split('Python_Rules!')
['Python', 'Rules!']
both return lists and both miss the separator.
The regular expression split method can catch the separator, if wanted,
>>> re.compile('(_)').split('Python_Rules!')
['Python', '_', 'Rules!']
but still is different from the generator, since it returns a list. The
difference is relevant if we want to split a very large text, since
the generator avoids building a very large list and thus is much more
memory efficient (it is faster, too). Moreover, ``generateblocks``
works differently in the case of multiple groups:
>>> delim=re.compile('(_)|(!)') #delimiter is underscore or exclamation mark
>>> for n, block in enumerate(generateblocks(delim,text)):
... print n, block
0 Python
1 _
2 Rules
3 !
whereas
>>> delim.split(text)
['Python', '_', None, 'Rules', None, '!', '']
gives various unwanted ``None`` values (which could be skipped with
``[x for x in delim.split(text) if x is not None]``); notice that
there are no differences (apart from the fact that ``delim.split(text)``
has an odd number of elements) when one uses a single group regular expression:
>>> delim=re.compile('(_|!)')
>>> delim.split(text)
['Python', '_', 'Rules', '!', '']
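For the record, ``generateblocks`` runs unchanged on modern Pythons; here it is again as a self-contained sketch, checked against the splitting described above:

```python
import re

def generateblocks(regexp, text):
    "Generator splitting text in blocks according to regexp"
    start = 0
    for mo in regexp.finditer(text):
        beg, end = mo.span()
        yield text[start:beg]   # actual text
        yield text[beg:end]     # separator
        start = end
    lastblock = text[start:]
    if lastblock:
        yield lastblock
        yield ''

blocks = list(generateblocks(re.compile('_'), 'Python_Rules!'))
assert blocks == ['Python', '_', 'Rules!', '']  # always an even number
```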
The reader unfamiliar with iterators and generators is encouraged
to look at the standard documentation and other
references. For instance, there are Alex Martelli's notes on iterators at
http://www.strakt.com/dev_talks.html
and there is a good article on generators by David Mertz
http://www-106.ibm.com/developerworks/linux/library/l-pycon.html
THE CONVENIENCE OF FUNCTIONS
============================================================================
Functions are the most basic Python objects. They are also the simplest
objects to which one can apply the metaprogramming techniques that are
the subject of this book. The tricks used in this chapter and the utility
functions defined here will be used throughout the book. Therefore this
is an *essential* chapter.
Since it is intended to be a gentle introduction, the tone will be
informal.
Introduction
-------------
One may be surprised that a text on OOP begins with a chapter on the
well known, old-fashioned functions. In some sense, this is also
against the spirit of an important trend in OOP, which tries to
shift the focus from functions to data. In pure OOP languages,
there are no functions, only methods. [#]_
However, there are good reasons for that:
1. In Python, functions *are* objects. And particularly useful ones.
2. Python functions are pretty powerful and all their secrets are probably
*not* well known to the average Python programmer.
3. In the solutions of many problems, you don't need the full apparatus
of OOP: good old functions can be enough.
Moreover, I am a believer in the multiparadigm approach to programming,
in which you choose your tools according to your problem.
With a bazooka you can kill a mosquito, yes, but this does not mean
that you must use the bazooka *always*.
In certain languages, you have no choice, and you must define
a class (involving a lot of boilerplate code) even for the most trivial
application. Python's philosophy is to keep simple things simple, but
having the capability of doing even difficult things with a reasonable
amount of effort. The message of this chapter will be: "use functions when
you don't need classes". Functions are good because:
1. They are easy to write (no boilerplate);
2. They are easy to understand;
3. They can be reused in your code;
4. Functions are an essential building block in the construction of objects.
Even if I think that OOP is an extremely effective strategy, with
enormous advantages in design, maintainability and reusability of code,
nevertheless this book is *not* intended to be a panegyric of OOP. There
are cases in which you don't need OOP. I think the critical parameter is
the size of the program. These are the rules I usually follow (to be
taken as indicative):
1. If I have to write a short script of 20-30 lines, that copies two or
three files and prints some message, I use fast and dirty spaghetti-code;
there is no use for OOP.
2. If the script grows to a hundred lines or more, I structure
it into a few routines and a main program: but still I can live
without OOP.
3. If the script goes beyond the two hundred lines, I start
collecting my routines in few classes.
4. If the script goes beyond the five hundred lines, I split the program
in various files and modules and convert it to a package.
5. I never write a function longer than 50 lines, since 50 lines is more
or less the size of a page in my editor, and I need to be able to
see the entire function in a page.
Of course your taste could be different and you could prefer to write a
monolithic program of five thousand lines; however the average size of
the modules in the Python standard library is 111 lines.
I think this is a *strong* suggestion towards
a modular style of programming, which
is *very* well supported in Python.
The point is that OOP is especially useful for *large* programs: if you
only use Python for short system administration scripts you may well
live without OOP. Unfortunaly, as everybody knows, short scripts have
an evil tendency to become medium size scripts, and medium size scripts
have the even more evil tendency to become large scripts and possible
even full featured applications ! For this reason it is very probable
that at a certain moment you will feel the need for OOP.
I remember my first big program, a long time ago: I wrote a program
to draw mathematical functions in AmigaBasic. It was good and nice
until it had a size of a few hundred lines; but when it passed a thousand
lines, it became rapidly unmanageable and unmaintainable. There were
four problems:
1. I could not split the program in modules, as I wanted, due to the
limitations of AmigaBasic;
2. I was missing OOP to keep the logic of the program all together, but
at the time I didn't know that;
3. I was missing effective debugging techniques.
4. I was missing effective refactoring tools.
I am sure anybody who has ever written a large program has run into these
limitations: and the biggest help of OOP is in overcoming them.
Obviously, miracles are impossible, and even object oriented programs can
grow to a size where they become unmaintainable: the point is that the
critical limit is much higher than the thousand lines of structured programs.
I haven't yet reached the limit of unmanageability with Python. The fact
that the standard library is 66492 lines long (as results from the total
number of lines in ``/usr/local/lib/python2.2/``), but is still manageable,
gives me hope ;-)
.. [#] However, one could argue that having functions distinguished from
methods is the best thing to do, even in a strongly object-oriented
world. For instance, generic functions can be used to implement
multimethods. See for instance Lisp, Dylan and MultiJava. The latter
is forced to introduce the concept of a function outside a class,
foreign to traditional Java, just to implement multimethods.
A few useful functions
------------------------------------------------------------------------------
It is always a good idea to have a set of useful functions collected in
a user-defined module. The first function we want to have in our module
is the ``do_nothing`` function:
::
#
def do_nothing(*args,**kw): pass
#
This function accepts a variable number of arguments and keyword arguments (I
refer the reader to the standard documentation if she is unfamiliar
with these concepts; this is *not* another Python tutorial ;-) and
returns ``None``. It is very useful for debugging purposes, when in a
complex program you may want to concentrate your attention on a few crucial
functions and set the non-relevant functions to ``do_nothing`` functions.
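For instance (a sketch in modern print-function syntax; ``report`` is just a hypothetical helper, not part of ``oopp``), a non-relevant function can be silenced by rebinding its name:

```python
def do_nothing(*args, **kw):
    # accepts any positional and keyword arguments, returns None
    pass

def report(msg, level=1):
    # a hypothetical noisy helper we want to silence while debugging
    return "LOG[%d]: %s" % (level, msg)

report = do_nothing  # every call to report now quietly returns None
assert report("irrelevant detail", level=3) is None
```

Since ``do_nothing`` swallows any signature, the rebinding works no matter how the silenced function used to be called.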
A second function which is useful in developing programs is a timer
function. Very often indeed, when we want to determine the bottleneck
parts of a program, we are interested in profiling them and in seeing
if we can improve the speed by improving the algorithm, by using
a Python "compiler" such as Psyco, or if we really need to write a C
extension. In my experience, I have never needed to write a C extension,
since Python is fast enough. Nevertheless, profiling a program is
always a good idea and Python provides a profiler module in the
standard library with this aim. Still, it is convenient to have
a set of user-defined functions to test the execution speed of a
few selected routines (whereas the standard profiler profiles everything).
We see from the standard library documentation that
the current time can be retrieved from the ``time`` module: [#]_
>>> import time
>>> time.asctime()
'Wed Jan 15 12:46:03 2003'
Since we are not interested in the date but only in the time, we need
a function to extract it. This is easily implemented:
::
#
import time
def get_time():
"Return the time of the system in the format HH:MM:SS"
return time.asctime().split()[3]
#
>>> from oopp import get_time
>>> get_time()
'13:03:49'
Suppose, for instance, we want to know how long it takes Python
to write a Gigabyte of data. This can be a quite useful benchmark
to get an idea of the I/O bottlenecks in our system. Since keeping
a one-Gigabyte file in memory can be quite problematic, let me compute the
time spent in writing 1024 files of one Megabyte each. To this
aim we need a ``writefile`` function
::
#
def writefile(fname,data):
f=file(fname,'w')
f.write(data)
f.close()
#
and a timing function. The idea is to wrap the ``writefile`` function in
a ``with_clock`` function as follows:
::
#
def with_clock(func,n=1):
def _(*args,**kw): # this is a closure
print "Process started on",get_time()
print ' .. please wait ..'
for i in range(n): func(*args,**kw)
print "Process ended on",get_time()
return _
#
The wrapper function ``with_clock`` has converted the function ``writefile``
into a function ``with_clock(writefile)`` which has the same arguments
as ``writefile``, but contains additional features: in this case
timing capabilities. Technically speaking, the internal function ``_``
is called a *closure*. Closures are very common in functional languages
and can be used in Python too, with very little effort [#]_.
I will use closures very often in the following, and I will use
the convention of denoting with "_" the inner
function in the closure, since there is no reason to give it a
descriptive name (the name 'with_clock' in the outer function
is descriptive enough). For the same reason, I do not use a
docstring for "_". If Python allowed multistatement lambda
functions, "_" would be a good candidate for an anonymous function.
Here is an example of usage:
>>> from oopp import *
>>> data='*'*1024*1024 #one megabyte
>>> with_clock(writefile,n=1024)('datafile',data) #.
Process started on 21:20:01
.. please wait ..
Process ended on 21:20:57
This example shows that Python has written one Gigabyte of data (split into
1024 chunks of one Megabyte each) in less than a minute. However, the
result depends very much on the filesystem. I always suggest that people
profile their programs, since one *always* finds surprises.
For instance, I have checked the performance of my laptop,
a dual-boot Windows 98 SE / Red Hat Linux 7.3 machine.
The results are collected in the following table:
================= ===================== ========================
Laptop
Linux ext-3 FAT under Linux FAT under Windows 98
================= ===================== ========================
24-25 s 56-58 s 86-88 s
================= ===================== ========================
We see that Linux is *much* faster: more than three times faster than
Windows, on the same machine! Notice that the FAT filesystem under
Linux (where it is *not* native) is remarkably faster than the FAT
under Windows 98, where it is native! I think that now my readers
can begin to understand why this book has been written under Linux
and why I *never* use Windows for programming (actually I use it only
to watch DVDs ;-).
I leave as an exercise for the reader to check the results of this
script on their machine. Since my laptop is quite old, you will probably
see much better performance (for instance on my Linux desktop I can
write a Gigabyte in less than 12 seconds!). However, there are *always*
surprises: my desktop is a dual-boot Windows 2000 machine with three different
filesystems, Linux ext-2, FAT and NTFS. Surprisingly enough, the NT
filesystem is the most inefficient for writing, *ten times slower*
than Linux!
================= ===================== ========================
Desktop
Linux ext-2 FAT under Win2000 NTFS under Win2000
================= ===================== ========================
11-12 s 95-97 s 117-120 s
================= ===================== ========================
.. [#] Users of Python 2.3 can give a look to the new ``datetime`` module,
if they are looking for a sophisticated clock/calendar.
.. [#] There are good references on functional programming in Python;
I suggest the Python Cookbook and the articles by David Mertz
on IBM developerWorks.
Functions are objects
---------------------------------------------------------------------------
As we said in the first chapter, objects have attributes accessible with the
dot notation. This is not surprising at all. However, it could be
surprising to realize that since Python functions are objects, they
can have attributes, too. This could be surprising since this feature is quite
uncommon: typically either i) the language is
not object-oriented, and therefore functions are not objects, or ii)
the language is strongly object-oriented and does not have functions, only
methods. Python is a multiparadigm language (a term which I prefer to
"hybrid" language), therefore it has functions that are objects,
as in Lisp and other functional languages.
Consider for instance the ``get_time`` function.
That function has at least one useful attribute, its docstring:
>>> from oopp import get_time
>>> print get_time.func_doc
Return the time of the system in the format HH:MM:SS
The docstring can also be obtained with the ``help`` function:
>>> help(get_time)
Help on function get_time in module oopp:
get_time()
Return the time of the system in the format HH:MM:SS
Therefore ``help`` works on user-defined functions, too, not only on
built-in functions. Notice that ``help`` also returns the argument list of
the function. For instance, this is
the help message on the ``round`` function that we will use in the
following:
>>> help(round)
Help on built-in function round:
round(...)
round(number[, ndigits]) -> floating point number
Round a number to a given precision in decimal digits (default 0
digits). This always returns a floating point number. Precision may
be negative.
I strongly recommend Python programmers to use docstrings, not
only for clarity's sake during development, but especially because
it is possible to automatically generate nice HTML documentation from
the docstrings, by using the standard tool "pydoc".
One can easily add attributes to a function. For instance:
>>> get_time.more_doc='get_time invokes the function time.asctime'
>>> print get_time.more_doc
get_time invokes the function time.asctime
Attributes can be functions, too:
>>> def IamAfunction(): print "I am a function attached to a function"
>>> get_time.f=IamAfunction
>>> get_time.f()
I am a function attached to a function
This is a quite impressive capability of Python functions, which has
no direct equivalent in most other languages.
One possible application is to fake C "static" variables. Suppose
for instance we need a function remembering how many times it is
called: we can simply use
::
#
def double(x):
try: #look if double.counter is defined
double.counter
except AttributeError:
double.counter=0 #first call
double.counter+=1
return 2*x
double(double(2))
print "double has been called %s times" % double.counter
#
with output ``double has been called 2 times``.
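The ``try/except AttributeError`` dance can be written a bit more compactly with ``hasattr``; a sketch of the same idea (modern print syntax):

```python
def double(x):
    # hasattr replaces the try/except AttributeError idiom
    if not hasattr(double, "counter"):
        double.counter = 0  # first call: initialize the fake static
    double.counter += 1
    return 2 * x

double(double(2))
print("double has been called %s times" % double.counter)
```

The behaviour is the same: the attribute is created on the first call and incremented on every subsequent one.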
A more elegant approach involves closures. A closure can enhance an
ordinary function, giving it the capability of remembering
the results of its previous calls and avoiding the duplication of
computations:
::
#
def withmemory(f):
"""This closure invokes the callable object f only if need there is"""
argskw=[]; result=[]
def _(*args,**kw):
akw=args,kw
try: # returns a previously stored result
i=argskw.index(akw)
except ValueError: # there is no previously stored result
res=f(*args,**kw) # returns the new result
argskw.append(akw) # update argskw
result.append(res) # update result
return res
else:
return result[i]
_.argskw=argskw #makes the argskw list accessible outside
_.result=result #makes the result list accessible outside
return _
def memoize(f):
"""This closure remembers all f invocations"""
argskw,result = [],[]
def _(*args,**kw):
akw=args,kw
try: # returns a previously stored result
return result[argskw.index(akw)]
except ValueError: # there is no previously stored result
argskw.append(akw) # update argskw
result.append(f(*args,**kw)) # update result
return result[-1] # return the new result
_.argskw=argskw #makes the argskw list accessible outside
_.result=result #makes the result list accessible outside
return _
#
Now, if we call the wrapped function ``f`` twice with the same arguments,
Python can give the result without repeating the (possibly very long)
computation.
>>> def f(x):
... print 'called f'
... return x*x
>>> wrapped_f=withmemory(f)
>>> wrapped_f(2) #first call with the argument 2; executes the computation
called f
4
>>> wrapped_f(2) #does not repeat the computation
4
>>> wrapped_f.result
[4]
>>> wrapped_f.argskw
[((2,), {})]
Profiling functions
---------------------------------------------------------------------------
The ``with_clock`` function provided before was intended to be
pedagogical; as such it is a quite poor solution to the
problem of profiling a Python routine. A better solution involves
using two other functions in the ``time`` library: ``time.time()``,
which gives the time in seconds elapsed since a fixed date, and
``time.clock()``, which gives the time spent by the CPU in a given
computation. Notice that ``time.clock()`` does not have infinite
precision (the precision depends on the system) and one
should expect relatively big errors if the function runs in
a very short time. That's the reason why it is convenient
to execute short functions multiple times and divide the total
time by the number of repetitions. Moreover, one should subtract the
overhead due to the looping. This can be computed with the following
routine:
::
#
def loop_overhead(N):
"Computes the time spent in empty loop of N iterations"
t0=time.clock()
for i in xrange(N): pass
return time.clock()-t0
#
For instance, on my laptop an empty loop of one million iterations
is performed in 1.3 seconds. Typically the loop overhead is negligible,
whereas the real problem is the function call overhead.
Using the attribute trick discussed above, we may
define a ``with_timer`` function that improves quite a bit
on ``with_clock``:
::
#
def with_timer(func, modulename='__main__', n=1, logfile=sys.stdout):
"""Wraps the function func and executes it n times (default n=1).
The average time spent in one iteration, expressed in milliseconds,
is stored in the attributes func.time and func.CPUtime, and saved
in a log file which defaults to the standard output.
"""
def _(*args,**kw): # anonymous function
time1=time.time()
CPUtime1=time.clock()
print 'Executing %s.%s ...' % (modulename,func.__name__),
for i in xrange(n): res=func(*args,**kw) # executes func n times
time2=time.time()
CPUtime2=time.clock()
func.time=1000*(time2-time1)/n
func.CPUtime=1000*(CPUtime2-CPUtime1-loop_overhead(n))/n
if func.CPUtime<10: r=3 #better rounding
else: r=1 #default rounding
print >> logfile, 'Real time: %s ms' % round(func.time,r),
print >> logfile, ' CPU time: %s ms' % round(func.CPUtime,r)
return res
return _
#
Here is an example of application:
>>> from oopp import with_timer,writefile
>>> data='*'*1024*1024 #one megabyte
>>> with_timer(writefile,n=1024)('datafile',data) #.
Executing writefile ... Real time: 60.0 ms CPU time: 42.2 ms
The CPU time can be quite different from the real time,
as you can see in the following example:
>>> import time
>>> def sleep(): time.sleep(1)
...
>>> with_timer(sleep)() #.
Executing sleep ... Real time: 999.7 ms CPU time: 0.0 ms
We see that Python has run for 999.7 ms (i.e. 1 second, up to
approximation errors in the system clock) during which the CPU has
worked for 0.0 ms (i.e. the CPU took a rest ;-).
The CPU time is the relevant time to use with the purpose of
benchmarking Python speed.
I should note that the approach pursued in ``with_timer`` is still
quite simple. A better approach would be to
plot the time versus the number of iterations, do a linear interpolation
and extract the typical time per iteration from that. This allows one
to check visually that the machine is not doing something strange
during the execution time, and it is what
I do in my personal benchmark routine; doing something similar is
left as an exercise for the reader ;-).
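A minimal sketch of the idea, in modern syntax (``per_call_time`` is my illustrative helper, not the author's routine): time the function for several iteration counts and take the slope of a least-squares line, which estimates the time per call independently of the fixed overhead:

```python
import time

def per_call_time(func, counts=(100, 1000, 10000)):
    # time func for several iteration counts and fit t = a + b*n by
    # least squares; the slope b estimates the time of one call,
    # while the intercept a absorbs the fixed overhead
    xs, ys = [], []
    for n in counts:
        t0 = time.perf_counter()
        for _ in range(n):
            func()
        xs.append(float(n))
        ys.append(time.perf_counter() - t0)
    k = len(xs)
    mx = sum(xs) / k
    my = sum(ys) / k
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs)
    return b  # estimated seconds per call
```

Plotting the ``(n, t)`` pairs against the fitted line is then the visual sanity check mentioned above.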
Another approach is to use the ``timeit.py`` module (new in Python 2.3,
but it also works with Python 2.2):
::
#
import timeit,__main__,warnings
warnings.filterwarnings('ignore',
'import \* only allowed at module level',SyntaxWarning)
def timeit_(stmt,setup='from __main__ import *',n=1000):
t=timeit.Timer(stmt,setup)
try: print t.repeat(number=n) # calls timeit 3 times
except: t.print_exc()
#
It is often stated that Python is slow and quite ineffective
in applications involving hard computations. This is generally speaking
true, but how bad is the situation? To test the (in)efficiency of
Python on number crunching, let me give a function to compute the
Mandelbrot set, which I have found in the Python Frequently Asked
Questions (FAQ 4.15, *Is it possible to write obfuscated one-liners
in Python?*).
This function is due to Ulf Bartelt and you should ask him how
it works ;-)
::
#
def mandelbrot(row,col):
"Computes the Mandelbrot set in one line"
return (lambda Ru,Ro,Iu,Io,IM,Sx,Sy:reduce(
lambda x,y:x+y,map(lambda y,Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,Sy=Sy,L=
lambda yc,Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,i=IM, Sx=Sx,Sy=Sy:reduce(
lambda x,y:x+y,map(lambda x,xc=Ru,yc=yc,Ru=Ru,Ro=Ro, i=i,
Sx=Sx,F=lambda xc,yc,x,y,k,f=lambda xc,yc,x,y,k,f:(k<=0)
or (x*x+y*y>=4.0) or 1+f(xc,yc,x*x-y*y+xc,2.0*x*y+yc,k-1,f):
f(xc,yc,x,y,k,f):chr(64+F(Ru+x*(Ro-Ru)/Sx,yc,0,0,i)),
range(Sx))):L(Iu+y*(Io-Iu)/Sy),range(Sy))))(
-2.1, 0.7, -1.2, 1.2, 30, col, row)
# \___ ___/ \___ ___/ | | |_ lines on screen
# V V | |______ columns on screen
# | | |__________ maximum of "iterations"
# | |_________________ range on y axis
# |____________________________ range on x axis
#
Here is the benchmark on my laptop:
>>> from oopp import mandelbrot,with_timer
>>> row,col=24,75
>>> output=with_timer(mandelbrot,n=1)(row,col)
Executing __main__.mandelbrot ... Real time: 427.9 ms CPU time: 410.0 ms
>>> for r in range(row): print output[r*col:(r+1)*col]
...
BBBBBBBBBBBBBBCCCCCCCCCCCCDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDCCCCCCCCCCCCCC
BBBBBBBBBBBBCCCCCCCCCDDDDDDDDDDDDDDDDDDDDDDEEEEEEFGYLFFFEEEEEDDDDDCCCCCCCCC
BBBBBBBBBBCCCCCCCDDDDDDDDDDDDDDDDDDDDDEEEEEEEEEFFFGIKNJLLGEEEEEEDDDDDDCCCCC
BBBBBBBBBCCCCCDDDDDDDDDDDDDDDDDDDDDEEEEEEEEEFFFFGHJJR^QLIHGFFEEEEEEDDDDDDCC
BBBBBBBBCCCDDDDDDDDDDDDDDDDDDDDDEEEEEEEEEFFFGGGHIK_______LHGFFFFFEEEEDDDDDD
BBBBBBBCCDDDDDDDDDDDDDDDDDDDDEEEEEEEFFFGHILIIIJJKMS_____PLJJIHGGGHJFEEDDDDD
BBBBBBCDDDDDDDDDDDDDDDDDDEEEEEFFFFFFGGGHMQ__T________________QLOUP[OGFEDDDD
BBBBBCDDDDDDDDDDDDDDDEEEFFFFFFFFFGGGGHJNM________________________XLHGFFEEDD
BBBBCDDDDDDDDDEEEEEFFGJKHHHHHHHHHHHHIKN[__________________________MJKGFEEDD
BBBBDDDDEEEEEEEEFFFFGHIKPVPMNU_QMJJKKZ_____________________________PIGFEEED
BBBCDEEEEEEEEFFFFFFHHHML___________PQ_______________________________TGFEEEE
BBBDEEEEEEFGGGGHHHJPNQP^___________________________________________IGFFEEEE
BBB_____________________________________________________________OKIHGFFEEEE
BBBDEEEEEEFGGGGHHHJPNQP^___________________________________________IGFFEEEE
BBBCDEEEEEEEEFFFFFFHHHML___________PQ_______________________________TGFEEEE
BBBBDDDDEEEEEEEEFFFFGHIKPVPMNU_QMJJKKZ_____________________________PIGFEEED
BBBBCDDDDDDDDDEEEEEFFGJKHHHHHHHHHHHHIKN[__________________________MJKGFEEDD
BBBBBCDDDDDDDDDDDDDDDEEEFFFFFFFFFGGGGHJNM________________________XLHGFFEEDD
BBBBBBCDDDDDDDDDDDDDDDDDDEEEEEFFFFFFGGGHMQ__T________________QLOUP[OGFEDDDD
BBBBBBBCCDDDDDDDDDDDDDDDDDDDDEEEEEEEFFFGHILIIIJJKMS_____PLJJIHGGGHJFEEDDDDD
BBBBBBBBCCCDDDDDDDDDDDDDDDDDDDDDEEEEEEEEEFFFGGGHIK_______LHGFFFFFEEEEDDDDDD
BBBBBBBBBCCCCCDDDDDDDDDDDDDDDDDDDDDEEEEEEEEEFFFFGHJJR^QLIHGFFEEEEEEDDDDDDCC
BBBBBBBBBBCCCCCCCDDDDDDDDDDDDDDDDDDDDDEEEEEEEEEFFFGIKNJLLGEEEEEEDDDDDDCCCCC
BBBBBBBBBBBBCCCCCCCCCDDDDDDDDDDDDDDDDDDDDDDEEEEEEFGYLFFFEEEEEDDDDDCCCCCCCCC
I am willing to concede that this code is not typical Python code and
actually it could be an example of *bad* code, but I wanted a nice ASCII
picture in my book ... :) Also, this proves that Python is not necessarily
readable and easy to understand ;-)
I leave it to the courageous reader to convert the previous algorithm to C and
measure the difference in speed ;-)
About Python speed
---------------------------------------------------
The best way to improve the speed is to improve the algorithm; in
this sense Python is an ideal language since it allows you to test
many algorithms in an incredibly short time: in other words, the time you
would spend fighting with the compiler in other languages can, in Python,
be used to improve the algorithm.
However in some cases there is little to do: for instance, in many
problems one has to run lots of loops, and Python loops are horribly
inefficient as compared to C loops. In this case the simplest possibility
is to use Psyco. Psyco is a specializing Python compiler written by Armin
Rigo. It works for 386-based processors and allows Python to run loops at
C speed. Installing Psyco requires $0.00 and ten minutes of your time:
nine minutes to find the program, download it, and install it; one
minute to understand how to use it.
The following script explains both the usage and the advantages of Psyco:
::
#
import oopp,sys
try:
import psyco
except ImportError:
print "Psyco is not installed, sorry."
else:
n=1000000 # 1,000,000 loops
without_psyco=oopp.loop_overhead(n)
print "Without Psyco:",without_psyco
psyco.bind(oopp.loop_overhead) #compile the empty loop
with_psyco=oopp.loop_overhead(n)
print "With Psyco:",with_psyco
print 'Speedup = %sx' % round(without_psyco/with_psyco,1)
#
The output is impressive:
::
Without Psyco: 1.3
With Psyco: 0.02
Speedup = 65.0x
Notice that repeating the test, you will obtain different speedups.
On my laptop, the speedup for an empty loop of 10,000,000
iterations is of the order of 70x, which is actually the same speed
as a C loop (I checked it). On my desktop, I have even found a speedup of
94x!
However, I must say that Psyco has some limitations. The problem is
the function call overhead: Psyco increases it, and in some
programs it can even *worsen* the performance (this is why you should
*never* use the ``psyco.jit()`` function that wraps all the functions of
your program: you should only wrap the bottleneck loops). Generally speaking,
you should expect a much more modest improvement: a factor of 2 or 3
is what I usually obtain in my programs.
Look at this second example, which essentially measures the function
call overhead by invoking the ``do_nothing`` function:
::
#
import oopp
try:
import psyco
except ImportError:
print "Psyco is not installed, sorry."
else:
n=10000 # 10,000 loops
def do_nothing_loop():
for i in xrange(n): oopp.do_nothing()
print "Without Psyco:\n"
oopp.with_timer(do_nothing_loop,n=5)() #50,000 times
without_psyco=do_nothing_loop.CPUtime
psyco.bind(do_nothing_loop)
print "With Psyco:\n"
oopp.with_timer(do_nothing_loop,n=5)() #50,000 times
with_psyco=do_nothing_loop.CPUtime
print 'Speedup = %sx' % round(without_psyco/with_psyco,1)
#
The output is less incredible:
::
Without Psyco:
Executing do_nothing_loop ... Real time: 138.2 ms CPU time: 130.0 ms
With Psyco:
Executing do_nothing_loop ... Real time: 70.0 ms CPU time: 68.0 ms
Speedup = 1.9x
However, this is still impressive, if you think that you can double
the speed of your program by adding *a line* of code! Moreover this
example is not fair, since Psyco cannot improve very much the performance
of loops invoking functions with a variable number of arguments. On the
other hand, it can do quite a lot for loops invoking functions with
a fixed number of arguments. I have checked that you can easily reach
speedups of 20x (!). The only disadvantage is that a program invoking
Psyco takes much more memory than a normal Python program, but this
is not a problem for most applications on today's computers.
Therefore, often Psyco
can save you the effort of going through a C extension. In some cases,
however, there is no hope: I leave it as an exercise for the reader
to check that Psyco (at least version 0.4.1, the one I am using now) is unable to
improve the performance on the Mandelbrot set example. This proves
that in the case of bad code, there is no point in using a compiler:
you have to improve the algorithm first!
By the way, if you really want to go through a C extension with a minimal
departure from Python, you can use Pyrex by Greg Ewing. A Pyrex program
is essentially a Python program with variable declarations that is
automatically converted to C code. Alternatively, you can inline
C functions in Python with ``weave`` of ...
Finally, if you want to access C/C++ libraries, there are tools
like SWIG, Boost and others.
Tracing functions
---------------------------------------------------------------------------
Typically, a script contains many functions that call each
other when some conditions are satisfied. Also, typically during
debugging things do not work the way we would like, and it is not
clear which functions are called, in which order they are called,
and which parameters are passed. The best way to know all this
information is to trace the functions in our script, and to write
all the relevant information to a log file. In order to keep the
distinction between the traced functions and the original ones, it
is convenient to collect all the wrapped functions in a separate dictionary.
The tracing of a single function can be done with a closure
like this:
::
#
def with_tracer(function,namespace='__main__',output=sys.stdout, indent=[0]):
"""Closure returning traced functions. It is typically invoked
through an auxiliary function fixing the parameters of with_tracer."""
def _(*args,**kw):
name=function.__name__
i=' '*indent[0]; indent[0]+=4 # increases indentation
output.write("%s[%s] Calling '%s' with arguments\n" %
(i,namespace,name))
output.write("%s %s ...\n" % (i,str(args)+str(kw)))
res=function(*args,**kw)
output.write("%s[%s.%s] called with result: %s\n"
% (i,namespace,name,str(res)))
indent[0]-=4 # restores indentation
return res
return _ # the traced function
#
Here is an example of usage:
>>> from oopp import with_tracer
>>> def fact(n): # factorial function
... if n==1: return 1
... else: return n*fact(n-1)
>>> fact=with_tracer(fact)
>>> fact(3)
[__main__] Calling 'fact' with arguments
(3,){} ...
[__main__] Calling 'fact' with arguments
(2,){} ...
[__main__] Calling 'fact' with arguments
(1,){} ...
[__main__.fact] called with result: 1
[__main__.fact] called with result: 2
[__main__.fact] called with result: 6
6
The logic behind ``with_tracer`` should be clear; the only trick is the
usage of a mutable default list as a way to store a global indentation level.
Since ``indent`` is mutable, the value of ``indent[0]`` persists across
recursive calls of the traced function, resulting in a nested display.
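The trick in isolation (``counter`` is just an illustrative name): the default list is created once, at function definition time, so it survives from call to call, like ``indent[0]`` above:

```python
def counter(delta=1, store=[0]):
    # the mutable default list is shared by all calls, so it acts
    # as a persistent, per-function piece of state
    store[0] += delta
    return store[0]
```

Each call updates the same list, which is exactly the property ``with_tracer`` exploits to grow and shrink the indentation.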
Typically, one wants to trace all the functions in a given module;
this can be done through the following function:
::
#
from types import *
isfunction=lambda f: isinstance(f,(FunctionType,BuiltinFunctionType))
def wrapfunctions(obj,wrapper,err=None,**options):
"Traces the callable objects in an object with a dictionary"
namespace=options.get('namespace',getattr(obj,'__name__',''))
output=options.get('output',sys.stdout)
dic=dict([(k,wrapper(v,namespace,output))
for k,v in attributes(obj).items() if isfunction(v)])
customize(obj,err,**dic)
#
Notice that 'wrapfunctions' accepts as first argument an object with
a ``__dict__`` attribute (such as a module or a class) or with some
explicit attributes (such as a simple object) and modifies it. One can
trace a module as in this example:
::
#
import oopp,random
oopp.wrapfunctions(random,oopp.with_tracer)
random.random()
#
with output
::
[random] Calling 'random' with arguments
(){} ...
[random.random] called with result: 0.175450439202
The beauty of the present approach is its generality: ``wrapfunctions`` can be
used to add any kind of capability to a pre-existing module.
For instance, we could time the functions in a module, with the
purpose of looking at the bottlenecks. To this aim, it is enough
to use a 'timer' nested closure.
An example of calling is ``wrapfunctions(obj,timer,iterations=1)``.
We may also compose our closures; for instance one could define a
``with_timer_and_tracer`` closure:
>>> with_timer_and_tracer=lambda f: with_timer(with_tracer(f))
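More generally, any number of such wrappers can be chained; a sketch (``compose`` is my helper, not part of ``oopp``, shown here with toy wrappers):

```python
def compose(*wrappers):
    # returns a wrapper applying the given ones right to left, so that
    # compose(with_timer, with_tracer)(f) == with_timer(with_tracer(f))
    def wrap(f):
        for w in reversed(wrappers):
            f = w(f)
        return f
    return wrap

# toy wrappers standing in for with_timer and with_tracer
def twice(f):
    return lambda x: f(f(x))

def plus_one(f):
    return lambda x: f(x) + 1

g = compose(plus_one, twice)(lambda x: 2 * x)
assert g(3) == 13  # twice doubles twice (3 -> 12), plus_one adds 1
```

The same ``compose`` works unchanged on the closures of this chapter, since each of them takes a function and returns a function.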
It should be noticed that Python comes with a standard profiler
(on my system it is located in ``/usr/local/lib/python2.2/profile.py``)
that allows one to profile a script or a module (try
python /usr/local/lib/python2.2/profile.py oopp.py)
or
>>> import profile; help(profile)
and see the on-line documentation.
Tracing objects
----------------------------------------------------------------------
In this section, I will give a more sophisticated example, in which
one can easily understand why Python's ability to change methods and
attributes at run time is so useful.
As a preparation for the real example, let me
first introduce a utility routine that allows the user
to add tracing capabilities to a given object.
Needless to say, this feature can be invaluable during debugging, or in trying
to understand the behaviour of a program written by others.
This routine is a little complex and needs some explanation.
1. The routine looks at the attributes of the object and tries to access them.
2. If the access is possible, the routine looks for methods (methods
are recognized through the ``inspect.isroutine`` function in the
standard library) and ignores regular attributes;
3. The routine tries to override the original methods with improved ones,
that possess tracing capabilities;
4. the traced method is obtained with the wrapping trick discussed before.
I give now the real life example that I have anticipated before.
Improvements and elaborations of this example can be useful to the
professional programmer, too. Suppose you have an XML text you want
to parse. Python provides excellent support for this kind of operation
and various standard modules. One of the most common is the ``expat``
module (see the standard library documentation for more).
If you are just starting to use the module, it is certainly useful
to have a way of tracing its behaviour; this is especially true if
you find some unexpected error during the parsing of a document
(and this may happen even if you are an experienced programmer ;-).
The tracing routine just defined can be used to trace the parser, as
it is exemplified in the following short script:
::
#
import oopp, xml.parsers.expat, sys
# text to be parsed
text_xml="""\
<parent id="dad">
<child name="kid">Text goes here</child>
</parent>"""
# a few do nothing functions
def start(*args): pass
def end(*args): pass
def handler(*args): pass
# a parser object
p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start
p.EndElementHandler = end
p.CharacterDataHandler = handler
#adds tracing capabilities to p
oopp.wrapfunctions(p,oopp.with_tracer, err=sys.stdout)
p.Parse(text_xml)
#
The output is:
::
Error: SetBase cannot be set
Error: Parse cannot be set
Error: ParseFile cannot be set
Error: GetBase cannot be set
Error: SetParamEntityParsing cannot be set
Error: ExternalEntityParserCreate cannot be set
Error: GetInputContext cannot be set
[] Calling 'start' with arguments
(u'parent', {u'id': u'dad'}){} ...
[.start] called with result: None
[] Calling 'handler' with arguments
(u'\n',){} ...
[.handler] called with result: None
[] Calling 'start' with arguments
(u'child', {u'name': u'kid'}){} ...
[.start] called with result: None
[] Calling 'handler' with arguments
(u'Text goes here',){} ...
[.handler] called with result: None
[] Calling 'end' with arguments
(u'child',){} ...
[.end] called with result: None
[] Calling 'handler' with arguments
(u'\n',){} ...
[.handler] called with result: None
[] Calling 'end' with arguments
(u'parent',){} ...
[.end] called with result: None
This is a case where certain methods cannot be managed with
``getattr/setattr``, because they are internally coded in C: this
explains the error messages at the beginning. I leave it as an exercise
for the reader to understand the rest ;-)
Inspecting functions
----------------------------------------------------------------------
Python's wonderful introspection features are really impressive when applied
to functions. It is possible to extract a great deal of information
from a Python function, by looking at its associated *code object*.
For instance, let me consider my ``do_nothing`` function: its associated
code object can be extracted from the ``func_code`` attribute:
>>> from oopp import *
>>> co=do_nothing.func_code # extracts the code object
>>> co
<code object do_nothing at 0x..., file "oopp.py", line 48>
>>> type(co)
<type 'code'>
The code object is far from being trivial: the docstring says it all:
>>> print type(co).__doc__
code(argcount, nlocals, stacksize, flags, codestring, constants, names,
varnames, filename, name, firstlineno, lnotab[, freevars[, cellvars]])
Create a code object. Not for the faint of heart.
In the case of my ``do_nothing`` function, the code object
possesses the following attributes:
>>> print pretty(attributes(co))
co_argcount = 0
co_cellvars = ()
co_code = dS
co_consts = (None,)
co_filename = oopp.py
co_firstlineno = 48
co_flags = 15
co_freevars = ()
co_lnotab =
co_name = do_nothing
co_names = ()
co_nlocals = 2
co_stacksize = 1
co_varnames = ('args', 'kw')
Some of these attributes are pretty technical and implementation-dependent;
however, some of them are pretty clear and useful:
- co_argcount is the total number of arguments
- co_filename is the name of the file where the function is defined
- co_firstlineno is the line number where the function is defined
- co_name is the name of the function
- co_varnames are the names of the local variables, starting with the arguments
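These attributes are easy to verify on a toy function (``add`` is just an example of mine; in Python 2.6+ the code object is also reachable as ``__code__``, the modern spelling of ``func_code``):

```python
def add(x, y=1):
    z = x + y  # one genuine local besides the arguments
    return z

co = add.__code__  # the modern spelling of add.func_code
assert co.co_argcount == 2                # x and y
assert co.co_name == "add"
assert co.co_varnames == ("x", "y", "z")  # arguments first, then locals
assert co.co_nlocals == 3
```

This is essentially what ``help`` does behind the scenes to reconstruct the argument list of a user-defined function.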
The programmer who is not "faint of heart" can study
the built-in documentation on code objects; s/he should try
::
for k,v in attributes(co).iteritems(): print k,':',v.__doc__,'\n'
# does not work now !!
Code objects are also at the basis of *closures*, i.e. functions that
remember the values of the variables in the enclosing scope:
>>> def f(y):
...     return lambda x: x+y
...
>>> f(1).func_closure # a tuple containing a closure cell object
(<cell at 0x...: int object at 0x...>,)
Other useful function attributes are ``func_defaults``, which contains the
default values of the arguments (a common trick, as in
``add=[lambda x,i=i: x+i for i in range(10)]``, exploits default values to
freeze the loop variable), and the code object attributes themselves; for
instance ``(lambda:None).func_code.co_filename`` tells where a (possibly
anonymous) function was defined.
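The same introspection can be sketched in modern Python spelling, where
``func_code`` is spelled ``__code__`` (a minimal sketch, not taken from the
book's ``oopp`` module):

```python
def do_nothing(*args, **kw):
    "A do-nothing function, used only to inspect its code object."

co = do_nothing.__code__      # modern spelling of func_code

print(co.co_name)             # name of the function: do_nothing
print(co.co_argcount)         # number of plain arguments: 0
print(co.co_varnames)         # local variable names: ('args', 'kw')
```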
One cannot change the name of a function:
>>> def f(): pass
...
>>> f.__name__='ciao' # error
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: readonly attribute
However, one can create a copy with a different name:
::
#
from types import FunctionType # needed by copyfunc

def copyfunc(f,newname=None): # works under Python 2.3
    if newname is None: newname=f.func_name # same name
    return FunctionType(f.func_code, globals(), newname,
                        f.func_defaults, f.func_closure)
#
>>> copyfunc(f,newname='f2')
Notice that the ``copy`` module would not do the job:
>>> import copy
>>> copy.copy(f) # error
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/local/lib/python2.3/copy.py", line 84, in copy
y = _reconstruct(x, reductor(), 0)
File "/usr/local/lib/python2.3/copy_reg.py", line 57, in _reduce
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle function objects
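In later Python versions the ``func_*`` attributes were renamed with double
underscores; here is a hedged modern sketch of the same ``copyfunc`` idea,
assuming only the standard ``types`` module:

```python
from types import FunctionType

def copyfunc(f, newname=None):
    """Return a copy of the function f, possibly under a new name
    (modern spelling of func_code/func_defaults/func_closure)."""
    if newname is None:
        newname = f.__name__  # keep the same name
    return FunctionType(f.__code__, f.__globals__, newname,
                        f.__defaults__, f.__closure__)

def f():
    pass

f2 = copyfunc(f, newname='f2')
print(f2.__name__)  # prints: f2
```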
THE BEAUTY OF OBJECTS
===========================================================================
In this chapter I will show how to define generic objects in Python, and
how to manipulate them.
User defined objects
--------------------------------------------------------------------------
In Python, one cannot directly modify methods and attributes of built-in
types, since this would be a potentially frightening source of bugs.
Imagine, for instance, changing the ``sort`` method of a list and then
invoking an external module expecting the standard ``sort``: all kinds of
hideous outcomes could happen.
Nevertheless, in Python, as in all OOP languages, the user can define
her own kind of objects, customized to satisfy her needs. In order to
define a new object, the user must define the class of the objects she
needs. The simplest possible class is a do-nothing class:
::
#
class Object(object):
"A convenient Object class"
#
Elements of the ``Object`` class can be created (instantiated) quite
simply:
>>> from oopp import Object
>>> obj1=Object()
>>> obj1
<oopp.Object object at 0x81580ec>
>>> obj2=Object()
>>> obj2
<oopp.Object object at 0x8156704>
Notice that the hexadecimal number 0x81580ec is nothing else than the
unique object reference to ``obj1``:
>>> hex(id(obj1))
'0x81580ec'
whereas 0x8156704 is the object reference of ``obj2``:
>>> hex(id(obj2))
'0x8156704'
However, at this point ``obj1`` and ``obj2`` are generic
do-nothing objects. Nevertheless, they have
at least one useful attribute, the class docstring:
>>> obj1.__doc__ #obj1 docstring
'A convenient Object class'
>>> obj2.__doc__ # obj2 docstring: it's the same
'A convenient Object class'
Notice that the docstring is associated to the class and therefore all
the instances share the same docstring, unless one explicitly assigns
a different docstring to some instance. ``__doc__``
is a class attribute (or a static attribute, for readers familiar with the
C++/Java terminology) and the expression is actually syntactic sugar for
>>> class Object(object): # with explicit assignement to __doc__
... __doc__ = "A convenient Object class"
Since instances of 'Object' can be modified, I can transform them into
anything I want. For instance, I can create a simple clock:
>>> myclock=Object()
>>> myclock
<__main__.Object object at 0x8124614>
A minimal clock should at least print the current time
on the system. This is given by the ``get_time`` function
we defined in the first chapter. We may "attach" that function
to our clock as follows:
>>> import oopp
>>> myclock.get_time=oopp.get_time
>>> myclock.get_time # this is a function, not a method
In other words, we have converted the ``oopp.get_time`` function to a
``get_time`` function of the object ``myclock``. The procedure works
>>> myclock.get_time()
'15:04:57'
but has a disadvantage: if we instantiate another
clock
>>> from oopp import Object
>>> otherclock=Object()
the other clock will *not* have a ``get_time`` method:
>>> otherclock.get_time() #first attempt; error
AttributeError: 'Object' object has no attribute 'get_time'
Notice instead that the docstring is a *class attribute*, i.e. it
is defined both for the class and *all instances* of the class,
therefore even for ``otherclock``:
>>> Object.__doc__
'A convenient Object class'
>>> otherclock.__doc__
'A convenient Object class'
We would like to convert the ``get_time`` function to a
``get_time`` method for the *entire* class 'Object', i.e. for all its
instances. Naively, one would be tempted to write the following:
>>> Object.get_time=oopp.get_time
However this would not work:
>>> otherclock.get_time() #second attempt; still error
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: oopp.get_time() takes no arguments (1 given)
This error message is something that all Python beginners encounter
(and sometimes even non-beginners ;-). The solution is to introduce
an additional argument:
>>> Object.get_time=lambda self : oopp.get_time()
>>> otherclock.get_time # this is a method now, not a function
<bound method Object.<lambda> of <__main__.Object object at 0x815881c>>
>>> otherclock.get_time() #third attempt
'15:28:41'
Why does this work? The explanation is the following:
when Python encounters an expression of the form
``objectname.methodname()`` it looks to see if there is already a method
*attached* to the object:
a. if yes it invokes it with no arguments
(this is why our first example worked);
b. if not it looks at the class of the object; if there is a method
bound to the class it invokes that method *by passing the
object as first argument*.
When we invoked ``otherclock.get_time()`` in our second attempt, Python
found that the function ``get_time`` was defined at the class level,
and passed it the ``otherclock`` object as first argument: however ``get_time``
was bound to ``oopp.get_time``, which is a function with *no* arguments: whence
the error message. The third attempt worked since, thanks to the
lambda function trick, the ``get_time`` function has been converted to
a function accepting a first argument.
Therefore, that's the rule: in Python, one can define methods
at the class level, provided one explicitly introduces a first argument
containing the object on which the method is invoked.
This first argument is traditionally called ``self``; the name 'self' is not
enforced, and one could use any other valid Python identifier; however, the
convention is so widespread that practically everybody uses it;
pychecker will even raise a warning in case you don't follow the
convention.
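The two lookup cases can be sketched as follows (modern Python spelling;
``current_time`` is a hypothetical helper standing in for ``oopp.get_time``):

```python
import time

def current_time():
    "Hypothetical helper: returns hour:minutes:seconds, takes no arguments."
    return time.asctime().split()[3]

class Object:
    "A convenient Object class"

obj = Object()

# Case (a): the function is attached directly to the instance,
# so it is invoked with no implicit first argument.
obj.get_time = current_time
print(obj.get_time())        # works

# Case (b): the function is attached to the class; Python passes the
# instance as first argument, hence the explicit 'self' wrapper.
Object.get_time = lambda self: current_time()
other = Object()
print(other.get_time())      # works too
```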
I have just shown one of the most interesting features of Python, its
*dynamicity*: you can create the class first and add methods to it later.
That logic cannot be followed in a typical compiled language such as C++.
On the other hand, one can also define methods in a static, more
traditional way:
::
#
"Shows how to define methods inside the class (statically)"
import oopp
class Clock(object):
'Clock class; version 0.1'
def get_time(self): # method defined inside the class
return oopp.get_time()
myclock=Clock() #creates a Clock instance
print myclock.get_time() # print the current time
#
In this case we have defined the ``get_time`` method inside the class as a
normal function with an explicit first argument called self; this is
entirely equivalent to the use of a lambda function.
The syntax ``myclock.get_time()`` is actually syntactic sugar for
``Clock.get_time(myclock)``.
In this second form, it is clear that ``get_time`` is really "attached" to the
class, not to the instance.
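The sugar can be checked directly (modern Python spelling; ``current_time``
is a hypothetical stand-in for ``oopp.get_time``):

```python
import time

def current_time():
    "Hypothetical helper returning hour:minutes:seconds."
    return time.asctime().split()[3]

class Clock:
    'Clock class; version 0.1'
    def get_time(self):          # method defined inside the class
        return current_time()

myclock = Clock()
# myclock.get_time() is syntactic sugar for Clock.get_time(myclock):
print(myclock.get_time())
print(Clock.get_time(myclock))   # the explicit, desugared form of the call
```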
Objects have static methods and classmethods
-----------------------------------------------------------------------------
.. line-block::
*There should be one--and preferably only one--obvious way to do it*
-- Tim Peters, *The Zen of Python*.
For any rule there is an exception, and despite Python's motto
there are many ways to define methods in classes. The way I presented
before was the obvious one before the Python 2.2 revolution; however,
nowadays there is another possibility that, even if less obvious, has the
advantage of some elegance (and is also slightly more efficient, even if
efficiency is never a primary concern for a Python programmer).
We see that the first argument in the ``get_time`` method is useless,
since the time is computed from the ``time.asctime()`` function which
does not require any information about the object that is calling
it. This waste is ugly, and since according to the Zen of Python
*Beautiful is better than ugly.*
we should look for another way. The solution is to use a *static method*:
when a static method is invoked, the calling object is *not* implicitly passed
as first argument. Therefore we may use a normal function with no additional
first argument to define the ``get_time`` method:
::
#
class Clock(object):
'Clock with a staticmethod'
get_time=staticmethod(get_time)
#
Here is how it works:
>>> from oopp import Clock
>>> Clock().get_time() # get_time is bound both to instances
'10:34:23'
>>> Clock.get_time() # and to the class
'10:34:26'
The ``staticmethod`` idiom converts the ``get_time`` function to a
static method of the class 'Clock'. Notice that one can also define
the function inside the class and use the (arguably more Pythonic) idiom
::
    def get_time():
        return oopp.get_time()
    get_time=staticmethod(get_time)
as the documentation suggests:
>>> print staticmethod.__doc__
staticmethod(function) -> method
Convert a function to be a static method.
A static method does not receive an implicit first argument.
To declare a static method, use this idiom:
class C:
def f(arg1, arg2, ...): ...
f = staticmethod(f)
It can be called either on the class (e.g. C.f()) or on an instance
(e.g. C().f()). The instance is ignored except for its class.
Static methods in Python are similar to those found in Java or C++.
For a more advanced concept, see the classmethod builtin.
At present, the notation for static methods is still rather ugly,
but it is expected to improve in future versions of Python (probably
in Python 2.4). Documentation for static methods can
be found in Guido's essay and in the relevant PEP; however, this is
intended for developers.
As the docstring says, static methods are also "attached" to the
class and may be called with the syntax ``Clock.get_time()``.
A similar remark applies for the so called *classmethods*:
>>> print classmethod.__doc__
classmethod(function) -> method
Convert a function to be a class method.
A class method receives the class as implicit first argument,
just like an instance method receives the instance.
To declare a class method, use this idiom:
class C:
def f(cls, arg1, arg2, ...): ...
f = classmethod(f)
It can be called either on the class (e.g. C.f()) or on an instance
(e.g. C().f()). The instance is ignored except for its class.
If a class method is called for a derived class, the derived class
object is passed as the implied first argument.
Class methods are different than C++ or Java static methods.
If you want those, see the staticmethod builtin.
As the docstring says, classmethods are convenient when one wants to pass
to a method the calling *class*, not the calling object. Here is an
example:
>>> class Clock(object): pass
>>> Clock.name=classmethod(lambda cls: cls.__name__)
>>> Clock.name() # called by the class
'Clock'
>>> Clock().name() # called by an instance
'Clock'
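As the docstring of ``classmethod`` noted, when a classmethod is called on a
derived class, the derived class itself is passed as ``cls``; here is a
minimal sketch (modern Python, with a hypothetical ``AlarmClock`` subclass):

```python
class Clock:
    name = classmethod(lambda cls: cls.__name__)

class AlarmClock(Clock):   # hypothetical subclass, just for illustration
    pass

print(Clock.name())        # prints: Clock
print(AlarmClock.name())   # prints: AlarmClock -- cls is the derived class
print(AlarmClock().name()) # same result when called on an instance
```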
Notice that classmethods (and staticmethods too)
can only be attached to classes, not to objects:
>>> class Clock(object): pass
>>> c=Clock()
>>> c.name=classmethod(lambda cls: cls.__name__)
>>> c.name() #error
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: 'classmethod' object is not callable
gives a TypeError. The reason is that classmethods and staticmethods
are implemented
through *attribute descriptors*. This concept will be discussed in detail
in chapter 6.
Notice that classmethods do not provide any fundamental feature, since
one could very well use a normal method and retrieve the class with
``self.__class__``, as we did in the first chapter.
Therefore, we could live without them (actually, I think they are a
non-essential complication to the language).
Nevertheless, now that we have them, we may as well use them, since
they come in handy in various circumstances, as we will see in the following.
Objects have their privacy
---------------------------------------------------------------------------
In some situations, it is convenient to give the developer
some information that should be hidden from the final user. To this
aim Python uses private names (i.e. names starting with a single
underscore) and private/protected attributes (i.e. attributes starting with
a double underscore).
Consider for instance the following script:
::
#
import time
class Clock(object):
__secret="This Clock is quite stupid."
myclock=Clock()
try: print myclock.__secret
except Exception,e: print "AttributeError:",e
#
The output of this script is
::
AttributeError: 'Clock' object has no attribute '__secret'
Therefore, even if the Clock object *does* have a ``__secret`` attribute,
the user cannot access it! In this way she cannot discover that
actually "This Clock is quite stupid."
In other programming languages, attributes like ``__secret`` are
called "private" attributes. However, in Python private attributes
are not really private and their secrets can be accessed with very
little effort.
First of all, we may notice that ``myclock`` really contains a secret
by using the builtin function ``dir()``:
::
dir(myclock)
['_Clock__secret', '__class__', '__delattr__', '__dict__', '__doc__',
'__getattribute__', '__hash__', '__init__', '__module__', '__new__',
'__reduce__', '__repr__', '__setattr__', '__str__', '__weakref__']
We see that the first attribute of myclock is ``_Clock__secret``,
which we may access directly:
::
    print myclock._Clock__secret
This Clock is quite stupid.
We see here the secret of private variables in Python: *name mangling*.
When Python sees a name starting with two underscores (and not ending
with two underscores, otherwise it would be interpreted as a special
attribute), internally it manages it as ``_Classname__privatename``.
Notice that if 'Classname' begins with underscores, the leading underscores
are stripped in such a way to guarantee that the private name starts with
only *one* underscore. For instance, the '__secret' private attribute
of classes such as 'Clock', '_Clock', '__Clock', '___Clock', etc. is
mangled to '_Clock__secret'.
Private names in Python are *not* intended to keep secrets: they
have other uses.
1. On one hand, private names are a suggestion to the developer.
When the Python programmer sees a name starting with one or two
underscores in a program written by others, she understands
that the name should not be of concern for the final user, since it
only concerns the internal implementation.
2. On the other hand, private names are quite useful in class
inheritance, since they provide safety with respect to the overriding
operation. This point will be discussed in the next chapter.
3. Names starting with one (or more) underscores are not imported by the
statement ``from module import *``
Remark: it makes no sense to define names with double underscores
outside classes, since the name mangling doesn't work in this case.
Let me show an example:
>>> class Clock(object): __secret="This Clock is quite stupid"
>>> def tellsecret(self): return self.__secret
>>> Clock.tellsecret=tellsecret
>>> Clock().tellsecret() #error
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 2, in tellsecret
AttributeError: 'Clock' object has no attribute '__secret'
The explanation is that since ``tellsecret()`` is defined outside the class,
``__secret`` is not expanded to ``_Clock__secret`` and therefore cannot be
retrieved, whereas
>>> class Clock(object):
... __secret="This Clock is quite stupid"
... def tellsecret(self): return self.__secret
>>> Clock().tellsecret()
'This Clock is quite stupid'
will work. In other words, private variables are attached to classes,
not objects.
Objects have properties
-------------------------------------------------------------------------------
In the previous section we have shown that private variables are of
little use for keeping secrets: if a developer really wants to restrict
the access to some methods or attributes, she has to resort to
*properties*.
Let me show an example:
::
#
import oopp
class Clock(object):
'Clock class with a secret'
you_know_the_pw=False #default
def give_pw(self,pw):
"""Check if you know the password. For security, one should encrypt
the password."""
self.you_know_the_pw=(pw=="xyz")
def get_secret(self):
if self.you_know_the_pw:
return "This clock doesn't work."
else:
return "You must give the right password to access 'secret'"
secret=property(get_secret)
c=Clock()
print c.secret # => You must give the right password to access 'secret'
c.give_pw('xyz') # gives the right password
print c.secret # => This clock doesn't work.
print Clock.secret # =>
#
In this script, one wants to restrict the access to the attribute
'secret', which can be accessed only if the user provides the
correct password. Obviously, this example is not very secure,
since I have hard coded the password 'xyz' in the source code,
which is easily accessible. In reality, one should encrypt the
password and perform a more sophisticated test than the trivial
check ``(pw=="xyz")``; anyway, the example is only intended to
show the uses of properties, not to be really secure.
The key action is performed by the descriptor class ``property``, which
converts the function ``get_secret`` into a property object. Additional
information on the usage of ``property`` can be obtained from the
docstring:
>>> print property.__doc__
property(fget=None, fset=None, fdel=None, doc=None) -> property attribute
fget is a function to be used for getting an attribute value, and likewise
fset is a function for setting, and fdel a function for del'ing, an
attribute. Typical use is to define a managed attribute x:
class C(object):
def getx(self): return self.__x
def setx(self, value): self.__x = value
def delx(self): del self.__x
x = property(getx, setx, delx, "I'm the 'x' property.")
Properties are another example of attribute descriptors.
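The managed-attribute idiom from the docstring can be tried directly
(a sketch in modern Python spelling, using the docstring's own ``x``
example):

```python
class C(object):
    def getx(self): return self.__x        # mangled to _C__x
    def setx(self, value): self.__x = value
    def delx(self): del self.__x
    x = property(getx, setx, delx, "I'm the 'x' property.")

c = C()
c.x = 42           # goes through setx
print(c.x)         # goes through getx; prints: 42
del c.x            # goes through delx
print(C.x.__doc__) # the property keeps its documentation string
```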
Objects have special methods
---------------------------------------------------------------------------
From the beginning, we stressed that objects have special attributes that
may turn handy, as for instance the docstring ``__doc__`` and the class
name attribute ``__class__``. They have special methods, too.
With little doubt, the most useful special method is ``__init__``,
which *initializes* an object right after its creation. ``__init__``
is typically used to pass parameters to *object factories*. Let me give
an example with geometric figures:
::
#
class GeometricFigure(object): #an example of object factory
"""This class allows to define geometric figures according to their
equation in the cartesian plane. It will be extended later."""
def __init__(self,equation,**parameters):
"Specify the cartesian equation of the object and its parameters"
self.eq=equation
self.par=parameters
for k,v in self.par.items(): #replaces the parameters in the equation
self.eq=self.eq.replace(k,str(v))
self.contains=eval('lambda x,y : '+self.eq)
# dynamically creates the function 'contains'
#
Here is how it works:
>>> from oopp import *
>>> disk=GeometricFigure('(x-x0)**2+(y-y0)**2 <= r**2', x0=0,y0=0,r=5)
>>> # creates a disk of radius 5 centered in the origin
>>> disk.contains(1,2) #asks if the point (1,2) is inside the disk
True
>>> disk.contains(4,4) #asks if the point (4,4) is inside the disk
False
Let me continue the section on special methods with some observations on
``__repr__`` and ``__str__``. Notice that I
will not discuss all the subtleties; for a thorough discussion, see the
thread "Using __repr__ or __str__" in c.l.p. (Google is your friend).
The following discussion applies to new style classes; old style classes
are subtly different.
When one writes
>>> disk
one obtains the *string representation* of the object. Actually, the previous
line is syntactic sugar for
>>> print repr(disk)
or
>>> print disk.__repr__()
The ``repr`` function extracts the string representation from
the special method ``__repr__``, which can be redefined in order to
have objects pretty printed. Notice that ``repr`` is conceptually
different from the ``str`` function that controls the output of the ``print``
statement. Actually, ``print o`` is syntactic sugar for ``print str(o)``,
which is sugar for ``print o.__str__()``.
If for instance we define
::
#
class PrettyPrinted(object):
formatstring='%s' # default
def __str__(self):
"""Returns the name of self in quotes, possibly formatted via
self.formatstring. If self has no name, returns the name
of its class in angular brackets."""
try: #look if the object has a name
name="'%s'" % self.__name__
except AttributeError: #if not, use the name of its class
name='<%s>' % type(self).__name__
if hasattr(self,'formatstring'):
return self.formatstring % name
else:
return name
#
then we have
>>> from oopp import PrettyPrinted
>>> o=PrettyPrinted() # o is an instance of PrettyPrinted
>>> print o #invokes o.__str__(), which here returns '<PrettyPrinted>'
<PrettyPrinted>
whereas
>>> o # i.e. print repr(o)
<oopp.PrettyPrinted object at 0x...>
However, in most cases ``__repr__`` and ``__str__`` give the same
output, since if ``__str__`` is not explicitly defined it defaults
to ``__repr__``. Therefore, whereas modifying ``__str__``
does not change ``__repr__``, modifying ``__repr__`` changes ``__str__``,
if ``__str__`` is not explicitly given:
::
#
"Shows that defining __str__ does not change __repr__"
class Frog(object):
attributes="poor, small, ugly"
def __str__(self):
return "I am a "+self.attributes+' '+self.__class__.__name__
class Prince(object):
attributes='rich, tall, beautiful'
def __str__(self):
return "I am a "+self.attributes+' '+self.__class__.__name__
jack=Frog(); print repr(jack),jack
charles=Prince(); print repr(charles),charles
#
The output of this script is:
::
    <__main__.Frog object at 0x...> I am a poor, small, ugly Frog
    <__main__.Prince object at 0x...> I am a rich, tall, beautiful Prince
for jack and charles respectively.
``__str__`` and ``__repr__`` are also called by the formatting
operators "%s" and "%r".
Notice that i) ``__str__`` can be most naturally
rewritten as a class method; ii) Python is magic:
::
#
"""Shows two things:
1) redefining __repr__ automatically changes the output of __str__
2) the class of an object can be dynamically changed! """
class Frog(object):
attributes="poor, small, ugly"
def __repr__(cls):
return "I am a "+cls.attributes+' '+cls.__name__
__repr__=classmethod(__repr__)
class Prince(object):
attributes='rich, tall, beautiful'
def __repr__(cls):
return "I am a "+cls.attributes+' '+cls.__name__
__repr__=classmethod(__repr__)
def princess_kiss(frog):
frog.__class__=Prince
jack=Frog()
princess_kiss(jack)
print jack # the same as repr(jack)
#
Now the output for jack is "I am a rich, tall, beautiful Prince" !
In Python you may dynamically change the class of an object!!
Of course, this is a feature to use with care ;-)
There are many other special methods, such as ``__new__``, ``__getattr__``,
``__setattr__``, etc. They will be discussed in the next chapters, in
conjunction with inheritance.
Objects can be called, added, subtracted, ...
---------------------------------------------------------------------------
Python provides a nice generalization of functions, via the concept
of *callable objects*. A callable object is an object with a ``__call__``
special method. They can be used to define "functions" that remember
how many times they are invoked:
::
#
class MultiplyBy(object):
def __init__(self,n):
self.n=n
self.counter=0
def __call__(self,x):
self.counter+=1
return self.n*x
double=MultiplyBy(2)
res=double(double(3)) # res=12
print "double is callable: %s" % callable(double)
print "You have called double %s times." % double.counter
#
With output
::
double is callable: True
You have called double 2 times.
The script also shows that callable objects (including functions)
can be recognized via the ``callable`` built-in function.
Callable objects solve elegantly the problem of having "static" variables
inside functions (cf. the 'double' example in chapter 2).
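For comparison, here is a minimal sketch of the "static variable" use case
with a hypothetical ``Accumulator`` callable (modern Python spelling):

```python
class Accumulator(object):
    "A callable object whose state survives between calls."
    def __init__(self):
        self.total = 0
    def __call__(self, x):
        self.total += x
        return self.total

acc = Accumulator()
print(acc(1))      # prints: 1
print(acc(2))      # prints: 3
print(acc.total)   # the accumulated state is inspectable from outside
```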
A class with a ``__call__`` method can be used to generate an entire
set of customized "functions". For this reason, callable objects are
especially useful in conjunction with object factories. Let me show
an application to my factory of geometric figures:
::
#
class Makeobj(object):
"""A factory of object factories. Makeobj(cls) returns instances
of cls"""
def __init__(self,cls,*args):
self.cls=cls
self.args=args
def __call__(self,**pars):
return self.cls(*self.args,**pars)
#
#
from oopp import Makeobj,GeometricFigure
makedisk=Makeobj(GeometricFigure,'(x-x0)**2+(y-y0)**2 <= r**2')
makesquare=Makeobj(GeometricFigure,'abs(x-x0) <= l and abs(y-y0) <= l')
disk=makedisk(x0=0,y0=0,r=10)
square=makesquare(x0=0,y0=0,l=10)
print disk.contains(9,9) # => False
print square.contains(9,9) # => True
#etc.
#
This factory generates callable objects, such as ``makedisk`` and
``makesquare``, that return geometric objects. It gives a nicer interface
to the object factory provided by 'GeometricFigure'.
Notice that the use of the expression ``disk.contains(9,9)``, in order to
know if the point of coordinates (9,9) is contained in the disk, is
rather inelegant: it would be much better to be able to ask if
``(9,9) in disk``. This is indeed possible: the secret is to
define the special method ``__contains__``. This is done in the next
example, which I think gives a good taste of the beauty of objects
::
#
from oopp import Makeobj
Nrow=50; Ncol=78
class GeometricFigure(object):
"""This class allows to define geometric figures according to their
equation in the cartesian plane. Moreover addition and subtraction
of geometric figures are defined as union and subtraction of sets."""
def __init__(self,equation,**parameters):
"Initialize "
self.eq=equation
self.par=parameters
for (k,v) in self.par.items(): #replaces the parameters
self.eq=self.eq.replace(k,str(v))
self.contains=eval('lambda x,y : '+self.eq)
def combine(self,fig,operator):
"""Combine self with the geometric figure fig, using the
operators "or" (addition) and "and not" (subtraction)"""
comboeq="("+self.eq+")"+operator+"("+fig.eq+")"
return GeometricFigure(comboeq)
def __add__(self,fig):
"Union of sets"
return self.combine(fig,' or ')
def __sub__(self,fig):
"Subtraction of sets"
return self.combine(fig,' and not')
def __contains__(self,point): #point is a tuple (x,y)
return self.contains(*point)
makedisk=Makeobj(GeometricFigure,'(x-x0)**2/4+(y-y0)**2 <= r**2')
upperdisk=makedisk(x0=38,y0=7,r=5)
smalldisk=makedisk(x0=38,y0=30,r=5)
bigdisk=makedisk(x0=38,y0=30,r=14)
def format(text,shape):
"Format the text in the shape given by figure"
text=text.replace('\n',' ')
out=[]; i=0; col=0; row=0; L=len(text)
while 1:
if (col,row) in shape:
out.append(text[i]); i+=1
if i==L: break
else:
out.append(" ")
if col==Ncol-1:
col=0; out.append('\n') # starts new row
if row==Nrow-1: row=0 # starts new page
else: row+=1
else: col+=1
return ''.join(out)
composition=bigdisk-smalldisk+upperdisk
print format(text='Python Rules!'*95,shape=composition)
#
I leave as an exercise for the reader to understand how it works and to
play with other geometric figures (one can also generate them through the
'Makeobj' factory). I think it is nicer to show its output:
::
Pyt
hon Rules!Pyt
hon Rules!Python
Rules!Python Rules!
Python Rules!Python
Rules!Python Rules!P
ython Rules!Python
Rules!Python Rules!
Python Rules!Pyth
on Rules!Pyth
on
Rul
es!Python Rules!Pytho
n Rules!Python Rules!Python R
ules!Python Rules!Python Rules!Pyth
on Rules!Python Rules!Python Rules!Pyth
on Rules!Python Rules!Python Rules!Python R
ules!Python Rules!Python Rules!Python Rules!Pyt
hon Rules!Python Rules!Python Rules!Python Rules!
Python Rules!Python Rules!Python Rules!Python Rules
!Python Rules!Python Rule s!Python Rules!Python Rul
es!Python Rules!Pyth on Rules!Python Rule
s!Python Rules!Pyth on Rules!Python Rul
es!Python Rules!Py thon Rules!Python
Rules!Python Rules !Python Rules!Pyth
on Rules!Python Ru les!Python Rules!P
ython Rules!Python Rules!Python Rule
s!Python Rules!Pyt hon Rules!Python R
ules!Python Rules!P ython Rules!Python
Rules!Python Rules!P ython Rules!Python R
ules!Python Rules!Python Rules!Python Rules!Python
Rules!Python Rules!Python Rules!Python Rules!Pytho
n Rules!Python Rules!Python Rules!Python Rules!Py
thon Rules!Python Rules!Python Rules!Python Rul
es!Python Rules!Python Rules!Python Rules!P
ython Rules!Python Rules!Python Rules!P
ython Rules!Python Rules!Python Rul
es!Python Rules!Python Rules!
Python Rules!Python R
ule
s!
Remark.
Unfortunately, "funnyformatter.py" does not reuse old code: in spite of the
fact that we already had in our library the 'GeometricFigure' class, with
an ``__init__`` method that is exactly the same as the ``__init__`` method in
"funnyformatter.py", we did not reuse that code. We simply did a cut
and paste. This means that if we later find a bug in the ``__init__`` method,
we will have to fix it twice, both in the script and in the library. Also,
if we plan to extend the method later, we will have to extend it twice.
Fortunately, this nasty situation can be avoided: but this requires the
power of inheritance.
THE POWER OF CLASSES
==========================================================================
This chapter is devoted to the concept of class inheritance. I will discuss
single inheritance, cooperative methods, multiple inheritance and more.
The concept of inheritance
----------------------------------------------------------------------
Inheritance is perhaps the most important basic feature in OOP, since it
allows the reuse and incremental improvement of old code.
To show this point, let me come back to one of the
examples I introduced in the last chapter, the 'fairytale1.py' script,
where I defined the classes 'Frog' and 'Prince' as
::
class Frog(object):
attributes="poor, small, ugly"
def __str__(self):
return "I am a "+self.attributes+' '+self.__class__.__name__
class Prince(object):
attributes='rich, tall, beautiful'
def __str__(self):
return "I am a "+self.attributes+' '+self.__class__.__name__
We see that the way we followed here was very bad, since:
1. The ``__str__`` method is duplicated both in Frog and in Prince: that
means that if we find a bug later, we will have to fix it twice!
2. The ``__str__`` method was already defined in the PrettyPrinted class
(actually more elegantly), therefore we have triplicated the work and
worsened the situation!
This is very much against the whole philosophy of OOP:
*never cut and paste!*
We should *reuse* old code, not paste it!
The solution is *class inheritance*. The idea behind inheritance is to
define new classes as subclasses of *parent* classes, in such a way that
the *children* classes possess all the features of the parents.
That means that we do not need to
redefine the properties of the parents explicitly.
In this example, we may derive both 'Frog' and 'Prince' from
the 'PrettyPrinted' class, thus providing to both 'Frog' and 'Prince'
the ``PrettyPrinted.__str__`` method with no effort:
>>> from oopp import PrettyPrinted
>>> class Frog(PrettyPrinted): attributes="poor, small, ugly"
...
>>> class Prince(PrettyPrinted): attributes="rich, tall, beautiful"
...
>>> print repr(Frog()), Frog()
<__main__.Frog object at 0x401cbeac> <Frog>
>>> print Prince()
<Prince>
>>> print repr(Prince()),Prince()
<__main__.Prince object at 0x401cbaac> <Prince>
Let me show explicitly that both 'Frog' and 'Prince' share the
'PrettyPrinted.__str__' method:
>>> id(Frog.__str__) # of course, YMMV
1074329476
>>> id(Prince.__str__)
1074329476
>>> id(PrettyPrinted.__str__)
1074329476
The method is always the same, since the object reference is the same
(the precise value of the reference is not guaranteed to be 1074329476,
however!).
This example is good to show the first advantage of inheritance:
*avoiding duplication of code*.
Another advantage of inheritance is *extensibility*: one can very easily
improve existing code. For instance, having written the ``Clock`` class once,
I can reuse it in many different ways. For example, I can build a ``Timer``
to be used for benchmarks. It is enough to reuse the function ``with_timer``
introduced in the first chapter (functions are good for reuse of code, too ;):
::
#
class Timer(Clock):
"Inherits the get_time staticmethod from Clock"
execute=staticmethod(with_timer)
loop_overhead=staticmethod(loop_overhead)
#
Here is an example of application:
>>> from oopp import Timer
>>> Timer.get_time()
'16:07:06'
Therefore 'Timer' inherits 'Clock.get_time'; moreover it has the additional
method ``execute``:
>>> def square(x): return x*x
...
>>> Timer.execute(square,n=100000)(1)
executing square ...
Real time: 0.01 ms CPU time: 0.008 ms
The advantage of putting the function ``execute`` in a class is that
now we may *inherit* from that class and improve our timer *ad
libitum*.
Inheritance versus run-time class modifications
-------------------------------------------------------------------------
Naively, one could think of substituting inheritance with run-time
modification of classes, since this is allowed by Python. However,
this is not such a good idea, in general. Let me give a simple example.
Suppose we want to improve our previous clock, to show the date, too.
We could reach that goal with the following script:
::
#
"Shows how to modify and enhances classes on the fly"
from oopp import *
clock=Clock() #creates a Clock instance
print clock.get_time() # print the current time
get_data=lambda : ' '.join(time.asctime().split()[0:3])+ \
' '+time.asctime().split()[-1]
get_data_and_time=lambda : "Today is: %s \nThe time is: %s" % (
get_data(),get_time()) # enhances get_time
Clock.get_time=staticmethod(get_data_and_time)
print clock.get_time() # print the current time and date
#
The output of this script is:
::
12:51:25
Today is: Sat Feb 22 2003
The time is: 12:51:25
Notice that:
1. I instantiated the ``clock`` object *before* redefining the ``get_time``
method, when it could only print the time and *not* the date.
2. However, after the redefinition of the class, the behaviour of all its
instances is changed, *including the behaviour of objects instantiated
before the change!* Thus ``clock`` *can* now print the date, too.
This is not so surprising, once you recognize that Guido owns a very famous
time machine ... ;-)
Seriously, the reason is that an object does not contain a private copy
of the attributes and methods of its class: it only contains *references*
to them. If we change them in the class, the references in the
object stay the same, but their contents change.
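The reference mechanism can be checked with a toy example (a sketch in modern Python syntax, using a fixed string instead of the real time):

```python
class Clock:
    @staticmethod
    def get_time():
        return "12:00:00"          # a fixed string instead of the real time

clock = Clock()                    # created *before* the class is changed

# replace the method in the class, after the instance exists
Clock.get_time = staticmethod(lambda: "Today is Sat Feb 22 2003 12:00:00")

# the pre-existing instance sees the new method: attribute lookup
# goes through the class at call time, the instance holds no copy
assert clock.get_time() == "Today is Sat Feb 22 2003 12:00:00"
```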
In this example, I have solved the problem of enhancing the 'Clock' class
without inheritance, by dynamically replacing its ``get_time``
(static) method with the ``get_data_and_time`` (static) method.
The dynamic modification of methods can be cool, but it should be avoided
whenever possible, for at least two reasons [#]_:
1. having a class, and therefore all its instances (including the instances
created before the modification!), changed during the life-time of the
program can be very confusing to the programmer, if not to the interpreter;
2. the modification is destructive: I cannot have the old ``get_time`` method
and the new one at the same time, unless I explicitly give the new one
a new name (and new names increase the pollution of the namespace).
Both these disadvantages can be solved by resorting to the mechanism of
inheritance. For instance, in this example, we can derive a new class
``NewClock`` from ``Clock`` as follows:
::
#
import oopp,time
get_data=lambda : ' '.join(time.asctime().split()[0:3])+ \
' '+time.asctime().split()[-1]
get_data_and_time=lambda : "Today is: %s \nThe time is: %s" % (
get_data(),oopp.get_time()) # enhances get_time
class NewClock(oopp.Clock):
"""NewClock is a class that inherits from Clock, provides get_data
and overrides get_time."""
get_data=staticmethod(get_data)
get_time=staticmethod(get_data_and_time)
clock=oopp.Clock(); print 'clock output=',clock.get_time()
newclock=NewClock(); print 'newclock output=',newclock.get_time()
#
The output of this script is:
::
clock output= 16:29:17
newclock output= Today is: Sat Feb 22 2003
The time is: 16:29:17
We see that the two problems previously discussed are solved since:
i) there is no cut and paste: the old method ``Clock.get_time()`` is used
in the definition of the new method ``NewClock.get_time()``;
ii) the old method is still accessible as ``Clock.get_time()``; there is
no need to invent a new name like ``get_time_old()``.
We say that the method ``get_time`` in ``NewClock`` *overrides* the method
``get_time`` in Clock.
This simple example shows the power of inheritance in code
reuse, but there is more than that.
Inheritance is everywhere in Python, since
all classes inherit from object. This means that all classes
inherit the methods and attributes of the object class, such as ``__doc__``,
``__class__``, ``__str__``, etc.
.. [#] There are cases when run-time modifications of classes are useful
anyway: particularly when one wants to modify the behavior of
classes written by others without changing the source code. I
will show an example in the next chapter.
Inheriting from built-in types
-----------------------------------------------------------------------
However, one can subclass a built-in type, effectively creating a
user-defined type with all the features of a built-in type, and modify it.
Suppose for instance one has a keyword dictionary such as
>>> kd={'title': "OOPP", 'author': "M.S.", 'year': 2003}
it would be nice to be able to access the attributes without
excessive quoting, i.e. using ``kd.author`` instead of ``kd["author"]``.
This can be done by subclassing the built-in class ``dict`` and
by overriding the ``__getattr__`` and ``__setattr__`` special methods:
::
#
class kwdict(dict):
"Keyword dictionary base class"
def __getattr__(self,attr):
return self[attr]
def __setattr__(self,key,val):
self[key]=val
__str__ = pretty
#
Here is an example of usage:
>>> from oopp import kwdict
>>> book=kwdict({'title': "OOPP", 'author': "M.S."})
>>> book.author #it works
'M.S.'
>>> book["author"] # this also works
'M.S.'
>>> book.year=2003 #you may also add new fields on the fly
>>> print book
author = M.S.
title = OOPP
year = 2003
The advantage of subclassing the built-in 'dict' is that you get all
the standard dictionary methods for free, without having to reimplement them.
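The "for free" part is easy to check. Here is a sketch of the same idiom in modern Python syntax; unlike the book's version it also converts a failed item lookup into ``AttributeError`` (the conventional signal for a missing attribute), and it omits the ``pretty`` ``__str__``, which is defined in ``oopp``:

```python
class kwdict(dict):
    "Keyword dictionary: attribute access delegates to item access."
    def __getattr__(self, attr):        # called only when normal lookup fails
        try:
            return self[attr]
        except KeyError:
            raise AttributeError(attr)
    def __setattr__(self, key, val):
        self[key] = val

book = kwdict({'title': "OOPP", 'author': "M.S."})
book.year = 2003                        # stored as an item, not an attribute

# the whole dict API is inherited for free
assert book.author == 'M.S.' and book['author'] == 'M.S.'
assert sorted(book.keys()) == ['author', 'title', 'year']
assert len(book) == 3
```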
However, subclassing built-ins is not always a piece of cake: in
many cases there are complications, indeed. Suppose for instance
one wants to create an enhanced string type, with
the ability to indent and dedent a block of text, provided by
the following functions:
::
#
def indent(block,n):
"Indent a block of code by n spaces"
return '\n'.join([' '*n+line for line in block.splitlines()])
def dedent(block):
"Dedent a block of code, if need there is"""
lines=block.splitlines()
for line in lines:
strippedline=line.lstrip()
if strippedline: break
spaces=len(line)-len(strippedline)
if not spaces: return block
return '\n'.join([line[spaces:] for line in lines])
#
The solution is to inherit from the built-in string type ``str``, and to
add to the new class the ``indent`` and ``dedent`` methods:
>>> from oopp import indent,dedent
>>> class Str(str):
... indent=indent
... dedent=dedent
>>> s=Str('spam\neggs')
>>> type(s)
<class '__main__.Str'>
>>> print s.indent(4)
    spam
    eggs
However, this approach has a disadvantage, since the output of ``indent`` is
not a ``Str``, but a plain ``str``, and is therefore without the additional
``indent`` and ``dedent`` methods:
>>> type(s.indent(4))
<type 'str'>
>>> s.indent(4).indent(4) #error
Traceback (most recent call last):
File "", line 9, in ?
AttributeError: 'str' object has no attribute 'indent'
>>> s.indent(4).dedent(4) #error
Traceback (most recent call last):
File "", line 9, in ?
AttributeError: 'str' object has no attribute 'dedent'
We would like ``indent`` to return a ``Str`` object. To solve this problem
it is enough to rewrite the class as follows:
::
#
from oopp import indent,dedent
class Str(str):
def indent(self,n):
return Str(indent(self,n))
def dedent(self):
return Str(dedent(self))
s=Str('spam\neggs').indent(4)
print type(s)
print s # indented s
s=s.dedent()
print type(s)
print s # non-indented s
#
Now, everything works and the output of the previous script is
::
<class '__main__.Str'>
    spam
    eggs
<class '__main__.Str'>
spam
eggs
The solution works because now ``indent()`` returns an instance
of ``Str``, which therefore has an ``indent`` method. Unfortunately,
this is not the end. Suppose we want to add another food to our list:
>>> s2=s+Str("\nham")
>>> s2.indent(4) #error
Traceback (most recent call last):
File "", line 1, in ?
AttributeError: 'str' object has no attribute 'indent'
The problem is the same, again: the type of ``s2`` is ``str``
>>> type(s2)
<type 'str'>
and therefore there is no ``indent`` method available. There is a
solution to this problem, i.e. to redefine the addition operator
for objects of the class ``Str``. This can be done directly by hand,
but it is *ugly* for the following reasons:
1. If you derive a new class from ``Str``, you have to redefine the
addition operator (both the left addition and the right addition [#]_)
again (ughh!);
2. There are other operators you must redefine, in particular the
augmented assignment operator ``+=``, the repetition operator ``*``
and its augmented version ``*=``;
3. In the case of numeric types, one must redefine ``+``, ``-``, ``*``,
``/``, ``//``, ``%``, possibly ``<<``, ``>>`` and others, including the
corresponding augmented assignment operators and the left and the right
forms of the operators.
This is a mess, especially since, due to point 1, one has to redefine
all the operators each time she defines a new subclass. In short, one has
to write a lot of boilerplate for a stupid job that the language
should be able to perform by itself, automatically. But here are the
good news: Python *can* do all that automatically, in an elegant
and beautiful way, which works for all types, too.
But this requires the magic of metaclasses.
But this requires the magic of metaclasses.
.. [#] The right addition works this way: Python looks at the expression x+y
and, if x has an explicit ``__add__`` method, invokes it; on the other
hand, if x does not define an ``__add__`` method, Python considers y+x.
If y defines a ``__radd__`` method, it invokes it; otherwise it
raises an exception. The same is done for right multiplication, etc.
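To make the point concrete, here is a sketch, in modern Python syntax, of the hand-made operator wrapping discussed above; it is exactly the kind of boilerplate that would have to be repeated for ``*``, ``+=`` and so on, and again in every further subclass:

```python
class Str(str):
    def indent(self, n):
        return Str('\n'.join(' ' * n + line for line in self.splitlines()))
    def __add__(self, other):            # left addition: self + other
        return Str(str.__add__(self, other))
    def __radd__(self, other):           # right addition: other + self
        return Str(str.__add__(other, self))

s = Str('spam\neggs') + '\nham'          # __add__ keeps the Str type
assert type(s) is Str
assert s.indent(4) == '    spam\n    eggs\n    ham'
# the right form is needed too: 'x' + s triggers Str.__radd__ first,
# because the right operand is a proper subclass of str
assert type('x' + s) is Str
```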
Controlling the creation of objects
---------------------------------------------------------------------------
Before introducing multiple inheritance, let me make a short digression on
the mechanism of object creation in Python 2.2+. The important point is
that new style classes have a ``__new__`` static method that allows
the user to take complete control of object creation. To understand how
``__new__`` works, I must explain what happens when an object is instantiated
with a statement like
::
s=Str("spam") #object creation
What happens under the hood is that the special static method ``__new__``
of the class ``Str`` (inherited from the built-in ``str`` class)
is invoked *before* the ``Str.__init__`` method. This means that
the previous line should really be considered syntactic sugar for:
::
s=Str.__new__(Str,"spam") # Str.__new__ is actually str.__new__
assert isinstance(s,Str)
Str.__init__(s,"spam") # Str.__init__ is actually str.__init__
To put it more verbosely, what happens during object creation is the
following:
1. the static method ``__new__`` is invoked with the class of the created
object as first argument [#]_;
2. ``__new__`` returns an instance of that class.
3. the instance is then initialized by the ``__init__`` method.
Notice that both ``__new__`` and ``__init__`` are called with the same
argument list, therefore one must make sure that they have a compatible
signature.
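The two-step protocol can be observed directly. A minimal sketch in modern Python syntax (the ``Point`` class and the ``calls`` list are illustrative, not part of ``oopp``):

```python
calls = []

class Point:
    def __new__(cls, x, y):              # step 1: create the instance
        calls.append('__new__')
        return super().__new__(cls)
    def __init__(self, x, y):            # step 2: initialize it
        calls.append('__init__')
        self.x, self.y = x, y

p = Point(1, 2)                          # sugar for __new__ then __init__
assert calls == ['__new__', '__init__']  # same argument list, in this order
assert (p.x, p.y) == (1, 2)
```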
Let me discuss now why ``__new__`` must be a static method.
First of all, it cannot be a normal method with an instance of the calling
class as first argument, since at the time of the ``__new__`` invocation
that instance (``s`` in the example) has yet to be created.
Since ``__new__`` needs information about the class calling it, one
could think of implementing ``__new__`` as a class method. However,
this would implicitly pass the caller class and return an instance
of it. It is more convenient to have the ability of creating
instances of *any* class directly, with the syntax ``C.__new__(B,*args,**kw)``.
For these reasons, ``__new__`` must be a static method, and the class
which is calling it must be passed explicitly.
Let me now show an important application of the ``__new__`` static method:
forbidding object creation. For instance, sometimes it is useful to have
classes that cannot be instantiated. This kind of classes can be
obtained by inheriting from a ``NonInstantiable`` class:
::
#
class NonInstantiableError(Exception):
pass
class NonInstantiable(object):
def __new__(cls,*args,**kw):
raise NonInstantiableError("%s cannot be instantiated" % cls)
#
Here is an example of usage:
>>> from oopp import NonInstantiable,get_time
>>> class Clock(NonInstantiable):
... get_time=staticmethod(get_time)
>>> Clock.get_time() # works
'18:48:08'
>>> Clock() #error
Traceback (most recent call last):
File "", line 1, in ?
Clock()
File "oopp.py", line 257, in __new__
raise NonInstantiableError("%s cannot be instantiated" % cls)
NonInstantiableError: <class '__main__.Clock'> cannot be instantiated
However, the approach pursued here has a disadvantage: ``Clock`` was already
defined as a subclass of ``object``, and I had to change the source code
to make it a subclass of 'NonInstantiable'. But what happens if
I cannot change the sources? How can I *reuse* the old code?
The solution is provided by multiple inheritance.
Notice that '__new__' is a staticmethod: [#]_
>>> type(NonInstantiable.__dict__['__new__'])
<type 'staticmethod'>
.. [#] This is how ``type(s)`` or ``s.__class__`` get to know that
``s`` is an instance of ``Str``, since the class information is
explicitly passed to the newborn object through ``__new__``.
.. [#] However, ``object.__dict__['__new__']`` is not a staticmethod:
>>> type(object.__dict__['__new__']) # special case
<type 'builtin_function_or_method'>
Multiple Inheritance
----------------------------------------------------------------------------
Multiple Inheritance (often abbreviated as MI) is often
considered one of the most advanced topics in Object Oriented Programming.
It is also one of the most difficult features to implement
in an Object Oriented Programming language. Some languages even decided,
by design, to avoid it. This is for instance the case of Java, which avoided
MI having seen its implementation in C++ (which is not for the faint of
heart ;-) and uses a poorer form of it through interfaces.
As far as scripting languages are concerned, of which
the most famous are Perl, Python and Ruby (in this order, even if
the right order would be Python, Ruby and Perl), only Python
implements Multiple Inheritance well (Ruby has a restricted form
of it through mix-ins, whereas the Perl implementation is too difficult
for me to understand what it does ;).
The fact that Multiple Inheritance can be hairy does not mean that it
is *always* hairy, however. Multiple Inheritance is used with success
in Lisp-derived languages (including Dylan).
The aim of this chapter is to discuss Python's
support for MI in the most recent versions (2.2 and 2.3), which
has considerably improved with respect to previous versions.
The message is the following: if Python 1.5 had a basic support for
MI (basic, but nevertheless with nice, dynamic features),
Python 2.2 has *greatly* improved that support, and with the
change of the Method Resolution Order in Python 2.3, we may say
that support for MI is now *excellent*.
I strongly encourage Python programmers to use MI a lot: this will
allow an even stronger reuse of code than in single inheritance.
Often, inheritance is used when one has a complicated class B and wants
to modify (or enhance) its behavior by deriving a child class C, which is
only slightly different from B. In this situation, B is already a standalone
class, providing some non-trivial functionality independently from
the existence of C. This kind of design is typical of the so-called
*top-down* philosophy, where one builds the
whole structure as a monolithic block, leaving room only for minor
improvements.
An alternative approach is the so-called *bottom-up* programming, in
which one builds complicated things starting from very simple building
blocks. In this logic, the idea of creating classes with the only purpose
of being derived from is very appealing. The 'NonInstantiable' class just
defined is a perfect example of this kind of class; classes written with
multiple inheritance in mind are often called *mixin* classes.
It can be used to create a new class ``NonInstantiableClock``
that inherits from ``Clock`` and from ``NonInstantiable``.
::
#
class NonInstantiableClock(Clock,NonInstantiable):
pass
#
Now ``NonInstantiableClock`` is both a clock
>>> from oopp import NonInstantiableClock
>>> NonInstantiableClock.get_time() # works
'12:57:00'
and a non-instantiable class:
>>> NonInstantiableClock() # as expected, give an error
Traceback (most recent call last):
File "", line 1, in ?
NonInstantiableClock() # error
File "oopp.py", line 245, in __new__
raise NonInstantiableError("%s cannot be instantiated" % cls)
NonInstantiableError: <class 'oopp.NonInstantiableClock'>
cannot be instantiated
Let me give a simple example of a situation where the mixin approach
comes in handy. Suppose that the owner of a pizza shop needs a program to
take care of all the pizzas to go that he sells. Pizzas are distinguished
according to their size (small, medium or large) and their toppings.
The problem can be solved by inheriting from a generic pizza factory
like this:
::
#
class GenericPizza(object): # to be customized
toppinglist=[] # nothing, default
baseprice=1 # one dollar, default
topping_unit_price=0.5 # half dollar for each topping, default
sizefactor={'small':1, 'medium':2, 'large':3}
# a medium size pizza costs twice a small pizza,
# a large pizza costs three times
def __init__(self,size):
self.size=size
def price(self):
return (self.baseprice+
self.toppings_price())*self.sizefactor[self.size]
def toppings_price(self):
return len(self.toppinglist)*self.topping_unit_price
def __str__(self):
return '%s pizza with %s, cost $ %s' % (self.size,
','.join(self.toppinglist),
self.price())
#
Here the base class 'GenericPizza' is written with inheritance in mind: one
can derive many pizza classes from it by overriding the ``toppinglist``;
for instance one could define
>>> from oopp import GenericPizza
>>> class Margherita(GenericPizza):
... toppinglist=['tomato']
The problem with this approach is that one must define dozens of
different pizza subclasses (Marinara, Margherita, Capricciosa, QuattroStagioni,
Prosciutto, ProsciuttoFunghi, PizzaDellaCasa, etc. etc. [#]_). In such a
situation, it is better to perform the generation of subclasses in a smarter
way, i.e. via a customizable class factory.
A simpler approach is to always use the same class and to customize
its instances just after creation. Both approaches can be implemented via
the following 'Customizable' mixin class, not meant to be instantiated,
but rather to be *inherited*:
::
#
class Customizable(object):
"""Classes inhering from 'Customizable' have a 'with' method acting as
an object modifier and 'With' classmethod acting as a class factory"""
def with(self,**kw):
customize(self,**kw)# customize the instance
return self # returns the customized instance
def With(cls,**kw):
class ChildOf(cls): pass # a new class inheriting from cls
ChildOf.__name__=cls.__name__ # by default, with the same name
customize(ChildOf,**kw) # of the original class
return ChildOf
With=classmethod(With)
#
Descendants of 'Customizable' can be customized by using
'with', which directly acts on instances, or 'With', which returns
new classes. Notice that one could make 'With' customize the
original class, without returning a new one; however, in practice,
this would not be safe: recall that changing a class automatically
modifies all its instances, even instances created *before*
the modification. This could produce bad surprises: it is better to
return new classes, which may have the same name as the original one,
but are actually completely independent of it.
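A sketch of the same idea in modern Python syntax may clarify the mechanics. Here ``customize`` is a minimal stand-in for the book's helper of the same name, and ``with`` is renamed ``with_``, since it later became a reserved keyword:

```python
def customize(obj, **kw):                # minimal stand-in for oopp.customize
    for k, v in kw.items():
        setattr(obj, k, v)

class Customizable:
    def with_(self, **kw):               # instance modifier ('with' is now a keyword)
        customize(self, **kw)
        return self
    @classmethod
    def With(cls, **kw):                 # class factory: returns a fresh subclass
        ChildOf = type(cls.__name__, (cls,), {})
        customize(ChildOf, **kw)
        return ChildOf

class Pizza(Customizable):
    toppinglist = []

Margherita = Pizza.With(toppinglist=['tomato', 'mozzarella'])
assert Margherita().toppinglist == ['tomato', 'mozzarella']
assert Pizza.toppinglist == []           # the original class is untouched
```

Note that ``Margherita`` keeps the name 'Pizza' unless ``__name__`` is passed explicitly, exactly as in the book's version.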
In order to solve the pizza shop problem we may define a 'CustomizablePizza'
class
::
#
class CustomizablePizza(GenericPizza,Customizable):
pass
#
which can be used in two ways: i) to customize instances just after creation:
>>> from oopp import CustomizablePizza
>>> largepizza=CustomizablePizza('large') # CustomizablePizza instance
>>> largemarinara=largepizza.with(toppinglist=['tomato'],baseprice=2)
>>> print largemarinara
large pizza with tomato, cost $ 7.5
and ii) to generate new customized classes:
>>> Margherita=CustomizablePizza.With(
... toppinglist=['tomato','mozzarella'], __name__='Margherita')
>>> print Margherita('medium')
medium pizza with tomato,mozzarella, cost $ 4.0
The advantage of the bottom-up approach is that the 'Customizable' class
can be reused in completely different problems; for instance, it could
be used as a class factory. As an example, we could use it to generate a
'CustomizableClock' class:
>>> from oopp import *
>>> CustomizableClock=Customizable.With(get_time=staticmethod(Clock.get_time),
... __name__='CustomizableClock') #adds get_time
>>> CustomizableClock.get_time() # now it works
'09:57:50'
Here 'Customizable' "steals" the 'get_time' method from 'Clock'.
However, that would be a rather perverse usage ;). I wrote it to show
the advantage of classmethods, more than to suggest to the reader that
this is an example of good programming.
.. [#] In Italy, you can easily find "pizzerie" with more than 50 different
kinds of pizzas (once I saw a menu with something like one hundred
different combinations ;)
Cooperative hierarchies
-----------------------------------------------------------------------
The examples of multiple inheritance hierarchies given until now were pretty
easy. The reason is that there was no interaction between the methods of the
children and those of the parents. However, things get more complicated (and
interesting ;) when the methods in the hierarchy call each other.
Let me consider an example coming from paleoanthropology:
::
#
class HomoHabilis(object):
def can(self):
print self,'can:'
print " - make tools"
class HomoSapiens(HomoHabilis):
def can(self): #overrides HomoHabilis.can
HomoHabilis.can(self)
print " - make abstractions"
class HomoSapiensSapiens(HomoSapiens):
def can(self): #overrides HomoSapiens.can
HomoSapiens.can(self)
print " - make art"
modernman=HomoSapiensSapiens()
modernman.can()
#
In this example children methods call parent methods:
'HomoSapiensSapiens.can' calls 'HomoSapiens.can' that in turns calls
'HomoHabilis.can' and the final output is:
::
<__main__.HomoSapiensSapiens object at 0x814e1fc> can:
- make tools
- make abstractions
- make art
The script works, but it is far from ideal, if code reuse and refactoring
are considered important requirements. The point is that (very likely, as
research in paleoanthropology progresses) we may want to extend the
hierarchy, for instance by adding a class at the top or in the middle.
In its present form, this would require a non-trivial modification of
the source code (especially
if one considers that the hierarchy could be fleshed out with dozens of other
methods and attributes). However, the aim of OOP is to avoid
source code modifications as much as possible. This goal can be attained in
practice, if the source code is written to be as friendly as possible to
extensions and improvements. I think it is worth spending some time
improving this example, since what can be learned here
can be lifted to real-life cases.
First of all, let me define a generic *Homo* class, to be used
as first ring of the inheritance chain (actually the first ring is
'object'):
::
#
class Homo(PrettyPrinted):
"""Defines the method 'can', which is intended to be overriden
in the children classes, and inherits '__str__' from PrettyPrinted,
ensuring a nice printing representation for all children."""
def can(self):
print self,'can:'
#
Now, let me point out one of the shortcomings of the previous code: in each
subclass, we explicitly call its parent class (also called superclass)
by name. This is inconvenient, both because a change of name in
later stages of the project would require a lot of search and replace
(actually not a lot in this toy example, but you can imagine having
a very big project with dozens of named method calls) and because it makes
it difficult to insert a new element in the inheritance hierarchy.
The solution to these problems is the
``super`` built-in, which provides easy access to the methods
of the superclass.
``super`` objects come in two flavors: ``super(cls,obj)`` objects return
bound methods, whereas ``super(cls)`` objects return unbound methods.
In the following code we will use the first form. The hierarchy can be
rewritten more elegantly as [#]_ :
::
#
from oopp import Homo
class HomoHabilis(Homo):
def can(self):
super(HomoHabilis,self).can()
print " - make tools"
class HomoSapiens(HomoHabilis):
def can(self):
super(HomoSapiens,self).can()
print " - make abstractions"
class HomoSapiensSapiens(HomoSapiens):
def can(self):
super(HomoSapiensSapiens,self).can()
print " - make art"
HomoSapiensSapiens().can()
#
with output
::
<HomoSapiensSapiens> can:
- make tools
- make abstractions
- make art
This is not yet the most elegant form since, even
if ``super`` avoids naming the base class explicitly, it still
requires naming explicitly the class where it is defined. This is
rather annoying.
Removing that restriction, i.e. implementing really anonymous
``super`` calls, is possible, but requires a good understanding of
private variables in inheritance.
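For the record, later versions of the language solved exactly this problem: in Python 3 the compiler gives every method a hidden ``__class__`` cell, so that ``super()`` with no arguments performs an anonymous cooperative call. A sketch (returning a list instead of printing, for compactness):

```python
class Homo:
    def can(self):
        return ['can:']

class HomoHabilis(Homo):
    def can(self):
        # zero-argument super(): no class is named, so the hierarchy
        # can be renamed or extended without touching this line
        return super().can() + [' - make tools']

class HomoSapiens(HomoHabilis):
    def can(self):
        return super().can() + [' - make abstractions']

assert HomoSapiens().can() == ['can:', ' - make tools', ' - make abstractions']
```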
Inheritance and privacy
----------------------------------------------------------------------
In order to define anonymous cooperative super calls, we need classes
that know themselves, i.e. that contain a reference to themselves. This
is not as obvious a problem as it may seem, since it cannot be solved
without incurring the biggest annoyance in inheritance:
*name clashing*. Name clashing happens when names and attributes defined
in different ancestors override each other in an unwanted order.
Name clashing is especially painful in the case of cooperative
hierarchies, and particularly in the problem at hand.
A naive solution would be to attach a plain (i.e. non-private)
attribute '.this' to the class, containing a reference
to itself, that can be invoked by the methods of the class.
Suppose, for instance, that I want to use that attribute in the ``__init__``
method of the class. A naive attempt would be to write something like:
>>> class B(object):
... def __init__(self):
... print self.this,'.__init__' # .this defined later
>>> B.this=B # B.this can be set only after B has been created
>>> B()
<class '__main__.B'> .__init__
<__main__.B object at 0x...>
Unfortunately, this approach does not work with cooperative hierarchies.
Consider, for instance, extending 'B' with a cooperative children
class 'C' as follows:
>>> class C(B):
... def __init__(self):
... super(self.this,self).__init__() # cooperative call
... print type(self).this,'.__init__'
>>> C.this=C
``C.__init__`` calls ``B.__init__`` by passing a 'C' instance, therefore
``C.this`` is printed and not ``B.this``:
>>> C()
<class '__main__.C'> .__init__
<class '__main__.C'> .__init__
<__main__.C object at 0x4042ca6c>
The problem is that ``C.this`` overrides ``B.this``. The only
way of avoiding the name clash is to use a private attribute
``.__this``, as in the following script:
::
#
class B(object):
def __init__(self):
print self.__this,'.__init__'
B._B__this=B
class C(B):
def __init__(self):
super(self.__this,self).__init__() # cooperative __init__
print self.__this,'.__init__'
C._C__this=C
C()
# output:
# <class '__main__.B'> .__init__
# <class '__main__.C'> .__init__
#
The script works since, due to the magic of the mangling mechanism,
``self.__this`` in ``B.__init__`` is expanded to ``self._B__this``, so that
``B`` is retrieved, whereas in ``C.__init__`` it is expanded to
``self._C__this``, so that ``C`` is retrieved.
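The mangling mechanism itself is easy to verify with a minimal sketch (modern Python syntax, illustrative names):

```python
class B:
    __this = 'set in B'                  # stored as _B__this
    def get(self):
        return self.__this               # compiled as self._B__this

class C(B):
    __this = 'set in C'                  # stored as _C__this: no clash

c = C()
assert c._B__this == 'set in B'          # both mangled names coexist
assert c._C__this == 'set in C'
assert c.get() == 'set in B'             # B's method sees B's private name
```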
The elegance of the mechanism can be improved with a helper function
that makes its arguments reflective classes, i.e. classes with a
``__this`` private attribute:
::
#
def reflective(*classes):
"""Reflective classes know themselves, i.e. they possess a private
attribute __this containing a reference to themselves. If the class
name starts with '_', the underscores are stripped."""
for c in classes:
name=c.__name__.lstrip('_') # 'lstrip' with an argument requires 2.3
setattr(c,'_%s__this' % name,c)
#
It is trivial to rewrite the paleoanthropological hierarchy in terms of
anonymous cooperative super calls by using this trick:
::
#
class HomoHabilis(Homo):
def can(self):
super(self.__this,self).can()
print " - make tools"
class HomoSapiens(HomoHabilis):
def can(self):
super(self.__this,self).can()
print " - make abstractions"
class HomoSapiensSapiens(HomoSapiens):
def can(self):
super(self.__this,self).can()
print " - make art"
reflective(HomoHabilis,HomoSapiens,HomoSapiensSapiens)
#
Here is an example of usage:
>>> from oopp import *
>>> man=HomoSapiensSapiens(); man.can()
<HomoSapiensSapiens> can:
- make tools
- make abstractions
- make art
We may understand why it works by looking at the attributes of ``man``:
>>> print pretty(attributes(man))
_HomoHabilis__this = <class 'oopp.HomoHabilis'>
_HomoSapiensSapiens__this = <class 'oopp.HomoSapiensSapiens'>
_HomoSapiens__this = <class 'oopp.HomoSapiens'>
can = <bound method HomoSapiensSapiens.can of <oopp.HomoSapiensSapiens object at 0x...>>
formatstring = %s
It is also interesting to notice that the hierarchy can be entirely
rewritten without cooperative methods, using private attributes
instead. This second approach is simpler, as the following script shows:
::
#
from oopp import PrettyPrinted,attributes,pretty
class Homo(PrettyPrinted):
def can(self):
print self,'can:'
for attr,value in attributes(self).iteritems():
if attr.endswith('__attr'): print value
class HomoHabilis(Homo):
__attr=" - make tools"
class HomoSapiens(HomoHabilis):
__attr=" - make abstractions"
class HomoSapiensSapiens(HomoSapiens):
__attr=" - make art"
modernman=HomoSapiensSapiens()
modernman.can()
print '----------------------------------\nAttributes of',modernman
print pretty(attributes(modernman))
#
Here I have replaced the complicated chain of cooperative methods with
much simpler private attributes. Only the 'can' method in the 'Homo'
class survives, and it is modified to print the value of the '__attr'
attributes. Moreover, all the classes of the hierarchy have been made
'Customizable', in view of future extensions.
The second script is much shorter and much more elegant than the original
one; however, its logic can be a little baffling at first. The solution
to the mystery is provided by the attribute dictionary of 'modernman',
given by the second part of the output:
::
<HomoSapiensSapiens> can:
- make abstractions
- make art
- make tools
------------------------------------------
Attributes of <HomoSapiensSapiens> :
_HomoHabilis__attr = - make tools
_HomoSapiensSapiens__attr = - make art
_HomoSapiens__attr = - make abstractions
can = <bound method HomoSapiensSapiens.can of <__main__.HomoSapiensSapiens object at 0x...>>
formatstring = %s
We see that, in addition to the 'can' method inherited from 'Homo',
the 'with' and 'With' methods inherited from 'Customizable', and
the 'formatstring' inherited from 'PrettyPrinted',
``modernman`` has the attributes
::
_HomoHabilis__attr: ' - make tools' # inherited from HomoHabilis
_HomoSapiens__attr: ' - make abstractions' # inherited from HomoSapiens
_HomoSapiensSapiens__attr: ' - make art' # inherited from HomoSapiensSapiens
whose origin is obvious, once one remembers the mangling mechanism associated
with private variables. The important point is that the trick would *not*
have worked with normal attributes. Had I used the variable name
'attr' instead of '__attr', the name would have been overridden: the only
attribute of 'HomoSapiensSapiens' would have been ' - make art'.
This example explains the advantages of private variables during inheritance:
they cannot be overridden. Using private name guarantees the absence of
surprises due to inheritance. If a class B has only private variables,
deriving a class C from B cannot cause name clashes.
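The mangling at work can be checked directly by inspecting the class dictionaries. Here is a minimal sketch of mine (in modern Python syntax, with made-up class names), not taken from the oopp module:

```python
class B(object):
    __attr = "from B"          # stored in B.__dict__ as '_B__attr'
    def get(self):
        return self.__attr     # compiled as self._B__attr

class C(B):
    __attr = "from C"          # stored as '_C__attr': no clash with B

c = C()
assert "_B__attr" in B.__dict__ and "_C__attr" in C.__dict__
print(B.get(c))  # prints 'from B': B's method still sees its own attribute
```

Since ``self.__attr`` inside B is compiled to ``self._B__attr``, redefining ``__attr`` in C cannot break B's methods.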
Private variables have drawbacks, too. The most obvious disadvantage is
that, in order to customize private variables outside their
defining class, one needs to pass the name of the class explicitly.
For instance, we cannot change an attribute with the syntax
``HomoHabilis.With(__attr=' - work the stone')``; we must write the
more verbose, error prone and redundant
``HomoHabilis.With(_HomoHabilis__attr=' - work the stone')``.
A subtler drawback will be discussed in chapter 6.
.. [#] In single inheritance hierarchies, ``super`` can be dismissed
in favor of ``__base__``: for instance,
``super(HomoSapiens,self).can()`` is equivalent to
``HomoSapiens.__base__.can(self)``. Nevertheless, in view
of possible extensions to multiple inheritance, using ``super`` is a
much preferable choice.
THE SOPHISTICATION OF DESCRIPTORS
===========================================================================
Attribute descriptors are important metaprogramming tools that allow
the user to customize the behavior of attributes in custom classes.
For instance, attribute descriptors (or descriptors for short)
can be used as method wrappers,
to modify or enhance methods (this is the case for the well
known staticmethod and classmethod attribute descriptors); they
can also be used as attribute wrappers, to change or restrict the access to
attributes (this is the case for properties). Finally, descriptors
allow the user to play with the resolution order of attributes:
for instance, the ``super`` built-in object used in (multiple) inheritance
hierarchies is implemented as an attribute descriptor.
In this chapter, I will show how the user can define his own attribute
descriptors and I will give some examples of useful things you can do with
them (in particular, to add tracing and timing capabilities).
Motivation
---------------------------------------------------------------------------
Attribute descriptors are a recent idea (they were first introduced in
Python 2.2); nevertheless, under the hood, they are everywhere in Python. It is
a tribute to Guido's ability of hiding Python's complications that
the average user can easily miss their existence.
If you need to do simple things, you can very well live without
any knowledge of descriptors. On the other hand, if you need to do difficult
things (such as tracing all the attribute accesses of your modules),
attribute descriptors allow you to perform impressive feats.
Let me start by showing why the knowledge of attribute descriptors is
essential for any user seriously interested in metaprogramming applications.
Suppose I want to trace the methods of a clock:
>>> import oopp
>>> clock=oopp.Clock()
This is easily done with the ``with_tracer`` closure of chapter 2:
>>> oopp.wrapfunctions(clock,oopp.with_tracer)
>>> clock.get_time()
[] Calling 'get_time' with arguments
(){} ...
-> '.get_time' called with result: 19:55:07
'19:55:07'
However, this approach fails if I try to trace the entire class:
>>> oopp.wrapfunctions(oopp.Clock,oopp.with_tracer)
>>> oopp.Clock.get_time() # error
Traceback (most recent call last):
  File "<stdin>", line 6, in ?
TypeError: unbound method _() must be called with Clock instance
as first argument (got nothing instead)
The reason is that ``wrapfunctions`` sets the attributes of 'Clock'
by invoking ``customize``, which uses ``setattr``. This converts
'_' (i.e. the traced version of ``get_time``) into a regular method, not into
a staticmethod!
In order to trace staticmethods, one has to understand the nature
of attribute descriptors.
Functions versus methods
----------------------------------------------------------------------
Attribute descriptors are essential for the implementation
of one of the most basic Python features: the automatic conversion
of functions into methods. As I already anticipated in chapter 1, there is
a sort of magic when one writes ``Clock.get_time=lambda self: get_time()``
and Python automagically converts the right hand side, which is a
function, into a left hand side which is an (unbound) method. In order to
understand this magic, one needs a better comprehension of the
relation between functions and methods.
Actually, this relationship is quite subtle
and has no analogue in mainstream programming languages.
For instance, C is not OOP and has only functions, lacking the concept
of method, whereas Java (as other OOP languages)
has no functions, only methods.
C++ has functions and methods, but functions are completely
different from methods. On the other hand, in Python,
functions and methods can be transformed both ways.
To show how it works, let me start by defining a simple printing
function:
::
#
import __main__ # gives access to the __main__ namespace from the module

def prn(s):
    """Given an evaluable string, print its value and its object reference.
    Notice that the evaluation is done in the __main__ dictionary."""
    try: obj=eval(s,__main__.__dict__)
    except: print 'problems in evaluating',s
    else: print s,'=',obj,'at',hex(id(obj))
#
Now, let me define a class with a method ``m`` equal to the identity
function ``f``:

>>> def f(x): "Identity function"; return x
...
>>> class C(object):
...     m=f
...     print m #here m is the function f
<function f at 0x...>

We see that *inside* its defining class, ``m`` coincides with the function
``f`` (the object reference is the same):

>>> f
<function f at 0x...>

We may retrieve ``m`` from *outside* the class via the class dictionary [#]_:

>>> C.__dict__['m']
<function f at 0x...>

However, if we invoke ``m`` with
the syntax ``C.m``, then it (magically) becomes an (unbound) method:

>>> C.m #here m has become a method!
<unbound method C.f>
But why is it so? How come that in the second syntax the function
``f`` is transformed into an (unbound) method? To answer that question, we have
to understand how attributes are really invoked in Python, i.e. via
attribute descriptors.
Methods versus functions
-----------------------------------------------------------------------------
First of all, let me point out the differences between methods and
functions. Here, ``C.m`` does *not* coincide with ``C.__dict__['m']``,
i.e. ``f``, since its object reference is different:

>>> from oopp import prn,attributes
>>> prn('C.m')
C.m = <unbound method C.f> at 0x81109b4
The difference is clear since methods and functions have different attributes:
>>> attributes(f).keys()
['func_closure', 'func_dict', 'func_defaults', 'func_name',
'func_code', 'func_doc', 'func_globals']
whereas
>>> attributes(C.m).keys()
['im_func', 'im_class', 'im_self']
We discussed a few of the function attributes in the chapter
on functions. The instance method attributes are simpler: ``im_self``
returns the object to which the method is attached,
>>> print C.m.im_self #unbound method, attached to the class
None
>>> C().m.im_self #bound method, attached to C()
<__main__.C object at 0x81bf4ec>
``im_class`` returns the class to which the
method is attached
>>> C.m.im_class #class of the unbound method
<class '__main__.C'>
>>> C().m.im_class #class of the bound method
<class '__main__.C'>

and ``im_func`` returns the function equivalent to
the method.

>>> C.m.im_func
<function f at 0x...>
>>> C().m.im_func # the same
<function f at 0x...>
As the reference manual states, calling
``m(*args,**kw)`` is completely equivalent to calling
``m.im_func(m.im_self, *args,**kw)``.
As a general rule, an attribute descriptor is an object with a ``__get__``
special method. The most used descriptors are the good old functions:
they have a ``__get__`` special method returning a *method-wrapper object*

>>> f.__get__
<method-wrapper object at 0x...>

method-wrapper objects can be transformed into (both bound and unbound) methods:

>>> f.__get__(None,C)
<unbound method C.f>
>>> f.__get__(C(),C)
<bound method C.f of <__main__.C object at 0x...>>

The general calling syntax for the ``__get__`` method wrapper is
``__get__(obj,cls=None)``, where the first argument is an
instance object or None and the second (optional) argument is the class (or a
generic superclass) of the first one.
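The binding mechanism can also be exercised by invoking ``__get__`` by hand. Notice that the following sketch of mine assumes Python 3 semantics, where ``f.__get__(None,C)`` simply returns the function itself (unbound methods were later removed from the language); under the Python 2.2+ series described in the text one gets an unbound method instead:

```python
def f(x):
    "Identity function"
    return x

class C(object):
    m = f

c = C()
bound = f.__get__(c, C)       # manually invoke the descriptor protocol
assert bound() is c           # c was implicitly passed as first argument
assert bound.__func__ is f    # the underlying function is f itself
assert f.__get__(None, C) is f  # Python 3: no unbound methods
print(bound)
```

The same bound method object is what the attribute syntax ``c.m`` produces behind the scenes.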
Now we see what happens when we use the syntax ``C.m``: Python interprets
this as a shortcut for ``C.__dict__['m'].__get__(None,C)`` (if ``m`` is
in the 'C' dictionary, otherwise it looks in the ancestors' dictionaries).
We may check that everything is correct by observing that
``f.__get__(None,C)`` has exactly the same object reference as ``C.m``;
therefore they are the same object:

>>> hex(id(f.__get__(None,C))) # same as hex(id(C.m))
'0x811095c'

The process works equally well for the ``getattr`` syntax:

>>> print getattr(C,'m'), hex(id(getattr(C,'m')))
<unbound method C.f> 0x811095c
and for bound methods: if
>>> c=C()
is an instance of the class C, then the syntax
>>> getattr(c,'m') #same as c.m
<bound method C.f of <__main__.C object at 0x...>>

is a shortcut for

>>> type(c).__dict__['m'].__get__(c,C) # or f.__get__(c,C)
<bound method C.f of <__main__.C object at 0x...>>

(notice that the object reference for ``c.m`` and ``f.__get__(c,C)`` is
the same: they are *exactly* the same object).
Both the unbound method C.m and the bound method c.m refer to the same
object at the hexadecimal address 0x811095c. This object is common to all other
instances of C:

>>> c2=C()
>>> print c2.m,hex(id(c2.m)) #always the same method
<bound method C.f of <__main__.C object at 0x...>> 0x811095c

One can also omit the second argument:

>>> c.m.__get__(c)
<bound method C.f of <__main__.C object at 0x...>>
Finally, let me point out that methods are attribute descriptors too,
since they have a ``__get__`` attribute returning a method-wrapper
object:

>>> C.m.__get__
<method-wrapper object at 0x...>

Notice that this method wrapper is *not* the same as the ``f.__get__``
method wrapper.
.. [#] If ``C.__dict__['m']`` is not defined, Python looks to see if ``m``
is defined in some ancestor of C. For instance, if `B` is the base of `C`,
it looks in ``B.__dict__['m']``, etc., following the MRO.
Static methods and class methods
--------------------------------------------------------------------------
Whereas functions and methods are implicit attribute descriptors,
static methods and class methods are examples of explicit
descriptors. They allow the user to convert regular functions into
specific descriptor objects. Let me show a trivial example.
Given the identity function

>>> def f(x): return x

we may convert it into a staticmethod object

>>> sm=staticmethod(f)
>>> sm
<staticmethod object at 0x...>

or into a classmethod object

>>> cm=classmethod(f)
>>> cm
<classmethod object at 0x...>

In both cases the ``__get__`` special method returns a method-wrapper object

>>> sm.__get__
<method-wrapper object at 0x...>
>>> cm.__get__
<method-wrapper object at 0x...>

However, the static method wrapper is quite different from the class
method wrapper. In the first case the wrapper returns a function:

>>> sm.__get__(C(),C)
<function f at 0x...>
>>> sm.__get__(C())
<function f at 0x...>

in the second case it returns a method:

>>> cm.__get__(C(),C)
<bound method type.f of <class '__main__.C'>>
Let me discuss the static methods first, in more detail.
It is always possible to extract the function from the static method
via the syntaxes ``sm.__get__(a)`` and ``sm.__get__(a,b)``, with *ANY* valid
a and b, i.e. the result does not depend on a and b. This is correct,
since static methods are actually functions that have nothing to do
with the class and the instances to which they are bound.
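This independence from the arguments is easy to verify directly; in the following sketch of mine (modern Python syntax, not code from the text) the static method wrapper always returns the very same function object:

```python
def f(x):
    return x

sm = staticmethod(f)

class C(object):
    s = sm

# __get__ ignores its arguments and always returns the wrapped function
assert sm.__get__(C(), C) is f
assert sm.__get__('whatever') is f
assert C.s is f       # outside the class, s is a plain function again
print(C.s)
```

Whatever object is passed as ``a`` (here a C instance or a string), the result is ``f`` itself.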
This behaviour of the method wrapper makes clear why the relation between
methods and functions is inverted for static methods with respect to
regular methods:
>>> class C(object):
...     s=staticmethod(lambda : None)
...     print s
...
<staticmethod object at 0x...>

Static methods are non-trivial objects *inside* the class, whereas
they are regular functions *outside* the class:

>>> C.s
<function <lambda> at 0x8158e7c>
>>> C().s
<function <lambda> at 0x8158e7c>
The situation is different for classmethods: inside the class they
are non-trivial objects, just as static methods,
>>> class C(object):
...     cm=classmethod(lambda cls: None)
...     print cm
...
<classmethod object at 0x...>

but outside the class they are methods bound to the class,

>>> c=C()
>>> prn('c.cm')
c.cm = <bound method type.<lambda> of <class '__main__.C'>> at 0x811095c
and not to the instance 'c'. The reason is that the ``__get__`` method
wrapper can be invoked with the syntax ``__get__(a,cls)``, which
is only sensitive to the second argument, or with the syntax
``__get__(obj)``, which is only sensitive to the type of the first
argument:

>>> cm.__get__('whatever',C) # the first argument is ignored
<bound method type.f of <class '__main__.C'>>

In the second case, the method is bound to the type of 'whatever':

>>> cm.__get__('whatever') # in Python 2.2 this would give a serious error
<bound method type.f of <type 'str'>>

Notice that the class method is actually bound to C's class, i.e.
to 'type'.
Just as regular methods (and differently
from static methods), classmethods have the attributes ``im_class``, ``im_func``
and ``im_self``. In particular, one can retrieve the function wrapped inside
the classmethod with

>>> cm.__get__('whatever','whatever').im_func
<function f at 0x...>

The difference with regular methods is that ``im_class`` returns the
class of 'C', whereas ``im_self`` returns 'C' itself:

>>> C.cm.im_self # a classmethod is attached to the class
<class '__main__.C'>
>>> C.cm.im_class # the class of C
<type 'type'>
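The same checks can be repeated in modern Python, where the ``im_self``/``im_func`` attributes are spelled ``__self__``/``__func__``; a small sketch of mine, assuming Python 3:

```python
def f(cls):
    return cls

cm = classmethod(f)

class C(object):
    c = cm

bound = C.__dict__['c'].__get__(None, C)   # what the syntax C.c does
assert bound.__self__ is C   # bound to the class itself, not an instance
assert bound.__func__ is f   # the wrapped function
assert C.c() is C            # C is implicitly passed as first argument
print(bound)
```

The assertions confirm that a classmethod is attached to the class, so calling it from any instance (or from the class) always passes the class as the first argument.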
Remark: Python 2.2.0 has a bug in classmethods (fixed in newer versions):
when the first argument of __get__ is None, then one must specify
the second argument (otherwise segmentation fault :-()
Properties
----------------------------------------------------------------------
Properties are a more general kind of attribute descriptors than
staticmethods and classmethods, since their effect can be customized
through arbitrary get/set/del functions. Let me give an example:
>>> def getp(self): return 'property' # get function
...
>>> p=property(getp) # property object
>>> p
<property object at 0x...>

``p`` has a ``__get__`` special method returning a method-wrapper
object, just as it happens for other descriptors:

>>> p.__get__
<method-wrapper object at 0x...>

The difference is that

>>> p.__get__(None,type(p))
<property object at 0x...>
>>> p.__get__('whatever')
'property'
>>> p.__get__('whatever','whatever')
'property'
As for static methods, the ``__get__`` method wrapper is independent
of its arguments, unless the first one is None: in that case it returns
the property object, whereas in all other circumstances it returns the result
of ``getp``. This explains the behavior

>>> class C(object): p=p
>>> C.p
<property object at 0x...>
>>> C().p
'property'
Properties are a dangerous feature, since they change the semantics
of the language. This means that apparently trivial operations can have
any kind of side effect:

>>> def get(self): return 'You gave me the order to destroy your hard disk!!'
>>> class C(object): x=property(get)
>>> C().x
'You gave me the order to destroy your hard disk!!'

Accessing 'C().x' could very well invoke an external program that is going
to do anything! It is up to the programmer not to abuse properties.
The same is true for user defined attribute descriptors.
There are situations in which they are quite handy, however. For
instance, properties can be used to trace the access to data attributes.
This can be especially useful during debugging, or for logging
purposes.
Notice that this approach has the problem that data attributes can
no longer be accessed through the class, but only through its instances.
Moreover, properties do not work well with ``super`` in cooperative
methods.
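Here is a minimal sketch of how a property can trace the access to a data attribute; the names are mine, not from the oopp module, and the syntax is modern Python:

```python
class TracedPoint(object):
    "Trace every access to the 'x' data attribute via a property"
    def __init__(self, x):
        self._x = x          # the real storage, conventionally private
        self.log = []
    def getx(self):
        self.log.append('get x')
        return self._x
    def setx(self, value):
        self.log.append('set x = %r' % value)
        self._x = value
    x = property(getx, setx)

p = TracedPoint(1)
p.x = 42               # goes through setx
assert p.x == 42       # goes through getx
assert p.log == ['set x = 42', 'get x']
# the caveat mentioned above: from the class one gets the property object
assert isinstance(TracedPoint.x, property)
```

The last assertion shows the limitation discussed in the text: accessed through the class, ``x`` is the property object itself, not the traced value.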
User-defined attribute descriptors
----------------------------------------------------------------------
As we have seen, there are plenty of predefined attribute descriptors,
such as staticmethods, classmethods and properties (the built-in
``super`` is also an attribute descriptor which, for the sake of
convenience, will be discussed in the next section).
In addition to these, the user can also define customized attribute
descriptors, simply through classes with a ``__get__`` special method.
Let me give an example:
::
#
class ChattyAttr(object):
    """Chatty descriptor class; descriptor objects are intended to be
    used as attributes in other classes"""
    def __get__(self, obj, cls=None):
        binding=obj is not None
        if binding:
            return 'You are binding %s to %s' % (self,obj)
        else:
            return 'Calling %s from %s' % (self,cls)

class C(object):
    d=ChattyAttr()

c=C()
print c.d # <=> type(c).__dict__['d'].__get__(c,type(c))
print C.d # <=> C.__dict__['d'].__get__(None,C)
#
with output:
::
  You are binding <ChattyAttr object at 0x...> to <C object at 0x...>
  Calling <ChattyAttr object at 0x...> from <class '__main__.C'>
Invoking an attribute with the syntax ``C.d`` or ``c.d`` involves calling
``__get__``. The ``__get__`` signature is fixed: it is
``__get__(self,obj,cls=None)``, since the notation
``self.descr_attr`` automatically passes ``self`` and ``self.__class__`` to
``__get__``.
Custom descriptors can be used to restrict the access to objects in a
more general way than through properties. For instance, suppose one
wants to raise an error if a given attribute 'a' is accessed, both
from the class and from the instance: a property cannot help here,
since it works only from the instance. The solution is the following
custom descriptor:
::
#
class AccessError(object):
    """Descriptor raising an AttributeError when the attribute is
    accessed""" # could be done with a property
    def __init__(self,errormessage):
        self.msg=errormessage
    def __get__(self,obj,cls=None):
        raise AttributeError(self.msg)
#
>>> from oopp import AccessError
>>> class C(object):
... a=AccessError("'a' cannot be accessed")
>>> c=C()
>>> c.a #error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "oopp.py", line 313, in __get__
    raise AttributeError(self.msg)
AttributeError: 'a' cannot be accessed
>>> C.a #error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "oopp.py", line 313, in __get__
    raise AttributeError(self.msg)
AttributeError: 'a' cannot be accessed
It is always possible to convert plain attributes (i.e. attributes
without a ``__get__`` method) to descriptor objects:
::
#
class convert2descriptor(object):
    """For all practical purposes, this class acts as a function that, given
    an object, adds to it a __get__ method if it is not already there. The
    added __get__ method is trivial and simply returns the original object,
    independently of obj and cls."""
    def __new__(cls,a):
        if hasattr(a,"__get__"): # do nothing
            return a # a is already a descriptor
        else: # create a trivial attribute descriptor
            cls.a=a
            return object.__new__(cls)
    def __get__(self,obj,cls=None):
        "Returns self.a independently of obj and cls"
        return self.a
#
This example also shows the magic of ``__new__``, which allows one to use a
class as a function. The output of 'convert2descriptor(a)' can be either
an instance of 'convert2descriptor' (in this case 'convert2descriptor' acts as
a normal class, i.e. as an object factory) or 'a' itself
(if 'a' is already a descriptor): in the latter case 'convert2descriptor'
acts as a function.
For instance, a string is converted to a descriptor

>>> from oopp import convert2descriptor
>>> a2=convert2descriptor('a')
>>> a2
<oopp.convert2descriptor object at 0x...>
>>> a2.__get__('whatever')
'a'

whereas a function is untouched:

>>> def f(): pass
>>> f2=convert2descriptor(f) # does nothing
>>> f2
<function f at 0x...>
Data descriptors
-------------------------------------------------------------------------
It is also possible to specify a ``__set__`` method (descriptors
with a ``__set__`` method are typically data descriptors) with
the signature ``__set__(self,obj,value)`` as in the following
example:
::
#
class DataDescriptor(object):
    value=None
    def __get__(self, obj, cls=None):
        if obj is None: obj=cls
        print "Getting",obj,"value =",self.value
        return self.value
    def __set__(self, obj, value):
        self.value=value
        print "Setting",obj,"value =",value

class C(object):
    d=DataDescriptor()

c=C()
c.d=1 # calls C.__dict__['d'].__set__(c,1)
c.d   # calls C.__dict__['d'].__get__(c,C)
C.d   # calls C.__dict__['d'].__get__(None,C)
C.d=0 # does *not* call __set__
print "C.d =",C.d
#
With output:
::
  Setting <__main__.C object at 0x...> value = 1
  Getting <__main__.C object at 0x...> value = 1
  Getting <class '__main__.C'> value = 1
  C.d = 0
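Data descriptors are also handy to implement checked attributes. Here is a sketch of mine (modern Python syntax): differently from ``DataDescriptor`` above, which stores the value on the descriptor itself and thus shares it among all instances, this one stores the value in the instance dictionary:

```python
class Integer(object):
    "Data descriptor accepting only integer values, stored per instance"
    def __init__(self, name, default=0):
        self.name = name         # key used in the instance dictionary
        self.default = default
    def __get__(self, obj, cls=None):
        if obj is None:
            return self          # accessed from the class
        return obj.__dict__.get(self.name, self.default)
    def __set__(self, obj, value):
        if not isinstance(value, int):
            raise TypeError('%r is not an integer' % (value,))
        obj.__dict__[self.name] = value

class Point(object):
    x = Integer('x')
    y = Integer('y')

p = Point()
p.x = 3                      # accepted by __set__
assert (p.x, p.y) == (3, 0)  # y falls back to the default
try:
    p.y = 'foo'              # rejected by __set__
except TypeError:
    pass
else:
    raise AssertionError('non-integer value was accepted')
```

Note that the explicit ``obj.__dict__`` access works precisely because data descriptors take precedence over the instance dictionary during normal attribute lookup.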
With this knowledge, we may now reconsider the clock example given
in chapter 3.
>>> import oopp
>>> class Clock(object): pass
>>> myclock=Clock()
...
>>> myclock.get_time=oopp.get_time # this is a function
>>> Clock.get_time=lambda self : oopp.get_time() # this is a method
In this example, ``myclock.get_time``, which is attached to the ``myclock``
object, is a function, whereas ``Clock.get_time``, which is attached to
the ``Clock`` class, is a method. We may also check this by using the ``type``
function:

>>> type(myclock.get_time)
<type 'function'>

whereas

>>> type(Clock.get_time)
<type 'instancemethod'>
It must be remarked that user-defined attribute descriptors, just as
properties, allow one to arbitrarily change the semantics of the language
and should be used with care.
The ``super`` attribute descriptor
------------------------------------------------------------------------
``super`` also has a second form, in which it is used as a descriptor.
``super`` objects are attribute descriptors, too, with a ``__get__``
method returning a method-wrapper object:

>>> super(C,C()).__get__
<method-wrapper object at 0x...>

Here are some examples of acceptable calls:

>>> super(C,C()).__get__('whatever')
<super: <class 'C'>, <C object>>
>>> super(C,C()).__get__('whatever','whatever')
<super: <class 'C'>, <C object>>
Unfortunately, for the time being
(i.e. for Python 2.3), the ``super`` mechanism has various limitations.
To show the issues, let me start by considering the following base class:
::
#
class ExampleBaseClass(PrettyPrinted):
    """Contains a regular method 'm', a staticmethod 's', a classmethod
    'c', a property 'p' and a data attribute 'd'."""
    m=lambda self: 'regular method of %s' % self
    s=staticmethod(lambda : 'staticmethod')
    c=classmethod(lambda cls: 'classmethod of %s' % cls)
    p=property(lambda self: 'property of %s' % self)
    a=AccessError('Expected error')
    d='data'
#
Now, let me derive a new class C from ExampleBaseClass:
>>> from oopp import ExampleBaseClass
>>> class C(ExampleBaseClass): pass
>>> c=C()
Ideally, we would like to retrieve the methods and attributes of
ExampleBaseClass from C, by using the ``super`` mechanism.
1. We see that ``super`` works without problems for regular methods,
staticmethods and classmethods:

>>> super(C,c).m()
'regular method of <C>'
>>> super(C,c).s()
'staticmethod'
>>> super(C,c).c()
"classmethod of <class '__main__.C'>"
It also works for user defined attribute descriptors:

>>> super(C,c).a # access error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "oopp.py", line 340, in __get__
    raise AttributeError(self.msg)
AttributeError: Expected error

2. It also works for properties, but only in Python 2.3+:

>>> ExampleBaseClass.p
<property object at 0x...>

In Python 2.2 one would get an error instead:

>>> super(C,c).p #error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'super' object has no attribute 'p'
3. Moreover, certain attributes of the superclass, such as its
``__name__``, cannot be retrieved:
>>> ExampleBaseClass.__name__
'ExampleBaseClass'
>>> super(C,c).__name__ #error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'super' object has no attribute '__name__'
4. There is no direct way to retrieve the methods of the super-superclass
(i.e. the grandmother class, if you wish) or in general the furthest
ancestors, since ``super`` does not chain.
5. Finally, there are some subtle issues with the ``super(cls)`` syntax:

>>> super(C).m #(2) error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'super' object has no attribute 'm'

``super(C).m`` means ``super(C).__get__(None,C).m``, which does not work;
only ``super(C).__get__(c,C).m``, i.e. ``super(C,c).m``, works.
On the other hand,

>>> super(C).__init__ #(1)
>>> super(C).__new__ #(1)

seem to work, whereas in reality they do not. The reason is that, since
``super`` objects are instances
of ``object``, they inherit object's methods, and in particular
``__init__`` ; therefore the ``__init__`` method in (1) is *not*
the ``ExampleBaseClass.__init__`` method. The point is that ``super``
objects are attribute descriptors and not references to the superclass.
Probably, in future versions of Python the ``super`` mechanism will be
improved. However, for the time being, one must provide a workaround for
dealing with these issues. This will be discussed in the next chapter.
Method wrappers
----------------------------------------------------------------------
One of the most typical applications of attribute descriptors is their
usage as *method wrappers*.
Suppose, for instance, one wants to add tracing capabilities to
the methods of a class for debugging purposes. The problem
can be solved with a custom descriptor class:
::
#
import sys, inspect
from types import FunctionType

class wrappedmethod(Customizable):
    """Customizable method factory intended for derivation.
    The wrapper method is overridden in the children."""
    logfile=sys.stdout # default
    namespace='' # default
    def __new__(cls,meth): # meth is a descriptor
        if isinstance(meth,FunctionType):
            kind=0 # regular method
            func=meth
        elif isinstance(meth,staticmethod):
            kind=1 # static method
            func=meth.__get__('whatever')
        elif isinstance(meth,classmethod):
            kind=2 # class method
            func=meth.__get__('whatever','whatever').im_func
        elif isinstance(meth,wrappedmethod): # already wrapped
            return meth # do nothing
        elif inspect.ismethoddescriptor(meth):
            kind=0; func=meth # for many builtin methods
        else:
            return meth # do nothing
        self=super(wrappedmethod,cls).__new__(cls)
        self.kind=kind; self.func=func # pre-initialize
        return self
    def __init__(self,meth): # meth not used
        self.logfile=self.logfile # default values
        self.namespace=self.namespace # copy the current
    def __get__(self,obj,cls): # closure
        def _(*args,**kw):
            if obj is None: o=() # unbound method call
            else: o=(obj,) # bound method call
            allargs=[o,(),(cls,)][self.kind]+args
            return self.wrapper()(*allargs,**kw)
        return _ # the wrapped function
    # allargs is the only nontrivial line in _; it adds
    # 0 - obj if meth is a regular method
    # 1 - nothing if meth is a static method
    # 2 - cls if meth is a class method
    def wrapper(self): return self.func # do nothing, to be overridden
#
This class is intended for derivation: the wrapper method has to be overridden
in the children in order to introduce the wanted feature. If I want to
implement the capability of tracing methods, I can reuse the ``with_tracer``
closure introduced in chapter 2:
::
#
class tracedmethod(wrappedmethod):
    def wrapper(self):
        return with_tracer(self.func,self.namespace,self.logfile)
#
Nothing prevents me from introducing timing features by reusing the
``with_timer`` closure:
::
#
class timedmethod(wrappedmethod):
    iterations=1 # additional default parameter
    def __init__(self,meth):
        super(timedmethod,self).__init__(meth)
        self.iterations=self.iterations # copy
    def wrapper(self):
        return with_timer(self.func,self.namespace,
                          self.iterations,self.logfile)
#
The dictionary of wrapped functions is built with the following utility
function
::
#
def wrap(obj,wrapped,condition=lambda k,v: True, err=None):
    "Retrieves obj's dictionary and wraps it"
    if isinstance(obj,dict): # obj is a dictionary
        dic=obj
    else:
        dic=getattr(obj,'__dict__',{}).copy() # avoids dictproxy objects
        if not dic: dic=attributes(obj) # for simple objects
        wrapped.namespace=getattr(obj,'__name__','')
    for name,attr in dic.iteritems(): # modify dic
        if condition(name,attr): dic[name]=wrapped(attr)
    if not isinstance(obj,dict): # modify obj
        customize(obj,err,**dic)
#
Here is an example of usage:

::
#
from oopp import *

class C(object):
    "Class with traced methods"
    def f(self): return self
    f=tracedmethod(f)
    g=staticmethod(lambda:None)
    g=tracedmethod(g)
    h=classmethod(do_nothing)
    h=tracedmethod(h)

c=C()

# unbound calls
C.f(c)
C.g()
C.h()

# bound calls
c.f()
c.g()
c.h()
#
Output:
::
  [C] Calling 'f' with arguments
  (<__main__.C object at 0x...>,){} ...
  -> 'C.f' called with result: <__main__.C object at 0x...>
  [C] Calling '<lambda>' with arguments
  (){} ...
  -> 'C.<lambda>' called with result: None
  [C] Calling 'do_nothing' with arguments
  (<class '__main__.C'>,){} ...
  -> 'C.do_nothing' called with result: None
  [C] Calling 'f' with arguments
  (<__main__.C object at 0x...>,){} ...
  -> 'C.f' called with result: <__main__.C object at 0x...>
  [C] Calling '<lambda>' with arguments
  (){} ...
  -> 'C.<lambda>' called with result: None
  [C] Calling 'do_nothing' with arguments
  (<class '__main__.C'>,){} ...
  -> 'C.do_nothing' called with result: None
The approach in 'tracingmethods.py' works, but it is far from
elegant, since I had to explicitly wrap each method in the
class by hand. This problem can be avoided:
>>> from oopp import *
>>> wrap(Clock,tracedmethod)
>>> Clock.get_time()
[Clock] Calling 'get_time' with arguments
(){} ...
-> 'Clock.get_time' called with result: 21:56:52
'21:56:52'
THE SUBTLETIES OF MULTIPLE INHERITANCE
==========================================================================
In chapter 4 we introduced the concept of multiple inheritance and discussed
its simplest applications in the absence of name collisions. When methods
with different names are derived from different classes, multiple inheritance
is pretty trivial. However, all kinds of subtleties come up in the presence
of name clashes, i.e. when we multiply inherit different methods defined in
different classes but with the *same* name.
In order to understand what happens in this situation, it is essential to
understand the concept of Method Resolution Order (MRO). For the reader's
convenience, I collect in this chapter some of the information
reported in http://www.python.org/2.3/mro.html.
A little bit of history: why Python 2.3 has changed the MRO
------------------------------------------------------------------------------
Everything started with a post by Samuele Pedroni to the Python
development mailing list [#]_. In his post, Samuele showed that the
Python 2.2 method resolution order is not monotonic and he proposed to
replace it with the C3 method resolution order. Guido agreed with his
arguments and therefore now Python 2.3 uses C3. The C3 method itself
has nothing to do with Python, since it was invented by people working
on Dylan and it is described in a paper intended for lispers [#]_. The
present chapter gives a (hopefully) readable discussion of the C3
algorithm for Pythonistas who want to understand the reasons for the
change.
First of all, let me point out that what I am going to say only applies
to the *new style classes* introduced in Python 2.2: *classic classes*
maintain their old method resolution order, depth first and then left to
right. Therefore, there is no breaking of old code for classic classes;
and even if in principle there could be breaking of code for Python 2.2
new style classes, in practice the cases in which the C3 resolution
order differs from the Python 2.2 method resolution order are so rare
that no real breaking of code is expected. Therefore: don't be scared!
Moreover, unless you make strong use of multiple inheritance and you
have non-trivial hierarchies, you don't need to understand the C3
algorithm, and you can easily skip this chapter. On the other hand, if
you really want to know how multiple inheritance works, then this chapter
is for you. The good news is that things are not as complicated as you
might expect.
Let me begin with some basic definitions.
1) Given a class C in a complicated multiple inheritance hierarchy, it
is a non-trivial task to specify the order in which methods are
overridden, i.e. to specify the order of the ancestors of C.
2) The list of the ancestors of a class C, including the class itself,
ordered from the nearest ancestor to the furthest, is called the
class precedence list or the *linearization* of C.
3) The *Method Resolution Order* (MRO) is the set of rules that
construct the linearization. In the Python literature, the idiom
"the MRO of C" is also used as a synonym for the linearization of
the class C.
4) For instance, in the case of single inheritance hierarchy, if C is a
subclass of C1, and C1 is a subclass of C2, then the linearization of
C is simply the list [C, C1 , C2]. However, with multiple
inheritance hierarchies, it is more difficult to construct a
linearization that respects *local precedence ordering* and
*monotonicity*.
5) I will discuss the local precedence ordering later, but I can give
the definition of monotonicity here. A MRO is monotonic when the
following is true: *if C1 precedes C2 in the linearization of C,
then C1 precedes C2 in the linearization of any subclass of C*.
Otherwise, the innocuous operation of deriving a new class could
change the resolution order of methods, potentially introducing very
subtle bugs. Examples where this happens will be shown later.
6) Not all classes admit a linearization. There are cases, in
complicated hierarchies, where it is not possible to derive a class
such that its linearization respects all the desired properties.
Here I give an example of this situation. Consider the hierarchy
>>> O = object
>>> class X(O): pass
>>> class Y(O): pass
>>> class A(X,Y): pass
>>> class B(Y,X): pass
which can be represented with the following inheritance graph, where I
have denoted with O the ``object`` class, which is the beginning of any
hierarchy for new style classes:
::
 -----------
|           |
|    O      |
|  /   \    |
 - X    Y  /
   |  / | /
   | /  |/
   A    B
    \  /
     ?
In this case, it is not possible to derive a new class C from A and B,
since X precedes Y in A, but Y precedes X in B, therefore the method
resolution order would be ambiguous in C.
Python 2.3 raises an exception in this situation (TypeError: MRO
conflict among bases Y, X) forbidding the naive programmer from creating
ambiguous hierarchies. Python 2.2 instead does not raise an exception,
but chooses an *ad hoc* ordering (CABXYO in this case).
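In recent Pythons, which inherit the 2.3 behaviour, this refusal can be verified directly. A minimal sketch (the exact wording of the error message varies across versions):

```python
# The hierarchy with a serious order disagreement: X precedes Y in A,
# but Y precedes X in B, so no consistent linearization exists for C.
class X: pass
class Y: pass
class A(X, Y): pass
class B(Y, X): pass

try:
    class C(A, B): pass
except TypeError as exc:
    print("refused:", exc)
```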
The C3 Method Resolution Order
------------------------------
Let me introduce a few simple notations which will be useful for the
following discussion. I will use the shortcut notation
C1 C2 ... CN
to indicate the list of classes [C1, C2, ... , CN].
The *head* of the list is its first element:
head = C1
whereas the *tail* is the rest of the list:
tail = C2 ... CN.
I shall also use the notation
C + (C1 C2 ... CN) = C C1 C2 ... CN
to denote the sum of the lists [C] + [C1, C2, ... ,CN].
Now I can explain how the MRO works in Python 2.3.
Consider a class C in a multiple inheritance hierarchy, with C
inheriting from the base classes B1, B2, ... , BN. We want to compute
the linearization L[C] of the class C. In order to do that, we need the
concept of *merging* lists, since the rule says that
*the linearization of C is the sum of C plus the merge of a) the
linearizations of the parents and b) the list of the parents.*
In symbolic notation:
L[C(B1 ... BN)] = C + merge(L[B1] ... L[BN], B1 ... BN)
How is the merge computed? The rule is the following:
*take the head of the first list, i.e. L[B1][0]; if this head is not in
the tail of any of the other lists, then add it to the linearization
of C and remove it from the lists in the merge; otherwise look at the
head of the next list and take it, if it is a good head. Then repeat
the operation until all the classes are removed or it is impossible to
find good heads. In the latter case, it is impossible to construct the
merge: Python 2.3 will refuse to create the class C and will raise an
exception.*
This prescription ensures that the merge operation *preserves* the
ordering, if the ordering can be preserved. On the other hand, if the
order cannot be preserved (as in the example of serious order
disagreement discussed above) then the merge cannot be computed.
The computation of the merge is trivial if:
1. C is the ``object`` class, which has no parents; in this case its
linearization coincides with itself,
L[object] = object.
2. C has only one parent (single inheritance); in this case
L[C(B)] = C + merge(L[B],B) = C + L[B]
However, in the case of multiple inheritance things are more cumbersome
and I don't expect you to understand the rule without a couple of
examples ;-)
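The rule can be transcribed almost literally into Python. The following sketch is for illustration only (CPython computes the MRO in C at class-creation time, not with this code):

```python
def c3_merge(seqs):
    """Merge the given sequences according to the C3 rule sketched above."""
    seqs = [list(s) for s in seqs if s]
    result = []
    while seqs:
        for seq in seqs:                 # look for a good head
            head = seq[0]
            if not any(head in s[1:] for s in seqs):
                break
        else:                            # no good head: give up
            raise TypeError("inconsistent hierarchy, no C3 linearization")
        result.append(head)
        # remove the chosen head from every sequence
        seqs = [[c for c in s if c is not head] for s in seqs]
        seqs = [s for s in seqs if s]
    return result

def c3_linearization(C):
    "L[C] = C + merge(L[B1], ..., L[BN], [B1, ..., BN])"
    return [C] + c3_merge(
        [c3_linearization(B) for B in C.__bases__] + [list(C.__bases__)])
```

On any hierarchy that Python accepts, ``c3_linearization(C)`` should reproduce ``C.__mro__``; on a hierarchy with a serious order disagreement it raises TypeError, just as class creation does.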
Examples
--------
First example. Consider the following hierarchy:
>>> O = object
>>> class F(O): pass
>>> class E(O): pass
>>> class D(O): pass
>>> class C(D,F): pass
>>> class B(D,E): pass
>>> class A(B,C): pass
In this case the inheritance graph can be drawn as
::
                             6
                            ---
 Level 3                   | O |                  (more general)
                         /  ---  \
                        /    |    \                      |
                       /     |     \                     |
                      /      |      \                    |
                     ---    ---    ---                   |
 Level 2         3  | D |  4| E |  | F |  5              |
                     ---    ---    ---                   |
                      \  \ _ /       |                   |
                       \    / \ _    |                   |
                        \  /      \  |                   |
                         ---      ---                    |
 Level 1             1  | B |    | C |  2                |
                         ---      ---                    |
                          \        /                     |
                           \      /                     \ /
                             ---
 Level 0                 0  | A |                 (more specialized)
                             ---
The linearizations of O,D,E and F are trivial:
::
L[O] = O
L[D] = D O
L[E] = E O
L[F] = F O
The linearization of B can be computed as
::
L[B] = B + merge(DO, EO, DE)
We see that D is a good head, therefore we take it and we are reduced to
compute merge(O,EO,E). Now O is not a good head, since it is in the
tail of the sequence EO. In this case the rule says that we have to
skip to the next sequence. Then we see that E is a good head; we take
it and we are reduced to compute merge(O,O) which gives O. Therefore
::
L[B] = B D E O
Using the same procedure one finds:
::
L[C] = C + merge(DO,FO,DF)
= C + D + merge(O,FO,F)
= C + D + F + merge(O,O)
= C D F O
Now we can compute:
::
L[A] = A + merge(BDEO,CDFO,BC)
= A + B + merge(DEO,CDFO,C)
= A + B + C + merge(DEO,DFO)
= A + B + C + D + merge(EO,FO)
= A + B + C + D + E + merge(O,FO)
= A + B + C + D + E + F + merge(O,O)
= A B C D E F O
In this example, the linearization is ordered in a pretty nice way
according to the inheritance level, in the sense that lower levels (i.e.
more specialized classes) have higher precedence (see the inheritance
graph). However, this is not the general case.
I leave as an exercise for the reader to compute the linearization for
my second example:
>>> O = object
>>> class F(O): pass
>>> class E(O): pass
>>> class D(O): pass
>>> class C(D,F): pass
>>> class B(E,D): pass
>>> class A(B,C): pass
The only difference with the previous example is the change B(D,E) -->
B(E,D); however even such a little modification completely changes the
ordering of the hierarchy
::
                             6
                            ---
 Level 3                   | O |
                         /  ---  \
                        /    |    \
                       /     |     \
                      /      |      \
                     ---    ---    ---
 Level 2         2  | E |  4| D |  | F |  5
                     ---    ---    ---
                      \      / \      /
                       \    /   \    /
                        \  /     \  /
                         ---      ---
 Level 1             1  | B |    | C |  3
                         ---      ---
                          \        /
                           \      /
                             ---
 Level 0                 0  | A |
                             ---
Notice that the class E, which is in the second level of the hierarchy,
precedes the class C, which is in the first level of the hierarchy, i.e.
E is more specialized than C, even if it is in a higher level.
A lazy programmer can obtain the MRO directly from Python 2.2, since in
this case it coincides with the Python 2.3 linearization. It is enough
to invoke the .mro() method of class A:
>>> A.mro()
[<class '__main__.A'>, <class '__main__.B'>, <class '__main__.E'>,
<class '__main__.C'>, <class '__main__.D'>, <class '__main__.F'>,
<type 'object'>]
Finally, let me consider the example discussed in the first section,
involving a serious order disagreement. In this case, it is
straightforward to compute the linearizations of O, X, Y, A and B:
::
L[O] = O
L[X] = X O
L[Y] = Y O
L[A] = A X Y O
L[B] = B Y X O
However, it is impossible to compute the linearization for a class C
that inherits from A and B:
::
L[C] = C + merge(AXYO, BYXO, AB)
= C + A + merge(XYO, BYXO, B)
= C + A + B + merge(XYO, YXO)
At this point we cannot merge the lists XYO and YXO, since X is in the
tail of YXO whereas Y is in the tail of XYO: therefore there are no
good heads and the C3 algorithm stops. Python 2.3 raises an error and
refuses to create the class C.
Bad Method Resolution Orders
----------------------------
A MRO is *bad* when it breaks such fundamental properties as local
precedence ordering and monotonicity. In this section, I will show
that both the MRO for classic classes and the MRO for new style classes
in Python 2.2 are bad.
It is easier to start with the local precedence ordering. Consider the
following example:
>>> F=type('Food',(),{'remember2buy':'spam'})
>>> E=type('Eggs',(F,),{'remember2buy':'eggs'})
>>> G=type('GoodFood',(F,E),{}) #under Python 2.3 this is an error
with inheritance diagram
::
              O
              |
 (buy spam)   F
              |  \
              |   E   (buy eggs)
              |  /
              G
         (buy eggs or spam ?)
We see that class G inherits from F and E, with F *before* E: therefore
we would expect the attribute *G.remember2buy* to be inherited from
*F.remember2buy* and not from *E.remember2buy*; nevertheless Python 2.2
gives
>>> G.remember2buy #under Python 2.3 this is an error
'eggs'
This is a breaking of local precedence ordering since the order in the
local precedence list, i.e. the list of the parents of G, is not
preserved in the Python 2.2 linearization of G:
::
L[G,P22]= G E F object # F *follows* E
One could argue that the reason why F follows E in the Python 2.2
linearization is that F is less specialized than E, since F is the
superclass of E; nevertheless the breaking of local precedence ordering
is quite non-intuitive and error prone. This is particularly true since
it differs from the behavior of old style classes:
>>> class F: remember2buy='spam'
>>> class E(F): remember2buy='eggs'
>>> class G(F,E): pass
>>> G.remember2buy
'spam'
In this case the MRO is GFEF and the local precedence ordering is
preserved.
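The classic rule itself (a depth-first, left-to-right walk that keeps duplicates, with attribute lookup using the first match) is easy to emulate. Since modern Pythons refuse to create the class G above, the sketch below models the classes with trivial stand-in objects; ``Node`` is an ad-hoc helper for this illustration, not anything from the standard library:

```python
class Node:
    "Stand-in for an old style class: just a name and some bases."
    def __init__(self, name, *bases):
        self.__name__ = name
        self.__bases__ = bases

def classic_mro(C):
    "Depth-first, left-to-right, duplicates kept: the pre-2.2 rule."
    result = [C]
    for base in C.__bases__:
        result.extend(classic_mro(base))
    return result

F = Node('F')
E = Node('E', F)
G = Node('G', F, E)
print([c.__name__ for c in classic_mro(G)])  # the GFEF order of the text
```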
As a general rule, hierarchies such as the previous one should be
avoided, since it is unclear if F should override E or vice versa.
Python 2.3 solves the ambiguity by raising an exception in the creation
of class G, effectively stopping the programmer from generating
ambiguous hierarchies. The reason for that is that the C3 algorithm
fails when the merge
::
merge(FO,EFO,FE)
cannot be computed, because F is in the tail of EFO and E is in the tail
of FE.
The real solution is to design a non-ambiguous hierarchy, i.e. to derive
G from E and F (the more specific first) and not from F and E; in this
case the MRO is GEF without any doubt.
::
              O
              |
              F   (spam)
             / |
 (eggs)     E  |
             \ |
              G
         (eggs, no doubt)
Python 2.3 forces the programmer to write good hierarchies (or, at
least, less error-prone ones).
On a related note, let me point out that the Python 2.3 algorithm is
smart enough to recognize obvious mistakes, as the duplication of
classes in the list of parents:
>>> class A(object): pass
>>> class C(A,A): pass # error
Traceback (most recent call last):
File "", line 1, in ?
TypeError: duplicate base class A
In this situation, Python 2.2 (both for classic classes and new style
classes) would not raise any exception.
Finally, I would like to point out two lessons we have learned from this
example:
1. despite the name, the MRO determines the resolution order of
attributes, not only of methods;
2. the default food for Pythonistas is spam ! (but you already knew
that ;-)
Having discussed the issue of local precedence ordering, let me now
consider the issue of monotonicity. My goal is to show that neither the
MRO for classic classes nor that for Python 2.2 new style classes is
monotonic.
To prove that the MRO for classic classes is non-monotonic is rather
trivial: it is enough to look at the diamond diagram:
::
        C
       / \
      /   \
     A     B
      \   /
       \ /
        D
One easily discerns the inconsistency:
::
L[B,P21] = B C # B precedes C : B's methods win
L[D,P21] = D A C B C # B follows C : C's methods win!
On the other hand, there are no problems with the Python 2.2 and 2.3
MROs; they both give
::
L[D] = D A B C
Guido points out in his essay [#]_ that the classic MRO is not so bad in
practice, since one can typically avoid diamonds for classic classes.
But all new style classes inherit from object, therefore diamonds are
unavoidable and inconsistencies show up in every multiple inheritance
graph.
The MRO of Python 2.2 makes breaking monotonicity difficult, but not
impossible. The following example, originally provided by Samuele
Pedroni, shows that the MRO of Python 2.2 is non-monotonic:
>>> class A(object): pass
>>> class B(object): pass
>>> class C(object): pass
>>> class D(object): pass
>>> class E(object): pass
>>> class K1(A,B,C): pass
>>> class K2(D,B,E): pass
>>> class K3(D,A): pass
>>> class Z(K1,K2,K3): pass
Here are the linearizations according to the C3 MRO (the reader should
verify these linearizations as an exercise and draw the inheritance
diagram ;-)
::
L[A] = A O
L[B] = B O
L[C] = C O
L[D] = D O
L[E] = E O
L[K1]= K1 A B C O
L[K2]= K2 D B E O
L[K3]= K3 D A O
L[Z] = Z K1 K2 K3 D A B C E O
Python 2.2 gives exactly the same linearizations for A, B, C, D, E, K1,
K2 and K3, but a different linearization for Z:
::
L[Z,P22] = Z K1 K3 A K2 D B C E O
It is clear that this linearization is *wrong*, since A comes before D
whereas in the linearization of K3 A comes *after* D. In other words, in
K3 methods derived by D override methods derived by A, but in Z, which
still is a subclass of K3, methods derived by A override methods derived
by D! This is a violation of monotonicity. Moreover, the Python 2.2
linearization of Z is also inconsistent with local precedence ordering,
since the local precedence list of the class Z is [K1, K2, K3] (K2
precedes K3), whereas in the linearization of Z K2 *follows* K3. These
problems explain why the 2.2 rule has been dismissed in favor of the C3
rule.
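Since all modern Pythons implement the C3 rule, the linearizations above can be checked directly:

```python
# Samuele Pedroni's hierarchy, transcribed for a current Python.
class A: pass
class B: pass
class C: pass
class D: pass
class E: pass
class K1(A, B, C): pass
class K2(D, B, E): pass
class K3(D, A): pass
class Z(K1, K2, K3): pass

# the C3 linearization of Z, as computed in the text
print([c.__name__ for c in Z.__mro__])
# ['Z', 'K1', 'K2', 'K3', 'D', 'A', 'B', 'C', 'E', 'object']
```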
.. [#] The thread on python-dev started by Samuele Pedroni:
http://mail.python.org/pipermail/python-dev/2002-October/029035.html
.. [#] The paper *A Monotonic Superclass Linearization for Dylan*:
http://www.webcom.com/haahr/dylan/linearization-oopsla96.html
.. [#] Guido van Rossum's essay, *Unifying types and classes in Python 2.2*:
http://www.python.org/2.2.2/descrintro.html
.. [#] The (in)famous book on metaclasses, *Putting Metaclasses to Work*:
Ira R. Forman, Scott Danforth, Addison-Wesley 1999 (out of print,
but probably still available on http://www.amazon.com)
Understanding the Method Resolution Order
--------------------------------------------------------------------------
The MRO of any given (new style) Python class is given
by the special attribute ``__mro__``. Notice that since
Python is an extremely dynamic language it is possible
to delete and to generate whole classes at run time, therefore the MRO
is a dynamic concept. For instance, let me show how it is possible to
remove a class from my paleoanthropological hierarchy: I can
replace the last class 'HomoSapiensSapiens' with 'HomoSapiensNeardenthalensis'
(changing a class in the middle of the hierarchy would be more difficult). The
following lines do the job dynamically:
>>> from oopp import *
>>> del HomoSapiensSapiens
>>> class HomoSapiensNeardenthalensis(HomoSapiens):
...     def can(self):
...         super(self.__this,self).can()
...         print " - make something"
>>> reflective(HomoSapiensNeardenthalensis)
>>> HomoSapiensNeardenthalensis().can()
HomoSapiensNeardenthalensis can:
- make tools
- make abstractions
- make something
In this case the MRO of 'HomoSapiensNeardenthalensis', i.e. the list of
all its ancestors, is
>>> HomoSapiensNeardenthalensis.__mro__
[<class '__main__.HomoSapiensNeardenthalensis'>, <class 'oopp.HomoSapiens'>,
<class 'oopp.HomoHabilis'>, <class 'oopp.Homo'>,
<class 'oopp.PrettyPrinted'>, <type 'object'>]
The ``__mro__`` attribute gives the *linearization* of the class, i.e. the
ordered list of its ancestors, starting from the class itself and ending
with object. The linearization of a class is essential in order to specify
the resolution order of methods and attributes, i.e. the Method Resolution
Order (MRO). In the case of single inheritance hierarchies, such as the
paleoanthropological example, the MRO is pretty obvious; on the contrary
it is a quite non-trivial concept in the case of multiple inheritance
hierarchies.
For instance, let me reconsider my first example of multiple inheritance,
the ``NonInstantiableClock`` class, inheriting from 'NonInstantiable' and
'Clock'. I may represent the hierarchy with the following inheritance graph:
::
               -- object --
              /  (__new__)  \
             /               \
            /                 \
         Clock           NonInstantiable
       (get_time)           (__new__)
            \                  /
             \                /
              \              /
               \            /
                \          /
            NonInstantiableClock
            (get_time,__new__)
The class ``Clock`` defines a ``get_time`` method, whereas the class
``NonInstantiable`` overrides the ``__new__`` method of the ``object`` class;
the class ``NonInstantiableClock`` inherits ``get_time`` from 'Clock' and
``__new__`` from 'NonInstantiable'.
The linearization of 'NonInstantiableClock' is
>>> NonInstantiableClock.mro()
[<class 'oopp.NonInstantiableClock'>, <class 'oopp.Clock'>,
<class 'oopp.NonInstantiable'>, <type 'object'>]
In particular, since 'NonInstantiable' precedes 'object', its ``__new__``
method overrides ``object.__new__``. However, with the MRO used before
Python 2.2, the linearization would have been ``NonInstantiableClock, Clock,
object, NonInstantiable, object`` and the ``__new__`` method of object would
have (hypothetically, of course, since before Python 2.2 there was no
``__new__`` method! ;-) overridden the ``__new__``
method of ``NonInstantiable``, therefore ``NonInstantiableClock`` would
have lost the property of being non-instantiable!
This simple example shows that the choice of a correct Method Resolution
Order is far from being obvious in general multiple inheritance hierarchies.
After a false start in Python 2.2 (with a MRO failing in some subtle cases),
Python 2.3 decided to adopt the so-called C3 MRO, invented by people working
on Dylan (even if Dylan itself uses the MRO of Common Lisp CLOS). Since this
is quite a technical matter, I refer the interested reader to appendix 2
for a full discussion of the C3 algorithm.
Here, I prefer to point out how the built-in
``super`` object works in multiple inheritance situations. To this aim, it
is convenient to define a utility function that retrieves the ancestors
of a given class with respect to the MRO of one of its subclasses:
::
#
def ancestor(C,S=None):
    """Returns the ancestors of the first argument with respect to the
    MRO of the second argument. If the second argument is None, then
    returns the MRO of the first argument."""
    if C is object:
        raise TypeError("There is no superclass of object")
    elif S is None or S is C:
        return list(C.__mro__)
    elif issubclass(S,C): # typical case
        mro=list(S.__mro__)
        return mro[mro.index(C):] # compute the ancestors from the MRO of S
    else:
        raise TypeError("S must be a subclass of C")
#
Let me show how the function ``ancestor`` works.
Consider the class ``Clock`` in isolation: then
its direct superclass, i.e. the first ancestor, is ``object``,
>>> from oopp import *
>>> ancestor(Clock)[1]
<type 'object'>
therefore ``super(Clock).__new__`` retrieves the ``object.__new__`` method:
>>> super(Clock).__new__
<built-in method __new__ of type object at 0x...>
Consider now the ``Clock`` class together with its subclass
``NonInstantiableClock``:
in this case the first ancestor of ``Clock``, *with respect to the MRO of
'NonInstantiableClock'* is ``NonInstantiable``
>>> ancestor(Clock,NonInstantiableClock)[1]
<class 'oopp.NonInstantiable'>
Therefore ``super(Clock,NonInstantiableClock).__new__`` retrieves the
``NonInstantiable.__new__`` method:
>>> super(Clock,NonInstantiableClock).__new__
<function __new__ at 0x...>
>>> NonInstantiable.__new__
<function __new__ at 0x...>
It must be pointed out that ``super(C,S)`` is related to, but not the same
as, ``ancestor(C,S)[1]``, since it does not return the superclass:
it returns a super object instead:
>>> super(Clock,NonInstantiableClock)
<super: <class 'Clock'>, <NonInstantiableClock object>>
#
#class Super(super):
# def __init__(self,C,S=None):
# super(Super,self).__init__(C,S)
# self.__name__="Super(%s)" % C.__name__
#
Finally, there is a little quirk of ``super``:
>>> class C(PrettyPrinted): pass
>>> s=super(C,C())
>>> s.__str__()
'<C>'
but
>>> str(s) # idem for print s
"<super: <class 'C'>, <C object>>"
Idem for non-pre-existing methods:
>>> class D(list): pass
...
>>> s=super(D,D())
>>> s.__len__()
0
>>> len(s) #error
Traceback (most recent call last):
File "", line 1, in ?
TypeError: len() of unsized object
The same problem comes with ``__getattr__``:
>>> class E(object):
...     def __getattr__(self,name):
...         if name=='__len__': return lambda:0
...
>>> e=E()
>>> e.__len__()
0
>>> len(e) # error
Traceback (most recent call last):
File "", line 1, in ?
TypeError: len() of unsized object
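Both quirks have the same cause: implicit invocations of special methods (by ``len``, ``str`` and friends) look the method up on the *type* of the object, bypassing the instance, ``__getattr__`` and the proxying of the super object. A sketch in modern Python:

```python
class E:
    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        if name == '__len__':
            return lambda: 0
        raise AttributeError(name)

e = E()
assert e.__len__() == 0   # explicit lookup goes through __getattr__
try:
    len(e)                # implicit lookup only searches type(e)
except TypeError:
    pass                  # no __len__ on the class, so len() fails
```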
Counting instances
----------------------------------------------------------------------
.. line-block::
*Everything should be built top-down, except the first time.*
-- Alan Perlis
Multiple inheritance takes the bottom-up philosophy a step further and
makes appealing the idea of creating classes whose only
purpose is to be derived from. Whereas in the top-down approach one starts
with full featured standalone classes, to be further refined, in the
mix-in approach one starts with bare bone classes, providing very simple
or even trivial features, with the purpose of providing
basic reusable components in multiple inheritance hierarchies.
At the very end, the idea is to generate a library of *mixin* classes, to be
composed with other classes. We already saw a couple of examples of
mixin classes: 'NonInstantiable' and 'Customizable'. In this paragraph
I will show three other examples: 'WithCounter','Singleton' and
'AvoidDuplication'.
A common requirement for a class is the ability to count the number of its
instances. This is a quite easy problem: it is enough to increment a counter
each time an instance of that class is initialized. However, this idea can
be implemented in the wrong way: naively, one could add the
counting capability to a class by modifying its
``__init__`` method explicitly in the original source code.
A better alternative is to follow the bottom-up approach and to implement
the counting feature in a separate mix-in class: then the feature can be
added to the original class via multiple inheritance, without touching
the source.
Moreover, the counter class becomes a reusable component that can be
useful for other problems, too. In order to use the mix-in approach, the
``__new__`` method of the counter class must be cooperative, preferably
via an anonymous super call.
::
#
class WithCounter(object):
    """Mixin class counting the total number of its instances and storing
    it in the class attribute counter."""
    counter=0 # class attribute (or static attribute in C++/Java terms)
    def __new__(cls,*args,**kw):
        cls.counter+=1 # increments the class attribute
        return super(cls.__this,cls).__new__(cls,*args,**kw)
        # anonymous cooperative call to the superclass's method __new__

reflective(WithCounter)
#
Each time an instance of 'WithCounter' is initialized, the counter 'counter'
is incremented, and when 'WithCounter' is composed through multiple inheritance,
its '__new__' method cooperatively invokes the ``__new__`` method
of the other components.
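In modern Python (3.x), the zero-argument form of ``super`` makes the reflective ``__this`` trick unnecessary; the same mixin can be sketched as:

```python
class WithCounter:
    "Mixin counting the instances of each class that inherits from it."
    counter = 0   # class attribute
    def __new__(cls, *args, **kw):
        # rebinds 'counter' on cls itself, so each subclass keeps its own count
        cls.counter += 1
        # extra arguments are not forwarded: object.__new__ rejects them
        return super().__new__(cls)

class Tool(WithCounter): pass

Tool(); Tool()
print(Tool.counter, WithCounter.counter)  # 2 0
```

Notice that ``cls.counter += 1`` reads the inherited value and then sets the attribute on the subclass, so the base class attribute stays untouched.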
For instance, I can use 'WithCounter' to implement a 'Singleton', i.e.
a class that can have only one instance. Such a class can be
obtained as follows:
::
#
class Singleton(WithCounter):
    "If you inherit from me, you can only have one instance"
    def __new__(cls,*args,**kw):
        if cls.counter==0: # first call
            cls.instance=super(cls.__this,cls).__new__(cls,*args,**kw)
        return cls.instance

reflective(Singleton)
#
As an application, I can create a
class ``SingleClock`` that inherits from ``Clock``
*and* from ``Singleton``. This means that ``SingleClock`` is both a
'Clock' and a 'Singleton', i.e. there can be only one clock:
>>> from oopp import Clock,Singleton
>>> class SingleClock(Clock,Singleton): pass
...
>>> clock1=SingleClock()
>>> clock2=SingleClock()
>>> clock1 is clock2
True
Instantiating many clocks is apparently possible (i.e. no error
message is given) but you always obtain the same instance. This makes
sense, since there is only one time on the system and a single
clock is enough.
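A modern sketch of the same idea, caching the instance on the class instead of relying on the counter (again an illustration, not the book's oopp code):

```python
class Singleton:
    "Mixin: every instantiation of a subclass returns the same instance."
    def __new__(cls, *args, **kw):
        # look only in cls.__dict__, so each subclass gets its own instance
        if 'instance' not in cls.__dict__:
            cls.instance = super().__new__(cls)
        return cls.instance

class SingleClock(Singleton):
    "A clock of which only one copy can exist."

clock1 = SingleClock()
clock2 = SingleClock()
print(clock1 is clock2)  # True
```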
A variation of the 'Singleton' is a class that generates a new
instance only when a certain condition is satisfied. Suppose for instance
one has a 'Disk' class, to be instantiated with the syntax
``Disk(xpos,ypos,radius)``.
It is clear that two disks with the same radius and the same position in
the cartesian plane are essentially the same disk (assuming there are no
additional attributes such as the color). Therefore it is a waste of memory
to instantiate two separate objects to describe the same disk. To solve
this problem, one possibility is to store the calling arguments in a list.
When it is time to instantiate a new object with arguments args = xpos, ypos,
radius, Python should check if a disk with these arguments has already
been instantiated: in this case that disk should be returned, not a new
one. This logic can be elegantly implemented in a mix-in class such as the
following (compare with the ``withmemory`` wrapper in chapter 2):
::
#
class AvoidDuplication(object):
    def __new__(cls,*args,**kw):
        return super(cls.__this,cls).__new__(cls,*args,**kw)
    __new__=withmemory(__new__) # collects the calls in __new__.result

reflective(AvoidDuplication)
#
Notice that 'AvoidDuplication' is introduced with the only purpose of
giving its functionality to 'Disk': in order to reach this goal, it is enough
to derive 'Disk' from this class and our previously
introduced 'GeometricFigure' class by writing something like
>>> from oopp import *
>>> class Disk(GeometricFigure,AvoidDuplication):
...     def __init__(self,xpos,ypos,radius):
...         return super(Disk,self).__init__('(x-x0)**2+(y-y0)**2 <= r**2',
...                      x0=xpos,y0=ypos,r=radius)
Now, if we create a disk
>>> c1=Disk(0,0,10) #creates a disk of radius 10
it is easy enough to check that trying to instantiate a new disk with the
*same* arguments return the old disk:
>>> c2=Disk(0,0,10) #returns the *same* old disk
>>> c1 is c2
True
Here, everything works, because through the
cooperative ``super`` mechanism, ``Disk.__init__`` calls
``AvoidDuplication.__init__`` that calls ``GeometricFigure.__init__``
that in turn initializes the disk. Inverting the order of
'AvoidDuplication' and 'GeometricFigure' would cause a disaster, since
``GeometricFigure.__init__`` would override ``AvoidDuplication.__init__``.
Alternatively, one could use the object factory 'Makeobj' implemented in
chapter 3:
>>> class NonDuplicatedFigure(GeometricFigure,AvoidDuplication): pass
>>> makedisk=Makeobj(NonDuplicatedFigure,'(x-x0)**2/4+(y-y0)**2 <= r**2')
>>> disk1=makedisk(x0=38,y0=7,r=5)
>>> disk2=makedisk(x0=38,y0=7,r=5)
>>> disk1 is disk2
True
Remark: it is interesting to notice that the previous approach would not work
directly for keyword arguments, since dictionaries are unhashable.
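One way around the remark, sketched below in modern Python, is to fold the keyword arguments into a hashable key (a sorted tuple of items); the cache is kept per class. Note that ``__init__`` still runs again on the cached instance at each call:

```python
class AvoidDuplication:
    "Mixin: instantiating twice with the same arguments returns the same object."
    def __new__(cls, *args, **kw):
        key = (args, tuple(sorted(kw.items())))   # hashable, kwargs included
        cache = cls.__dict__.get('_instances')
        if cache is None:
            cache = {}
            cls._instances = cache                # one cache per class
        if key not in cache:
            cache[key] = super().__new__(cls)
        return cache[key]

class Disk(AvoidDuplication):
    def __init__(self, x0=0, y0=0, r=1):
        self.x0, self.y0, self.r = x0, y0, r

d1 = Disk(x0=38, y0=7, r=5)
d2 = Disk(x0=38, y0=7, r=5)
print(d1 is d2)  # True
```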
The pizza-shop example
----------------------------------------------------------------
Now it is time to give a non-trivial example of multiple inheritance with
cooperative and non-cooperative classes. The point is that multiple
inheritance can easily lead to complicated hierarchies, where the
resolution order of methods is far from being obvious and actually
can give bad surprises.
To explain the issue, let me extend the program for the pizza-shop owner of
chapter 4, by following the bottom-up approach and using anonymous
cooperative super calls.
In this approach, one starts from the simplest thing.
It is clear that the pizza-shop owner has an interest in recording all the
pizzas he sells.
To this aim, he needs a class providing logging capabilities:
each time a new instance is created, its features are stored in a log file. In
order to count the total number of instances, 'WithLogger' must derive from
the 'WithCounter' class. In order to have a nicely printed message,
'WithLogger' must derive from 'PrettyPrinted'. Finally,
since 'WithLogger' must be a general purpose
class that I will reuse in other problems as a mixin class, it must be
cooperative. 'WithLogger' can be implemented as follows:
::
#
class WithLogger(WithCounter,PrettyPrinted):
    """WithLogger inherits from WithCounter the 'counter' class attribute;
    moreover it inherits '__str__' from PrettyPrinted"""
    logfile=sys.stdout # default
    verboselog=False # default
    def __init__(self,*args,**kw):
        super(self.__this,self).__init__(*args,**kw) # cooperative
        dic=attributes(self) # non-special attributes dictionary
        print >> self.logfile,'*'*77
        print >> self.logfile, time.asctime()
        print >> self.logfile, "%s. Created %s" % (type(self).counter,self)
        if self.verboselog:
            print >> self.logfile,"with accessibile non-special attributes:"
            if not dic: print >> self.logfile,"",
            else: print >> self.logfile, pretty(dic)

reflective(WithLogger)
#
Here I could well have used ``super(WithLogger,self).__init__(*args,**kw)``
instead of ``super(self.__this,self).__init__(*args,**kw)``: the standard
``super`` works in this case too, and with better performance.
Thanks to the power of multiple inheritance, we may give logging features
to the 'CustomizablePizza' class defined in chapter 4
with just one line of code:
>>> from oopp import *
>>> class Pizza(WithLogger,CustomizablePizza):
... "Notice, WithLogger is before CustomizablePizza"
>>> Pizza.With(toppinglist=['tomato'])('small')
****************************************************************************
Sat Feb 22 14:54:44 2003
1. Created <__main__.Pizza object at 0x816927c>
<__main__.Pizza object at 0x816927c>
It is also possible to have a more verbose output:
>>> Pizza.With(verboselog=True)
>>> Pizza('large')
****************************************************************************
Sat Feb 22 14:59:51 2003
1. Created <__main__.Pizza object at 0x401ce7ac>
with accessibile non-special attributes:
With = <bound method ...>
baseprice = 1
count = 2
formatstring = %s
logfile = <open file '<stdout>', mode 'w' at 0x402c2058>
price = <bound method ...>
size = large
sizefactor = {'small': 1, 'large': 3, 'medium': 2}
topping_unit_price = 0.5
toppinglist = ['tomato']
toppings_price = <bound method ...>
verboselog = True
with = <bound method ...>
<__main__.Pizza object at 0x401ce7ac>
However, there is a problem here, since the output is '<__main__.Pizza
object at 0x401ce7ac>' and not the nice
'large pizza with tomato, cost $ 4.5' that we would
expect from a child of 'CustomizablePizza'. The solution to the
puzzle is given by the MRO:
>>> Pizza.mro()
[<class '__main__.Pizza'>, <class 'oopp.WithLogger'>,
<class 'oopp.WithCounter'>, <class 'oopp.PrettyPrinted'>,
<class 'oopp.CustomizablePizza'>, <class 'oopp.GenericPizza'>,
<class 'oopp.Customizable'>, <type 'object'>]
The inheritance graph is rather complicated:
::
                             object  7
                        /      /      \           \
                       /      /        \           \
                      /      /          \           \
                     /      /            \           \
                    /      /              \           \
                   /      /                \           \
                  /      /                  \           \
                 /      /                    \           \
                /      /                      \           \
   2 WithCounter   PrettyPrinted 3      GenericPizza 5   Customizable 6
      (__new__)  (__str__,__init__)       (__str__)        /
            \          /                      \           /
             \        /                        \         /
              \      /                          \       /
               \    /                            \     /
                \  /                    CustomizablePizza 4
                 \ /                          /
           1 WithLogger                      /
             (__init__)                     /
                   \                       /
                    \                     /
                     \                   /
                      \                 /
                       \               /
                          Pizza  0
As we see, the precedence in the resolution of methods is far from being
trivial. It is denoted in the graph with numbers
from 0 to 7: first the methods of 'Pizza' (level 0), then the methods of
'WithLogger' (level 1), then the methods of 'WithCounter' (level 2), then
the methods of 'PrettyPrinted' (level 3), then the methods of
'CustomizablePizza' (level 4), then the methods of 'GenericPizza' (level 5),
then the level of 'Customizable' (level 6), finally the 'object' methods
(level 7).
The reason why the MRO is what it is can be understood by studying
appendix 1.
We see that the ``__init__`` method of 'WithLogger' and
the ``__new__`` method of 'WithCounter' are cooperative.
``WithLogger.__init__``
calls ``WithCounter.__init__``, which is
inherited from ``CustomizablePizza.__init__``; the latter is not cooperative,
but this is not dangerous, since ``CustomizablePizza.__init__`` does not need
to call any other ``__init__``.
However, ``PrettyPrinted.__str__`` and ``GenericPizza.__str__`` are not
cooperative and since 'PrettyPrinted' precedes 'GenericPizza', the
``GenericPizza.__str__`` method is overridden, which is bad.
If ``WithLogger.__init__`` and ``WithCounter.__new__`` were not
cooperative, they would badly break the program.
The message is: when you inherit from both cooperative and non-cooperative
classes, put the cooperative classes first. They will be fair and will not
blindly override methods of the non-cooperative classes.
With multiple inheritance you can reuse old code a lot;
however, the price to pay is a non-trivial hierarchy. If from
the beginning we had known that 'Pizza' needed a 'WithLogger',
a 'WithCounter' and the
ability to be 'Customizable', we could have put everything in a single
class. The problem is that in real life one never knows ;)
Fortunately, Python's dynamism allows us to correct design mistakes.
Remark: in all text books about inheritance, the authors always stress
that inheritance should be used for "is-a" relations, not
for "has-a" relations. In spite of this fact, I have decided to implement
the concept of having a logger (or a counter) via a mixin class. One
should not blindly believe text books ;)
Fixing wrong hierarchies
-----------------------------------------------------------------------------
A typical metaprogramming technique is the run-time modification of classes.
As I said in a previous chapter, this feature can confuse the programmer and
should not be abused (in particular, it should not be used as a replacement
for inheritance!); nevertheless, there are applications where the ability to
modify classes at run time is invaluable: for instance,
it can be used to correct design mistakes.
In this case we would like the ``__str__`` method of 'PrettyPrinted' to be
overridden by ``GenericPizza.__str__``. Naively, this could be achieved by
putting 'WithLogger' after 'GenericPizza'. Unfortunately, doing so
would cause ``GenericPizza.__init__`` to override ``WithLogger.__init__``,
thereby losing the logging capabilities, unless countermeasures
are taken.
A valid countermeasure could be to replace the non-cooperative
``GenericPizza.__init__`` with a cooperative one. This can be miraculously
done at run time in a few lines of code:
::
#
def coop_init(self,size): # cooperative __init__ for GenericPizza
    self.size=size
    super(self._GenericPizza__this,self).__init__(size)

GenericPizza.__init__=coop_init # replace the old __init__
reflective(GenericPizza) # define GenericPizza.__this
#
Notice the usage of the fully qualified private attribute
``self._GenericPizza__this`` inside ``coop_init``: since this function
is defined outside any class, the automatic mangling mechanism cannot
work and has to be implemented by hand. Notice also that
``super(self._GenericPizza__this,self)`` could be replaced by
``super(GenericPizza,self)``; however the simpler approach is
less safe against possible future manipulations of the hierarchy.
Suppose, for example, we want to create a copy of the hierarchy
with the same name but slightly different features (actually,
in chapter 8 we will implement a traced copy of the pizza hierarchy,
useful for debugging purposes): then, using ``super(GenericPizza,self)``
would raise an error, since ``self`` would be an instance of the traced
hierarchy and ``GenericPizza`` the original non-traced class. Using
the form ``super(self._GenericPizza__this,self)`` and making
``self._GenericPizza__this`` point to the traced 'GenericPizza'
class (actually this will happen automatically), the problem goes
away.
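A minimal, self-contained sketch of the idea, in modern Python syntax, is the following; the ``reflective`` helper mirrors the one in the book's ``oopp`` module, but its implementation here is an illustrative assumption:

```python
def reflective(cls):
    # sketch: store the class in a private attribute whose name is
    # mangled by hand, emulating what ``self.__this`` would give
    # inside a class body named like ``cls``
    setattr(cls, '_%s__this' % cls.__name__, cls)

def coop_init(self, size):
    # cooperative __init__ defined *outside* the class: the mangled
    # name must be spelled out explicitly
    self.size = size
    super(self._GenericPizza__this, self).__init__()

class GenericPizza(object):
    pass

GenericPizza.__init__ = coop_init   # replace __init__ at run time
reflective(GenericPizza)            # define GenericPizza.__this

p = GenericPizza('large')
```

If the hierarchy is later copied, re-pointing ``_GenericPizza__this`` to the copy is enough for the cooperative call to keep working.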
Now everything works if 'WithLogger' is put after 'CustomizablePizza':
>>> from oopp import *
>>> class PizzaWithLog(CustomizablePizza,WithLogger): pass
>>> PizzaWithLog.With(toppinglist=['tomato'])('large')
****************************************************************************
Sun Apr 13 16:19:12 2003
1. Created large pizza with tomato, cost $ 4.5
The log correctly says ``Created large pizza with tomato, cost $ 4.5`` and
not ``Created `` as before, since now ``GenericPizza.__str__``
overrides ``PrettyPrinted.__str__``. Moreover, the hierarchy is logically
better organized:
>>> PizzaWithLog.mro()
[, ,
, ,
, ,
, ]
I leave as an exercise for the reader to make the ``__str__`` methods
cooperative ;)
Obviously, in this example it would have been better to correct the
original hierarchy, by leaving 'Beautiful' instantiable from the beginning
(that's why I said that 'Beautiful' is an example of a wrong mix-in class):
nevertheless, sometimes one has to deal with wrong hierarchies written by
others, and it can be a pain to fix them, both directly, by modifying the
original source code, and indirectly,
by inheritance, since one must change all the names in order to distinguish
the original classes from the fixed ones. In those cases Python's
dynamism can save your life. It also allows you to enhance original
classes which are not wrong, but simply don't do something you want
to implement.
Modifying classes at run time can be trivial, as in the examples I have
shown here, but it can also be rather tricky, as in this example:

>>> from oopp import PrettyPrinted
>>> class PrettyPrintedWouldBe(object): __str__ = PrettyPrinted.__str__
>>> print PrettyPrintedWouldBe() #error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unbound method __str__() must be called with PrettyPrinted
instance as first argument (got nothing instead)
As the error message says, the problem here is that the
``PrettyPrinted.__str__`` unbound method has not received any argument.
This is because in this
form ``PrettyPrintedWouldBe.__str__`` has been defined as a simple
attribute, not as a real method. The solution is to write
>>> class PrettyPrintedWouldBe(object):
... __str__ = PrettyPrinted.__dict__['__str__']
...
>>> print PrettyPrintedWouldBe() # now it works
This kind of run-time modification does not work when private variables
are involved:
::

  #

  class C(object):
      __x='C.__init__'
      def __init__(self):
          print self.__x # okay

  class D(object):
      __x='D.__init__'
      __init__=C.__dict__['__init__'] # error

  class New:
      class C(object):
          __x='New.C.__init__'
          __init__=C.__dict__['__init__'] # okay

  C()
  try: D()
  except AttributeError,e: print e
  New.C()

  #
This gives as result
::
C.__init__
'D' object has no attribute '_C__x'
New.C.__init__
The problem is that when ``C.__dict__['__init__']`` is compiled
(to byte-code), ``self.__x`` is expanded to ``self._C__x``. However,
when one invokes ``D.__init__``, a D-object is passed, which has
a ``self._D__x`` attribute, but not a ``self._C__x`` attribute (unless
'D' is a subclass of 'C'). Fortunately, the Python wisdom

*Namespaces are one honking great idea -- let's do more of those!*

suggests the right solution: to use a new class with the *same name*
as the old one, but in a different namespace, in order to avoid
confusion. The simplest way to generate a new namespace is to
declare a new class (the class 'New' in this example): then 'New.C'
becomes an inner class of 'New'. Since it has the same name as the
original class, private variables are correctly expanded and one
can freely exchange methods from 'C' to 'New.C' (and vice versa, too).
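The same phenomenon can be checked with a self-contained script in modern Python syntax (here a ``get`` method stands in for ``__init__``, so the values can be inspected directly):

```python
class C(object):
    __x = 'C.__x'                  # mangled to _C__x
    def get(self):
        return self.__x            # compiled as self._C__x

class D(object):
    __x = 'D.__x'                  # mangled to _D__x
    get = C.__dict__['get']        # still looks up _C__x -> fails

class New:
    class C(object):               # same name 'C', new namespace
        __x = 'New.C.__x'          # mangled to _C__x again
        get = C.__dict__['get']    # okay: _C__x exists here

assert C().get() == 'C.__x'

raised = False
try:
    D().get()                      # no _C__x on a D instance
except AttributeError:
    raised = True
assert raised

assert New.C().get() == 'New.C.__x'
```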
Modifying hierarchies
-------------------------------------------------------------------------
::

  def mod(cls): return cls

  class New: pass
  for c in HomoSapiensSapiens.__mro__:
      setattr(New,c.__name__,mod(c))
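The fragment above can be fleshed out as follows; 'HomoSapiensSapiens' belongs to the book's hierarchy, so a stand-in two-class hierarchy is used here:

```python
def mod(cls):
    # identity modifier; a real one could return a traced or
    # otherwise enhanced copy of cls
    return cls

class Person(object): pass                  # stand-in hierarchy
class HomoSapiensSapiens(Person): pass

class New: pass                             # a fresh namespace
for c in HomoSapiensSapiens.__mro__:
    setattr(New, c.__name__, mod(c))

# the whole hierarchy is now reachable inside the New namespace
assert New.HomoSapiensSapiens is HomoSapiensSapiens
assert New.Person is Person
assert New.object is object
```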
Inspecting Python code
-------------------------------------------------------------------------
This section discusses how to inspect a class, by retrieving useful
information about its contents.
A first possibility is to use the standard ``help`` function.
The problem with this approach is that ``help`` gives too much
information.
::

  #

  plaindata=lambda a:a #identity function
  plainmethod=lambda m:m #identity function

  class Get(object):
      """Invoked as Get(cls)(xxx) where xxx = staticmethod, classmethod,
      property, plainmethod, plaindata, returns the corresponding
      attributes as a keyword dictionary. It works by internally calling
      the routine inspect.classify_class_attrs. Notice that data
      attributes with double underscores are not retrieved
      (this is by design)."""
      def __init__(self,cls):
          self.staticmethods=kwdict()
          self.classmethods=kwdict()
          self.properties=kwdict()
          self.methods=kwdict()
          self.data=kwdict()
          for name, kind, klass, attr in inspect.classify_class_attrs(cls):
              if kind=='static method':
                  self.staticmethods[name]=attr
              elif kind=='class method':
                  self.classmethods[name]=attr
              elif kind=='property':
                  self.properties[name]=attr
              elif kind=='method':
                  self.methods[name]=attr
              elif kind=='data':
                  if not special(name): self.data[name]=attr
      def __call__(self,descr): #could be done with a dict
          if descr==staticmethod: return self.staticmethods
          elif descr==classmethod: return self.classmethods
          elif descr==property: return self.properties
          elif descr==plainmethod: return self.methods
          elif descr==plaindata: return self.data
          else: raise SystemExit("Invalid descriptor")
#
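The ``Get`` class leans on ``inspect.classify_class_attrs``; the following self-contained sketch (the ``Example`` class is illustrative) shows the raw kind strings that routine reports, which are exactly those tested above:

```python
import inspect

class Example(object):
    @staticmethod
    def s(): return 's'
    @classmethod
    def c(cls): return 'c'
    @property
    def p(self): return 'p'
    def m(self): return 'm'
    data = 42

# map each non-special attribute to the kind reported by inspect
kinds = dict((name, kind)
             for name, kind, klass, attr
             in inspect.classify_class_attrs(Example)
             if not name.startswith('__'))

assert kinds == {'s': 'static method', 'c': 'class method',
                 'p': 'property', 'm': 'method', 'data': 'data'}
```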
With similar tricks one can automatically recognize cooperative methods
(it is a different problem; better *not* to use descriptors here):
::

  #

  #class Cooperative(Class):
  #    __metaclass__ = WithWrappingCapabilities
  #
  #    def cooperative(method):
  #        """Calls both the superclass method and the class
  #        method (if the class has an explicit method).
  #        Works for methods returning None."""
  #        name,cls=Cooperative.parameters # fixed by the meta-metaclass
  #        def _(*args,**kw):
  #            getattr(super(cls,args[0]),name)(*args[1:],**kw)
  #            if method: method(*args,**kw) # call it
  #        return _
  #
  #    cooperative=staticmethod(cooperative)

  #
::

  #

  def wrapH(cls):
      for c in cls.__mro__[:-2]:
          tracer.namespace=c.__name__
          new=vars(c).get('__new__',None)
          if new: c.__new__=tracedmethod(new)

  #
THE MAGIC OF METACLASSES - PART I
==========================================================================
.. line-block::

    *Metaclasses are deeper magic than 99% of users should ever
    worry about. If you wonder whether you need them, you don't
    (the people who actually need them know with certainty that
    they need them, and don't need an explanation about why).*

    --Tim Peters
Python has always had metaclasses, since they are inherent to its object
model. However, before Python 2.2, metaclasses were tricky and their
study could cause the programmer's brain to explode [#]_. Nowadays,
the situation has changed, and the reader should be able to understand
this chapter without any risk for his/her brain (however, I do not give any
warranty ;)
To put it shortly, metaclasses give the Python programmer
complete control over the creation of classes. This simple statement
has far-reaching consequences, since the ability to interfere with
the process of class creation enables the programmer to make miracles.
In this and in the following chapters, I will show some of these
miracles.
This chapter will focus on subtle problems of metaclasses in inheritance
and multiple inheritance, including multiple inheritance of metaclasses
with classes and metaclasses with metaclasses.
The next chapter will focus more on applications.
.. [#] Metaclasses in Python 1.5 [A.k.a the killer joke]
http://www.python.org/doc/essays/metaclasses/
There is very little documentation about metaclasses, except Guido's
essays and the papers by David Mertz and myself published in
IBM developerWorks:
http://www-106.ibm.com/developerworks/library/l-pymeta.html
Metaclasses as class factories
------------------------------------------------------------------------
In the Python object model (inspired by Smalltalk, which had metaclasses
a quarter of a century ago!) classes themselves are objects.
Now, since objects are instances of classes, classes
themselves can be seen as instances of special classes called *metaclasses*.
Notice that things get hairy soon, since by following this idea one could
say that metaclasses themselves are classes and therefore objects; that
would mean that even metaclasses can be seen as
instances of special classes called meta-metaclasses. In turn,
meta-metaclasses can be seen as instances of meta-meta-metaclasses,
etc. Now it should be obvious why metaclasses have gained such a
reputation of brain-exploders ;). However, fortunately, the situation
is not so bad in practice, since the infinite regress of metaclasses is
avoided because there is a metaclass that is the "mother of all metaclasses":
the built-in metaclass *type*. 'type' has the property of being its own
metaclass, therefore the recursion stops. Consider for instance the following
example:
>>> class C(object): pass # a generic class
>>> type(C) # gives the metaclass of C
<type 'type'>
>>> type(type(C)) # gives the metaclass of type
<type 'type'>

The recursion stops, since the metaclass of 'type' is 'type' itself.
One cool consequence of classes being instances of 'type'
is that, since *type* is a subclass of ``object``,

>>> issubclass(type,object)
True

any Python class is not only a subclass of ``object``, but also
an instance of ``object``:
>>> isinstance(C,type)
True
>>> isinstance(C,object)
True
>>> issubclass(C,object)
True
Notice that 'type' is an instance of itself (!) and therefore of 'object':
>>> isinstance(type,type) # 'type' is an instance of 'type'
True
>>> isinstance(type,object) # therefore 'type' is an instance of 'object'
True
As is well known, ``type(X)`` returns the type of ``X``; however,
``type`` has also a second form in which it acts as a class factory.
The form is ``type(name,bases,dic)``, where ``name`` is the name of
the new class to be created, ``bases`` is the tuple of its bases and
``dic`` is the class dictionary. Let me give a few examples:

>>> C=type('C',(),{})
>>> C
<class '__main__.C'>
>>> C.__name__
'C'
>>> C.__bases__
(<type 'object'>,)
>>> C.__dict__
<dictproxy object at 0x...>
Notice that, since all metaclasses inherit from ``type``, as a consequence
all metaclasses can be used as class factories.
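A minimal sketch of a do-nothing metaclass used as a class factory (the ``Mark`` name is illustrative):

```python
class Mark(type):
    "A do-nothing metaclass; since it inherits from type, it is a class factory."

# the three-argument form: name, tuple of bases, class dictionary
C = Mark('C', (object,), {'answer': 42})

assert type(C) is Mark          # C is an instance of the metaclass
assert C.__name__ == 'C'
assert C.answer == 42
```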
A fairy tale example will help in understanding the concept
and few subtle points on how attributes are transmitted from metaclasses
to their instances.
Let me start by defining a 'Nobility' metaclass:

>>> class Nobility(type): attributes="Power,Richness,Beauty"

Instances of 'Nobility' are classes such as 'Prince', 'Duke', 'Baron', etc.

>>> Prince=Nobility("Prince",(),{})

Instances of 'Nobility' inherit its attributes, just as instances of normal
classes inherit the class docstring:
>>> Prince.attributes
'Power,Richness,Beauty'
Nevertheless, 'attributes' will not be retrieved by the ``dir`` function:
>>> print dir(Prince)
['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__',
'__hash__', '__init__', '__module__', '__new__', '__reduce__', '__repr__',
'__setattr__', '__str__', '__weakref__']
However, this is a limitation of ``dir``; in reality ``Prince.attributes``
is there. On the other hand, the situation is different for a specific
'Prince' object:

>>> charles=Prince()
>>> charles.attributes #error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'Prince' object has no attribute 'attributes'
The transmission of metaclass attributes is not transitive:
instances of the metaclass inherit the attributes, but not the instances
of the instances. This behavior is by design and is needed in order to avoid
trouble with special methods. This point will be thoroughly
explained in the last paragraph. For the moment, I may notice that the
behavior is reasonable, since the abstract qualities 'Power,Richness,Beauty'
are qualities of the 'Prince' class more than of one specific representative.
They can always be retrieved via the ``__class__`` attribute:
>>> charles.__class__.attributes
'Power,Richness,Beauty'
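The non-transitive transmission of metaclass attributes can be condensed in a self-contained script (modern Python syntax):

```python
class Nobility(type):
    attributes = "Power,Richness,Beauty"

Prince = Nobility("Prince", (), {})
assert Prince.attributes == "Power,Richness,Beauty"   # the class inherits

charles = Prince()
missing = False
try:
    charles.attributes        # not transmitted to instances of instances
except AttributeError:
    missing = True
assert missing

# still reachable through the class
assert charles.__class__.attributes == "Power,Richness,Beauty"
```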
Let me now define a metaclass 'Frogginess':

>>> class Frogginess(type): attributes="Powerlessness,Poverty,Uglyness"

Instances of 'Frogginess' are classes like 'Frog', 'Toad', etc.
>>> Frog=Frogginess("Frog",(),{})
>>> Frog.attributes
'Powerlessness,Poverty,Uglyness'
However, in Python miracles can happen:

>>> def miracle(Frog): Frog.__class__=Nobility
>>> miracle(Frog); Frog.attributes
'Power,Richness,Beauty'
In this example a miracle happened on the class 'Frog', by changing its
(meta)class to 'Nobility'; therefore its attributes have changed accordingly.
However, there is a subtle point here. Suppose we explicitly specify the
'Frog' attributes, in such a way that they can be inherited by one of its
specific representatives:
>>> Frog.attributes="poor, small, ugly"
>>> jack=Frog(); jack.attributes
'poor, small, ugly'
Then the miracle cannot work:
::

  #

  class Nobility(type): attributes="Power, Richness, Beauty"
  Prince=Nobility("Prince",(),{})
  charles=Prince()

  class Frogginess(type): attributes="Inpuissance, Poverty, Uglyness"
  Frog=Frogginess("Frog",(),{})
  Frog.attributes="poor, small, ugly"
  jack=Frog()

  def miracle(Frog): Frog.__class__=Nobility

  miracle(Frog)
  print "I am",Frog.attributes,"even if my class is",Frog.__class__

  #
Output:

::

  I am poor, small, ugly even if my class is <class '__main__.Nobility'>
The reason is that Python first looks at the specific attributes of an object
(in this case the object is the class 'Frog') and only if they are not found
does it look at the attributes of its class (here the metaclass 'Nobility').
Since in this example the 'Frog' class has explicit attributes, the
result is ``poor, small, ugly``. If you think about it a bit, it makes sense.
Remark:
In Python 2.3 there are restrictions when changing the ``__class__``
attribute for classes:
>>> C=type('C',(),{})
>>> C.__class__ = Nobility #error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: __class__ assignment: only for heap types
Here changing ``C.__class__`` is not allowed, since 'C' is an instance
of the built-in metaclass 'type'. This restriction, i.e. the fact that
the built-in metaclass cannot be changed, has been imposed for
security reasons, in order to avoid dirty tricks with the built-in
classes. For instance, if it were possible to change the metaclass
of the 'bool' class, we could arbitrarily change the behavior of
boolean objects. This could lead to abuses.
Thanks to this restriction,
the programmer is always sure that the built-in classes behave as documented.
This is also the reason why 'bool' cannot be subclassed:
>>> print bool.__doc__ # in Python 2.2 would give an error
bool(x) -> bool
Returns True when the argument x is true, False otherwise.
The builtins True and False are the only two instances of the class bool.
The class bool is a subclass of the class int, and cannot be subclassed.
In any case, changing the class of a class is not a good idea, since it
does not play well with inheritance, i.e. changing the metaclass of a base
class does not change the metaclass of its children:
>>> class M1(type): f=lambda cls: 'M1.f' #metaclass1
>>> class M2(type): f=lambda cls: 'M2.f' #metaclass2
>>> B=M1('B',(),{}) # B receives M1.f
>>> class C(B): pass # C receives M1.f too
>>> B.f()
'M1.f'
>>> B.__class__=M2 # change the metaclass
>>> B.f() # B now receives M2.f
'M2.f'
>>> C.f() # however C does *not* receive M2.f
'M1.f'
>>> type(B)
<class '__main__.M2'>
>>> type(C)
<class '__main__.M1'>
Metaclasses as class modifiers
----------------------------------------------------------------------
The interpretation of metaclasses in terms of class factories is quite
straightforward and I am sure that any Pythonista will be at home
with the concept. However, metaclasses have such a reputation of black
magic because their typical usage is *not* as class factories, but as
*class modifiers*. This means that metaclasses are typically
used to modify classes *in fieri*, i.e. in the process of being created.
The trouble is that the modification can be utterly magical.
Here is another fairy tale example showing the syntax
(via the ``__metaclass__`` hook) and the magic of the game:
::

  #

  class UglyDuckling(PrettyPrinted):
      "A plain, regular class"
      formatstring="Not beautiful, I am %s"

  class MagicallyTransformed(type):
      "Metaclass changing the formatstring of its instances"
      def __init__(cls,*args):
          cls.formatstring="Very beautiful, since I am %s"

  class TransformedUglyDuckling(PrettyPrinted):
      "A class metamagically modified"
      __metaclass__ = MagicallyTransformed
      formatstring="Not beautiful, I am %s" # will be changed

  #
>>> from oopp import *
>>> print UglyDuckling()
Not beautiful, I am
In this example, even if in 'TransformedUglyDuckling' we explicitly
set the formatstring to ``"Not beautiful, I am %s"``, the metaclass changes
it to ``"Very beautiful, since I am %s"`` and thus
>>> print TransformedUglyDuckling() # gives
Very beautiful, since I am
Notice that the ``__metaclass__`` hook passes to the metaclass
``MagicallyTransformed`` the name, bases and dictionary of the class
being created, i.e. 'TransformedUglyDuckling'.
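In modern Python the ``__metaclass__`` hook has been replaced by the ``metaclass`` keyword argument; a minimal, self-contained sketch of the same magic is:

```python
class MagicallyTransformed(type):
    "Metaclass changing the formatstring of its instances"
    def __init__(cls, name, bases, dic):
        super().__init__(name, bases, dic)
        cls.formatstring = "Very beautiful, since I am %s"

class TransformedUglyDuckling(metaclass=MagicallyTransformed):
    formatstring = "Not beautiful, I am %s"   # changed at creation time

assert TransformedUglyDuckling.formatstring == "Very beautiful, since I am %s"
```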
Metaclasses, when used as class modifiers, act *differently*
from functions, when inheritance is
involved. To clarify this subtle point, consider a subclass 'Swan'
of 'UglyDuckling':
>>> from oopp import *
>>> class Swan(UglyDuckling):
... formatstring="Very beautiful, I am %s"
>>> print Swan()
Very beautiful, I am
Now, let me define a simple function acting as a class modifier:
>>> def magicallyTransform(cls):
... "Modifies the class formatstring"
... customize(cls,formatstring="Very beautiful, even if I am %s")
... return cls
The function works:
>>> magicallyTransform(UglyDuckling)
>>> print UglyDuckling()
Very beautiful, even if I am
This approach is destructive, since we cannot have the original
and the transformed class at the same time, and it has potentially bad side
effects on the derived classes. Nevertheless, in this case it works
and it is not dangerous for the derived class 'Swan', since 'Swan'
explicitly overrides the 'formatstring' attribute and doesn't care about
the change in 'UglyDuckling.formatstring'. Therefore the output
of
>>> print Swan()
Very beautiful, I am
is still the same as before the action of the function ``magicallyTransform``.
The situation is quite different if we use the 'MagicallyTransformed'
metaclass:
>>> from oopp import *
>>> class Swan(TransformedUglyDuckling):
... formatstring="Very beautiful, I am %s"
>>> print TransformedUglyDuckling()
Very beautiful, since I am
>>> print Swan() # does *not* print "Very beautiful, I am "
Very beautiful, since I am
Therefore, not only has the metaclass magically transformed
'TransformedUglyDuckling.formatstring', it has also transformed
'Swan.formatstring'! And that despite the fact that
'Swan.formatstring' is explicitly set.
The reason for this behavior is that 'TransformedUglyDuckling' is a base
class with metaclass 'MagicallyTransformed', and since 'Swan' inherits from
'TransformedUglyDuckling', it also inherits the metaclass
'MagicallyTransformed', which is automatically called at 'Swan' creation time.
That's the reason why metaclasses are much more magical and much
more dangerous than
functions: functions do not override attributes in the derived classes,
metaclasses do, since they are automagically called at the time of
creation of the subclass. In other words, functions are explicit,
metaclasses are implicit. Nevertheless, this behavior can be pretty
useful in many circumstances, and it is a feature, not a bug. In
situations where this behavior is not intended, one should use a function,
not a metaclass. In general, metaclasses are better than functions,
since metaclasses are classes and as such they can inherit from each
other. This means that one can improve a basic metaclass through
(multiple) inheritance, with *reuse* of code.
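The implicit propagation to subclasses can be checked with a self-contained sketch (modern Python syntax; the names are illustrative):

```python
class Transform(type):
    def __init__(cls, name, bases, dic):
        super().__init__(name, bases, dic)
        cls.tag = "transformed %s" % name   # runs again for each subclass

class Base(metaclass=Transform):
    tag = "explicit"        # overridden at creation time

class Child(Base):
    tag = "explicit too"    # also overridden: the metaclass is inherited

assert Base.tag == "transformed Base"
assert Child.tag == "transformed Child"
```

A plain modifier function applied to ``Base`` would have touched ``Base`` only, leaving ``Child`` alone.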
A few caveats about the usage of metaclasses
------------------------------------------------------------------------
Let me start with some caveats about the ``__metaclass__`` hook, which is
commonly used and quite powerful, but also quite dangerous.
Let's imagine a programmer who does not
know about metaclasses, looking at the 'TransformedUglyDuckling'
code (assuming there are no comments): she would probably think
that ``__metaclass__`` is some special attribute used for introspection
purposes only, with no other effects, and she would probably expect
the output of the script to be "Not much, I am the class
TransformedUglyDuckling", whereas it is exactly the contrary! In other
words, when metaclasses are involved, *what you see is not what you get*.
The situation is even more implicit when the metaclass is inherited
from some base class, thereby lacking even the visual clue of the hook.
For these reasons, metaclasses are something to be used with great care;
they can easily make your code unreadable and confuse inexpert programmers.
Moreover, it is more difficult to debug programs involving metaclasses, since
methods are magically transformed by routines defined in the metaclass,
and the code you see in the class is *not* what Python sees. I think
the least confusing way of using metaclasses is to concentrate all
the dynamics in them and to write classes that are empty except for the
metaclass hook. If you write a class with no methods such as

::

  class TransformedUglyDuckling(object):
      __metaclass__=MagicallyTransformed

then the only place to look at is the metaclass. I have found it extremely
confusing to have some of the methods defined in the class and some in
the metaclass, especially during debugging.
Another point to make is that the ``__metaclass__``
hook should not be used to modify pre-existing classes,
since it requires modifying the source code (even if it is enough to
change only one line). Moreover, it is confusing, since adding a
``__metaclass__`` attribute *after* the class creation would not do the job:
>>> from oopp import UglyDuckling, MagicallyTransformed
>>> UglyDuckling.__metaclass__=MagicallyTransformed
>>> print UglyDuckling()
"Not much, I am the class UglyDuckling"
The reason is that we have to think of UglyDuckling as an instance of
``type``, the built-in metaclass; merely adding a ``__metaclass__``
attribute does not re-initialize the class.
The problem is elegantly solved by avoiding the hook and creating
an enhanced copy of the original class through ``MagicallyTransformed``
used as a class factory:
>>> name=UglyDuckling.__name__
>>> bases=UglyDuckling.__bases__
>>> dic=UglyDuckling.__dict__.copy()
>>> UglyDuckling=MagicallyTransformed(name,bases,dic)
Notice that I have recreated 'UglyDuckling', giving the new class
the old identifier.

>>> print UglyDuckling()
Very beautiful, since I am

The metaclass of this new 'UglyDuckling' has been specified and will
accompany all future children of 'UglyDuckling':

>>> class Swan(UglyDuckling): pass
...
>>> type(Swan)
<class 'oopp.MagicallyTransformed'>
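In modern Python syntax the same recreation trick reads as follows (the ``Meta`` and ``Plain`` names are illustrative; note that the ``__dict__`` and ``__weakref__`` slot descriptors must be filtered out before recreating the class, or ``type`` refuses the dictionary):

```python
class Meta(type):
    def __init__(cls, name, bases, dic):
        super().__init__(name, bases, dic)
        cls.enhanced = True

class Plain(object):
    pass

name, bases = Plain.__name__, Plain.__bases__
dic = {k: v for k, v in Plain.__dict__.items()
       if k not in ('__dict__', '__weakref__')}   # slots already provided
Plain = Meta(name, bases, dic)    # recreate, keeping the old identifier

assert type(Plain) is Meta
assert Plain.enhanced is True

class Sub(Plain):
    pass
assert type(Sub) is Meta          # the metaclass accompanies the children
```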
Another caveat concerns the overriding of ``__init__`` in the metaclass.
This is quite common for metaclasses called through the
``__metaclass__`` hook mechanism, since in this case the class
has already been defined (if not created) in the class statement,
and we are interested in initializing it, more than in recreating
it (which is still possible, by the way).
The problem is that overriding ``__init__`` has severe limitations
with respect to overriding ``__new__``,
since the 'name', 'bases' and 'dic' arguments cannot be directly
changed. Let me show an example:
::

  #

  from oopp import *

  class M(type):
      "Shows that dic cannot be modified in __init__, only in __new__"
      def __init__(cls,name,bases,dic):
          name='C name cannot be changed in __init__'
          bases='cannot be changed'
          dic['changed']=True

  class C(object):
      __metaclass__=M
      changed=False

  print C.__name__ # => C
  print C.__bases__ # => (<type 'object'>,)
  print C.changed # => False

  #
The output of this script shows that ``C.changed`` is still ``False``:
the dictionary cannot be changed in the ``__init__`` method. However,
replacing ``dic['changed']=True`` with ``cls.changed=True`` would work.
Analogously, changing ``cls.__name__`` would work. On the other hand,
``__bases__`` is a read-only attribute and cannot be changed once the
class has been created, therefore there is no way it can be touched in
``__init__``. However, ``__bases__`` can be changed in ``__new__``,
before the class creation.
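A minimal sketch contrasting the two overridings, in modern Python syntax:

```python
class M(type):
    def __new__(mcl, name, bases, dic):
        dic['changed'] = True       # allowed: the class does not exist yet
        return super().__new__(mcl, name, bases, dic)
    def __init__(cls, name, bases, dic):
        dic['ignored'] = True       # too late: type copied the dict already
        super().__init__(name, bases, dic)

class C(metaclass=M):
    changed = False

assert C.changed is True            # modified in __new__
assert not hasattr(C, 'ignored')    # the __init__ change had no effect
```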
Metaclasses and inheritance
-------------------------------------------------------------------------
It is easy to get confused about the difference between a metaclass
and a mix-in class in multiple inheritance, since
both are denoted by adjectives and both share the same idea of
enhancing a hierarchy. Moreover, both mix-in classes and metaclasses
can be inherited in the whole hierarchy.
Nevertheless, they behave differently
and there are various subtle points to emphasize. We have already
noticed in the first section that the attributes of a metaclass
are transmitted to its instances, but not to the instances of the
instances, whereas normal inheritance is transitive: the
grandfather transmits its attributes to the children and to the
grandchildren too. The difference can be represented with the following
picture, where 'M' is the metaclass, 'B' a base class, 'C' a child of 'B'
and c an instance of 'C':
::

  M (attr)     B (attr)
  :            |
  C (attr)     C (attr)
  :            :
  c ()         c (attr)
Notice that here the relation of instantiation is denoted by a dotted line.
This picture is valid when C has metaclass M but no base class, or when C
has a base class but no metaclass. However, what happens when the class C has
both a metaclass M and a base class B?
>>> class M(type): a='M.a'
>>> class B(object): a='B.a'
>>> class C(B): __metaclass__=M
>>> c=C()
The situation can be represented in the following graph:

::

  (M.a) M     B (B.a)
        :    /
        :   /
    (?) C
        :
        :
    (?) c
Here the metaclass M and the base class B are fighting against each other.
Who wins? C should inherit the attribute 'B.a' from its base B; however,
the metaclass would like to induce an attribute 'M.a'.
The answer is that the inheritance constraint wins over the metaclass
constraint:

>>> C.a
'B.a'
>>> c.a
'B.a'

The reason is the same one we discussed in the fairy tale example: 'M.a' is
an attribute of the metaclass; if its instance C already has a specified
attribute C.a (in this case specified through inheritance from B), then
the attribute is not modified. However, one could *force* the modification:
>>> class M(type):
... def __init__(cls,*args): cls.a='M.a'
>>> class C(B): __metaclass__=M
>>> C.a
'M.a'
In this case the metaclass M wins over the base class B. Actually,
this is not surprising, since it is explicit. What could be surprising,
had we not explained why inheritance silently wins, is that

>>> c.a
'B.a'

This explains the behavior of special methods like
``__new__``, ``__init__``, ``__str__``,
etc., which are defined both in the class and in the metaclass with the same
name (in both cases they are inherited from ``object``).
In the chapter on objects, we learned that the printed representation of
an object can be modified by overriding the ``__str__`` method of its
class. In the same sense, the printed representation of a class can be
modified by overriding the ``__str__`` method of its metaclass. Let me show
an example:
::

  #

  class Printable(PrettyPrinted,type):
      """Apparently does nothing, but actually makes PrettyPrinted act as
      a metaclass."""

  #
Instances of 'Printable' are classes with a nice printable representation:
>>> from oopp import Printable
>>> C=Printable('Classname',(),{})
>>> print C
Classname
However, the internal string representation stays the same:

>>> C # invokes Printable.__repr__

Notice that the name of class 'C' is ``Classname`` and not 'C'!
Consider for instance the following code:
>>> class M(type):
... def __str__(cls):
... return cls.__name__
... def method(cls):
... return cls.__name__
...
>>> class C(object):
... __metaclass__=M
>>> c=C()
In this case the ``__str__`` method in ``M`` cannot override the
``__str__`` method in C, which is inherited from ``object``.
Moreover, if you experiment a little, you will see that
>>> print C # is equivalent to print M.__str__(C)
C
>>> print c # is equivalent to print C.__str__(c)
<__main__.C object at 0x8158f54>
The first ``__str__`` is "attached" to the metaclass and the
second to the class.
Consider now the standard method "method". It is both attached to the
metaclass
>>> print M.method(C)
C
and to the class
>>> print C.method() #in a sense, this is a class method, i.e. it receives
C #the class as first argument
Actually, it can be seen as a class method of 'C' (cfr. Guido van Rossum,
"Unifying types and classes in Python 2.2"; when he discusses
classmethods he says: *"Python also has real metaclasses, and perhaps
methods defined in a metaclass have more right to the name "class method";
but I expect that most programmers won't be using metaclasses"*). Actually,
this is the Smalltalk terminology. Unfortunately, in Python the word
``classmethod`` denotes an attribute descriptor, therefore it is better
to call the methods defined in a metaclass *metamethods*, in order to avoid
any possible confusion.
The difference between ``method`` and ``__str__`` is that you cannot use the
syntax

>>> print C.__str__() #error
Traceback (most recent call last):
  ...
TypeError: descriptor '__str__' of 'object' object needs an argument

because of the name clash with the other ``__str__``; you can only use the
syntax

>>> print M.__str__(C)
C
Suppose now I change C's definition by adding a method called ``method``:

::

  class C(object):
      __metaclass__=M
      def __str__(self):
          return "instance of %s" % self.__class__
      def method(self):
          return "instance of %s" % self.__class__

If I do so, then there is a name clash and the previously working
statement ``print C.method()`` now gives an error:
::

  Traceback (most recent call last):
    File "<stdin>", line 24, in ?
  TypeError: unbound method method() must be called with C instance as
  first argument (got nothing instead)
Conclusion: ``__str__``, ``__new__``, ``__init__``, etc. defined in the
metaclass clash with the standard methods defined in the class, therefore
they must be invoked with the extended syntax (e.g. ``M.__str__(C)``),
whereas normal methods in the metaclass with no name clash with the methods
of the class can be used as class methods (e.g. ``C.method()`` instead of
``M.method(C)``).
Metaclass methods are always bound to the metaclass; they bind to the class
(receiving the class as first argument) only if there is no name clash with
already defined methods in the class, which is precisely the clash that
occurs for ``__str__``, ``__init__``, etc.
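The whole discussion can be condensed in a self-contained script (modern Python syntax; the ``D`` class is an illustrative addition showing the clash):

```python
class M(type):
    def __str__(cls):
        return cls.__name__
    def method(cls):
        return "meta " + cls.__name__

class C(metaclass=M):
    pass

assert str(C) == 'C'             # M.__str__ formats the class itself
assert C.method() == 'meta C'    # no clash: behaves like a class method

class D(metaclass=M):
    def method(self):            # clashes with the metamethod
        return "instance"

raised = False
try:
    D.method()                   # the class's own method wins: needs self
except TypeError:
    raised = True
assert raised
assert M.method(D) == 'meta D'   # the extended syntax still works
```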
Conflicting metaclasses
----------------------------------------------------------------------------
Consider a class 'A' with metaclass 'M_A' and a class 'B' with
metaclass 'M_B'; suppose I derive 'C' from 'A' and 'B'. The question is:
what is the metaclass of 'C' ? Is it 'M_A' or 'M_B' ?
The correct answer (see "Putting metaclasses to work" for a thorough
discussion) is 'M_C', where 'M_C' is a metaclass that inherits from
'M_A' and 'M_B', as in the following graph:
.. figure:: fig1.gif
However, Python is not yet that magic, and it does not automatically create
'M_C'. Instead, it will raise a ``TypeError``, warning the programmer of
the possible confusion:
>>> class M_A(type): pass
>>> class M_B(type): pass
>>> A=M_A('A',(),{})
>>> B=M_B('B',(),{})
>>> class C(A,B): pass #error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: metatype conflict among bases
This is an example where the metaclasses 'M_A' and 'M_B' fight each other
to generate 'C' instead of cooperating. The metatype conflict can be avoided
by assegning the correct metaclass to 'C' by hand:
>>> class C(A,B): __metaclass__=type("M_AM_B",(M_A,M_B),{})
>>> type(C)
<class '__main__.M_AM_B'>
In general, a class A(B, C, D, ...) can be generated without conflicts only
if type(A) is a subclass of each of type(B), type(C), ...
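The rule can be verified directly; the following sketch reuses the
'M_A'/'M_B' names from the example above:

```python
class M_A(type): pass
class M_B(type): pass

A = M_A('A', (), {})
B = M_B('B', (), {})

# type(A) is M_A and type(B) is M_B; neither is a subclass of the
# other, so deriving from both raises the metatype conflict:
conflict = False
try:
    type('C', (A, B), {})
except TypeError:
    conflict = True
assert conflict

# a metaclass inheriting from both M_A and M_B satisfies the rule
# "type(C) must be a subclass of type(A) and of type(B)":
M_AM_B = type('M_AM_B', (M_A, M_B), {})
C = M_AM_B('C', (A, B), {})
assert type(C) is M_AM_B
assert issubclass(M_AM_B, M_A) and issubclass(M_AM_B, M_B)
```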
In order to avoid conflicts, the following function, that generates
the correct metaclass by looking at the metaclasses of the base
classes, is handy:
::
#
metadic={}
def _generatemetaclass(bases,metas,priority):
trivial=lambda m: sum([issubclass(M,m) for M in metas],m is type)
# hackish!! m is trivial if it is 'type' or, in the case explicit
# metaclasses are given, if it is a superclass of at least one of them
metabs=tuple([mb for mb in map(type,bases) if not trivial(mb)])
metabases=(metabs+metas, metas+metabs)[priority]
if metabases in metadic: # already generated metaclass
return metadic[metabases]
elif not metabases: # trivial metabase
meta=type
elif len(metabases)==1: # single metabase
meta=metabases[0]
else: # multiple metabases
metaname="_"+''.join([m.__name__ for m in metabases])
meta=makecls()(metaname,metabases,{}) # makecls is defined below
return metadic.setdefault(metabases,meta)
#
This function is particularly smart, since:

1. it avoids duplications: metaclasses that are already implied by the
others (and the trivial metaclass ``type``) are filtered away;
2. it remembers its results: every generated metaclass is cached in
``metadic`` and reused on subsequent calls.
We may generate the child of a tuple of base classes with a given metaclass,
avoiding metatype conflicts, thanks to the following ``makecls`` function:
::
#
def makecls(*metas,**options):
"""Class factory avoiding metatype conflicts. The invocation syntax is
makecls(M1,M2,..,priority=1)(name,bases,dic). If the base classes have
metaclasses conflicting within themselves or with the given metaclasses,
it automatically generates a compatible metaclass and instantiates it.
If priority is True, the given metaclasses have priority over the
bases' metaclasses"""
priority=options.get('priority',False) # default, no priority
return lambda n,b,d: _generatemetaclass(b,metas,priority)(n,b,d)
#
Here is an example of usage:
>>> class C(A,B): __metaclass__=makecls()
>>> print C,type(C)
<class '__main__.C'> <class '__main__._M_AM_B'>
Notice that the automatically generated metaclass does not pollute the
namespace:
>>> _M_AM_B #error
Traceback (most recent call last):
File "<stdin>", line 1, in ?
NameError: name '_M_AM_B' is not defined
It can only be accessed as ``type(C)``.
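For the record, here is a self-contained session exercising the
``_generatemetaclass``/``makecls`` pair; the definitions are repeated
verbatim so that the snippet stands on its own:

```python
metadic = {}

def _generatemetaclass(bases, metas, priority):
    trivial = lambda m: sum([issubclass(M, m) for M in metas], m is type)
    metabs = tuple([mb for mb in map(type, bases) if not trivial(mb)])
    metabases = (metabs + metas, metas + metabs)[priority]
    if metabases in metadic:     # already generated metaclass
        return metadic[metabases]
    elif not metabases:          # trivial metabase
        meta = type
    elif len(metabases) == 1:    # single metabase
        meta = metabases[0]
    else:                        # multiple metabases
        metaname = "_" + ''.join([m.__name__ for m in metabases])
        meta = makecls()(metaname, metabases, {})
    return metadic.setdefault(metabases, meta)

def makecls(*metas, **options):
    priority = options.get('priority', False)
    return lambda n, b, d: _generatemetaclass(b, metas, priority)(n, b, d)

class M_A(type): pass
class M_B(type): pass
A = M_A('A', (), {})
B = M_B('B', (), {})

C = makecls()('C', (A, B), {})
assert type(C).__name__ == '_M_AM_B'   # the generated metaclass ...
assert '_M_AM_B' not in globals()      # ... does not pollute the namespace

# a second child with the same bases reuses the cached metaclass:
D = makecls()('D', (A, B), {})
assert type(D) is type(C)
```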
To put it shortly, the ``makecls`` function allows one to generate a child
from bases enhanced by different custom metaclasses, by generating under the
hood a compatible metaclass via multiple inheritance from the original
metaclasses. However, this logic can only work if the original metaclasses
are cooperative, i.e. their methods are written in such a way as to avoid
collisions. This can be done by using the cooperative ``super`` call
mechanism discussed in chapter 4.
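As a sketch of what "cooperative" means here, consider two metaclasses whose
``__init__`` methods both call ``super``: composing them via multiple
inheritance runs both initializers, in MRO order (names are illustrative):

```python
record = []

class M_A(type):
    def __init__(cls, name, bases, dic):
        record.append('M_A')
        super(M_A, cls).__init__(name, bases, dic)  # pass the call on

class M_B(type):
    def __init__(cls, name, bases, dic):
        record.append('M_B')
        super(M_B, cls).__init__(name, bases, dic)  # pass the call on

# the composed metaclass, as makecls would generate it:
M_AM_B = type('_M_AM_B', (M_A, M_B), {})
C = M_AM_B('C', (), {})

# both cooperative initializers ran, in MRO order:
assert record == ['M_A', 'M_B']
```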
Cooperative metaclasses
----------------------------------------------------------------------------
In this section I will discuss how metaclasses can be composed with
classes and with other metaclasses. Since we will discuss even
complicated hierarchies, it is convenient to have a utility
routine printing the MRO of a given class:
::
#
def MRO(cls):
count=0; out=[]
print "MRO of %s:" % cls.__name__
for c in cls.__mro__:
name=c.__name__
bases=','.join([b.__name__ for b in c.__bases__])
s=" %s - %s(%s)" % (count,name,bases)
if type(c) is not type: s+="[%s]" % type(c).__name__
out.append(s); count+=1
return '\n'.join(out)
#
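The routine above mixes a printed header with a returned body; an equivalent
sketch that simply returns the whole report as one string (the function name
``mro_report`` is mine) could look like this:

```python
def mro_report(cls):
    # same output format as MRO in the text, but the header is returned
    # together with the body instead of being printed as a side effect
    lines = ["MRO of %s:" % cls.__name__]
    for count, c in enumerate(cls.__mro__):
        bases = ','.join([b.__name__ for b in c.__bases__])
        s = "  %s - %s(%s)" % (count, c.__name__, bases)
        if type(c) is not type:  # non-trivial metaclass: show it in brackets
            s += "[%s]" % type(c).__name__
        lines.append(s)
    return '\n'.join(lines)

class M(type): pass
X = M('X', (), {})
report = mro_report(X)
assert report.splitlines()[0] == "MRO of X:"
assert report.splitlines()[1] == "  0 - X(object)[M]"
```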
Notice that ``MRO`` also prints the metaclass' name in square brackets, for
classes enhanced by a non-trivial metaclass.
Consider for instance the following hierarchy:
>>> from oopp import MRO
>>> class B(object): pass
>>> class M(B,type): pass
>>> class C(B): __metaclass__=M
Here 'M' is a metaclass that inherits from 'type' and from the base class
'B', and 'C' is both an instance of 'M' and a child of 'B'. The inheritance
graph can be drawn as
::
object
/ \
B type
| \ /
| M
\ :
\ :
C
Suppose now we want to retrieve the ``__new__`` method of B's superclass
with respect to the MRO of C: obviously, this is ``object.__new__``, since
>>> print MRO(C)
MRO of C:
0 - C(B)[M]
1 - B(object)
2 - object()
This allows one to create an instance of 'C' in this way:
>>> super(B,C).__new__(C)
<__main__.C object at 0x4018750c>
It is interesting to notice that this would not work in Python 2.2,
due to a bug in the implementation of ``super``; therefore, do not
try this trick with older versions of Python.
Notice that everything works
only because ``B`` inherits the ``object.__new__`` staticmethod, which
is cooperative and turns out to call ``type.__new__``. However,
if I give 'B' a non-cooperative method
>>> B.__new__=staticmethod(lambda cls,*args: object.__new__(cls))
things do not work:
>>> M('D',(),{}) #error
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 1, in <lambda>
TypeError: object.__new__(M) is not safe, use type.__new__()
A cooperative method would solve the problem:
>>> B.__new__=staticmethod(lambda m,*args: super(B,m).__new__(m,*args))
>>> M('D',(),{}) # calls B.__new__(M,'D',(),{})
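The whole sequence can be replayed with the three-argument form of ``type``;
the assertions mark what the text claims at each step:

```python
class B(object): pass
class M(B, type): pass           # M inherits from both B and type

C = M('C', (B,), {})             # C is an instance of M and a child of B

# B inherits the cooperative object.__new__, so the super trick works:
c = super(B, C).__new__(C)
assert isinstance(c, C)

# a non-cooperative B.__new__ hardwires object.__new__ and breaks
# the creation of classes through the metaclass M:
B.__new__ = staticmethod(lambda cls, *args: object.__new__(cls))
failed = False
try:
    M('D', (), {})
except TypeError:                # object.__new__(M) is not safe
    failed = True
assert failed

# the cooperative version delegates to the next __new__ in the MRO,
# which is type.__new__ when the instance being created is a class:
B.__new__ = staticmethod(lambda m, *args: super(B, m).__new__(m, *args))
D = M('D', (), {})
assert type(D) is M
```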
Metamethods vs class methods
-------------------------------------------------------------------
Metamethods, i.e. methods defined in
a metaclass, deserve a special discussion.
Python already has a few built-in metamethods: ``mro()``
and ``__subclasses__``. These are methods of the metaclass 'type' and
therefore of any of its sub-metaclasses.
>>> dir(type)
['__base__', '__bases__', '__basicsize__', '__call__', '__class__',
'__cmp__', '__delattr__', '__dict__', '__dictoffset__', '__doc__',
'__flags__', '__getattribute__', '__hash__', '__init__', '__itemsize__',
'__module__', '__mro__', '__name__', '__new__', '__reduce__', '__repr__',
'__setattr__', '__str__', '__subclasses__', '__weakrefoffset__', 'mro']
>>> print type.mro.__doc__
mro() -> list
return a type's method resolution order
>>> print type.__subclasses__.__doc__
__subclasses__() -> list of immediate subclasses
>>> class A(object): pass
>>> class B(A): pass
>>> B.mro()
[ |