r"""Foreword
----------------------------

I began programming with Python in 2002, just after the release of
Python 2.2.  That release was a major overhaul of the language:
new-style classes were introduced, the way inheritance worked changed
and the builtin ``super`` was introduced. Therefore, you may correctly
say that I have worked with ``super`` right from the beginning; still, I
never liked it and over the years I have discovered more and more of
its dark corners.

In 2004 I decided to write a comprehensive paper documenting
``super`` pitfalls and traps, with the goal of publishing it on the
Python web site, just as I had published my essay on multiple
inheritance and the `Method Resolution Order`_. With time the paper
grew longer and longer but I never had the feeling that I had covered
everything I needed to say; moreover, I have a full-time job, so I
never had the time to fully revise the paper as a whole. As a consequence,
four years have passed and the paper is still in draft status. This is
a pity, since it documents issues that people encounter and that
regularly come out on the Python newsgroups and forums.  

Keeping the draft sitting on my hard disk is doing a disservice to the
community. Still, I lack the time to finish it properly. To
get out of the impasse, I decided to split the long paper into a series of
short blog posts, which I do have the time to review properly. Moreover,
people are free to post comments and corrections in case I am making
mistakes (when speaking about ``super`` this is always possible). Once I
finish the series, I may integrate the corrections, put it together
again and possibly publish it as a whole on the Python website.
In other words, in order to finish the task,
I am trying the strategies of *divide and conquer*
and *release early, release often*. We will see how it goes.

Introduction
-------------------------------------------------------------

``super`` is a Python built-in, first introduced in Python 2.2 and
slightly improved and fixed in later versions, which is often
misunderstood by the average Python programmer. One of the reasons for
that is the poor documentation of ``super``: at the time of this
writing (August 2008) the documentation is incomplete and in some parts
misleading and even wrong. For instance, the standard documentation
(even for the new 2.6 version
http://docs.python.org/dev/library/functions.html#super) still says::

  super(type[, object-or-type])
    Return the superclass of type. If the second argument is omitted the 
    super object returned is unbound. If the second argument is an object, 
    isinstance(obj, type) must be true. If the second argument is a type, 
    issubclass(type2, type) must be true. super() only works for new-style 
    classes.

[UPDATE: the final version of Python 2.6 has better documentation
for ``super``, as a direct consequence of this post ;)].
The first sentence is just plain wrong: ``super`` does not return the 
superclass. There is no such thing as *the superclass* in a Multiple 
Inheritance (MI) world. Also, the sentence about *unbound* is misleading,
since it may easily lead the programmer to think about bound and unbound
methods, whereas it has nothing to do with that concept. 
IMNSHO ``super`` is one of the trickiest and most surprising Python 
constructs, and we absolutely need a document to shed light on its secrets. 
The present paper is a first step in this direction: it aims to tell you 
the *truth* about ``super``. At least the amount of truth
I have discovered with my experiments, which is certainly
not the whole truth ;)

A fair warning is in order here: this document is aimed at expert
Pythonistas. It assumes you are familiar with `new-style classes`_ and
the `Method Resolution Order`_ (MRO); moreover a good understanding of
descriptors_ would be extremely useful. Some parts also require good
familiarity with metaclasses_. All in all, this paper is not for the
faint of heart ;)

There is no superclass in a MI world
----------------------------------------------------------

Readers familiar with single inheritance languages, such as
Java or Smalltalk, will have a clear concept of superclass
in mind. This concept, however, has *no useful meaning* in Python or in
other multiple inheritance languages. I became convinced of this fact
after a discussion with Bjorn Pettersen and Alex Martelli 
on `comp.lang.python in May 2003`_
(at that time I was mistakenly thinking that one could define a
superclass concept in Python). Consider this example from that
discussion:

 ::

            +-----+
            |  T  |
            |a = 0|
            +-----+
          /         \
         /           \
     +-------+    +-------+
     |   A   |    |   B   | 
     |       |    | a = 2 |
     +-------+    +-------+
         \           /
          \         /
            +-----+
            |  C  |
            +-----+
               :
               :    instantiation
               c

 >>> class T(object):
 ...     a = 0

 >>> class A(T):
 ...     pass

 >>> class B(T):
 ...     a = 2
 
 >>> class C(A,B):
 ...     pass
 
 >>> c = C()

What is the superclass of ``C``? There are two direct superclasses (i.e. bases)
of ``C``: ``A`` and ``B``. ``A`` comes before ``B``, so one would naturally 
think that the superclass of ``C`` is ``A``. However,
``A`` inherits its attribute ``a`` from ``T``
with value ``a=0``: if ``super(C,c)`` were returning 
the superclass of ``C``, then ``super(C,c).a`` would return 0. This
is NOT what happens. Instead, ``super(C,c).a`` walks through the
method resolution order  of the class of ``c`` (i.e. ``C``) 
and retrieves the attribute from the first class above ``C`` which
defines it. In this example the MRO of ``C`` is ``[C, A, B, T, object]``, so
``B`` is the first class above ``C`` which defines ``a`` and ``super(C,c).a``
correctly returns the value 2, not 0:

 >>> super(C,c).a
 2

You may call ``A`` the superclass of ``C``, but this is not a useful
concept since the methods are resolved by looking at the classes
in the MRO of ``C``, and not by looking at the classes in the MRO of ``A``
(which in this case is ``[A, T, object]`` and does not contain ``B``). 
The whole MRO is needed, not just the first superclass.

So, using the word *superclass* in the standard docs is
misleading and should be avoided altogether.
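
As a rough illustration of the lookup just described, the following sketch
walks the MRO by hand. The helper name ``lookup_after`` is hypothetical (it
is not part of Python), and the sketch deliberately ignores descriptors, so
it only approximates what ``super(C, c).a`` does for plain class attributes:

```python
class T(object):
    a = 0

class A(T):
    pass

class B(T):
    a = 2

class C(A, B):
    pass

def lookup_after(cls, obj, name):
    # Hypothetical helper: search the MRO of type(obj), starting just
    # after cls, and return the first class that defines name together
    # with the raw value found there.
    # Descriptors are NOT invoked here, unlike in the real super.
    mro = type(obj).__mro__
    for klass in mro[mro.index(cls) + 1:]:
        if name in vars(klass):
            return klass, vars(klass)[name]
    raise AttributeError(name)

c = C()
mro_names = [klass.__name__ for klass in C.__mro__]
found_in, value = lookup_after(C, c, 'a')
```

The search starts *after* ``C`` in the MRO of ``type(c)``, which is why
``B.a`` is found before ``T.a`` even though ``B`` is not an ancestor of ``A``.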

Bound and unbound (super) methods
----------------------------------------------------------------

Having established that ``super`` cannot return the
mythical superclass, we may ask ourselves what the hell it is returning 
;) The truth is that ``super`` returns proxy objects.

Informally speaking, a proxy is an object with
the ability to dispatch to methods of other objects via delegation.
Technically, ``super`` is a class overriding the ``__getattribute__`` 
method. Instances of ``super`` are proxy objects providing 
access to the methods in the MRO. The dispatch is done in such a way
that

``super(cls, instance-or-subclass).method(*args, **kw)``

corresponds more or less to

``right-method-in-the-MRO-applied-to(instance-or-subclass, *args, **kw)``
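
The dispatch can be sketched in pure Python along the following lines. This
is only a simplified emulation under stated assumptions: the class name
``SuperProxy`` and its private attributes are hypothetical, the real
``super`` is implemented in C and handles more corner cases, and the example
classes ``B``, ``C`` and ``D`` are defined here just to make the block
self-contained:

```python
class SuperProxy(object):
    # Hypothetical, simplified emulation of the built-in super.

    def __init__(self, cls, obj):
        self._cls = cls   # the class after which the MRO search starts
        self._obj = obj   # an instance of cls or a subclass of cls

    def __getattribute__(self, name):
        # Use object.__getattribute__ to read our own state, otherwise
        # this method would call itself recursively.
        cls = object.__getattribute__(self, '_cls')
        obj = object.__getattribute__(self, '_obj')
        if isinstance(obj, cls):
            starttype, instance = type(obj), obj
        else:  # assume obj is a subclass of cls
            starttype, instance = obj, None
        mro = starttype.__mro__
        for klass in mro[mro.index(cls) + 1:]:
            if name in vars(klass):
                attr = vars(klass)[name]
                if hasattr(type(attr), '__get__'):
                    # Let the descriptor protocol do the binding.
                    return attr.__get__(instance, starttype)
                return attr
        raise AttributeError(name)

class B(object):
    def __repr__(self):
        return "<instance of %s>" % self.__class__.__name__

class C(B):
    pass

class D(C):
    pass

d = D()
via_instance = SuperProxy(C, d).__repr__()   # instance as second argument
via_class = SuperProxy(C, D).__repr__(d)     # subclass as second argument
```

In both calls the method found is ``B.__repr__``, the first one after ``C``
in the MRO of ``D``; the only difference is whether the instance is supplied
by the proxy or by the caller.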

There is a caveat at this point: the second argument can be
an instance of the first argument, or a subclass of it.
In the first case we expect a *bound* method to be returned
and in the second case an *unbound* method to be returned.
This is true in recent versions of Python: for instance, in this example

 >>> class B(object):
 ...     def __repr__(self):
 ...         return "<instance of %s>" % self.__class__.__name__

 >>> class C(B):
 ...     pass

 >>> class D(C):
 ...     pass

 >>> d = D()

you get

 >>> print super(C, d).__repr__
 <bound method D.__repr__ of <instance of D>>

and 

 >>> print super(C, D).__repr__
 <unbound method D.__repr__>

However, if you are still using Python 2.2 (there are unlucky people forced
to use old versions) you should be aware that ``super`` had a bug
and ``super(<class>, <subclass>).method`` returned a *bound* method,
not an unbound one::

 >> print super(C, D).__repr__ # in Python 2.2
 <bound method D.__repr__ of <class '__main__.D'>>

That means that in Python 2.2 you get::

 >> print super(C, D).__repr__() # in Python 2.2
 <instance of type>

``D``, seen as an instance of the (meta)class ``type``, is being passed as
first argument to ``__repr__``. 
This has been fixed in Python 2.3+, where you correctly get
a ``TypeError``:

 >>> print super(C, D).__repr__() # the same as B.__repr__()
 Traceback (most recent call last):
  ...
 TypeError: unbound method __repr__() must be called with D instance as first argument (got nothing instead)

The point is subtle, but usually one does not see problems since typically
``super`` is invoked on instances, not on subclasses, and in this case it
works correctly in all Python versions:

 >>> print super(C, d).__repr__()
 <instance of D>

When I was using Python 2.2, due to the bug just discussed, and due to
the ``super`` docstring

>>> print super.__doc__
super(type) -> unbound super object
super(type, obj) -> bound super object; requires isinstance(obj, type)
super(type, type2) -> bound super object; requires issubclass(type2, type)
Typical use to call a cooperative superclass method:
class C(B):
    def meth(self, arg):
        super(C, self).meth(arg)

I got the impression that in order to get unbound methods I needed to use
the unbound ``super`` object. This is actually untrue. To understand how 
bound/unbound methods work we need to talk about descriptors.

``super`` and descriptors
----------------------------------------------------

Descriptors (more properly I should speak of the descriptor protocol) were 
introduced in Python 2.2 by Guido van Rossum. Their primary motivation 
was technical, since they were needed to implement the new-style object 
system. Descriptors were also used to introduce new standard concepts in 
Python, such as classmethods, staticmethods and properties. Moreover, 
according to the traditional transparency policy of Python, descriptors 
were exposed to the application programmer, giving him/her the freedom
to write custom descriptors.  Any serious Python programmer should have 
a look at descriptors: luckily they are now very well documented (which was
not the case when I first studied them :-/) thanks to the `beautiful essay`_
of Raymond Hettinger. You should read it before continuing this article, 
since it explains all the details. However, for the sake of our discussion
of ``super``, it is enough to say that a *descriptor class* is just a
regular new-style class which implements a ``.__get__`` method with
signature ``__get__(self, obj, objtyp=None)``. A *descriptor object*
is just an instance of a descriptor class. 

Descriptor objects are intended to be used as attributes (hence their
complete name, *attribute descriptors*). Suppose that ``descr`` is a
given descriptor object used as an attribute of a given class ``C``.
Then the syntax ``C.descr`` is actually interpreted by Python as a 
call to ``descr.__get__(None, C)``, whereas the same syntax for an 
instance ``c`` of ``C`` corresponds to a call to ``descr.__get__(c, type(c))``.
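
A minimal sketch may make the two cases concrete. The descriptor class
``Descr`` below is hypothetical and does nothing useful: it simply returns
the arguments that Python passes to ``__get__``, so that the difference
between class access and instance access becomes visible:

```python
class Descr(object):
    # Hypothetical descriptor that just reports how it was accessed.
    def __get__(self, obj, objtyp=None):
        return (obj, objtyp)

class C(object):
    descr = Descr()

c = C()
from_class = C.descr       # Python calls descr.__get__(None, C)
from_instance = c.descr    # Python calls descr.__get__(c, type(c))

# The same calls spelled out by hand, fetching the raw descriptor
# object from the class dictionary to bypass the protocol.
raw = vars(C)['descr']
explicit_class = raw.__get__(None, C)
explicit_instance = raw.__get__(c, type(c))
```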

Since the combination of descriptors and super is so tricky, the core
developers got it wrong in different versions of Python.  For
instance, in Python 2.2 the only way to get the unbound method
``__repr__`` is via the descriptor API::

 >> super(C, d).__repr__.__get__(None, D) # Python 2.2
 <unbound method D.__repr__>

You may check that it works correctly::

 >> print _(d)
 <instance of D>

In Python 2.3 one can get the unbound method by using the ``super(cls, subcls)``
syntax, but the syntax ``super(C, d).__repr__.__get__(None, D)`` also
works; in Python 2.4+ instead the same syntax returns a *bound* method,
not an unbound one:

>>> super(C, d).__repr__.__get__(None, D) # in Python 2.4+
<bound method D.__repr__ of <instance of D>>

The core developers changed the behavior again, making
my life difficult while I was writing this paper :-/
I cannot trace the history of the bugs of ``super`` here, but if you
are using an old version of Python and you find something weird with
``super``, I advise you to have a look at the Python bug tracker
before thinking you are doing something wrong.
In this case, to be correct, the change is not in ``super``, but in the
descriptor implementation. In Python 2.2-2.3 you could
get an unbound method from a bound one as follows::

 >> d.__repr__.__get__(None, D) # in Python 2.2-2.3 
 <unbound method D.__repr__>

In Python 2.4 that does not work anymore:

>>> d.__repr__.__get__(None, D) # in Python 2.4+ 
<bound method D.__repr__ of <instance of D>>

Still, you can get the unbound method by going through the underlying
function first:

>>> d.__repr__.im_func.__get__(None, D) # in Python 2.4+ 
<unbound method D.__repr__>
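
The same trick of going through the underlying function can be sketched in
a form that should also run on newer interpreters. The assumptions here are
that the attribute is spelled ``__func__`` (an alias of ``im_func`` available
since Python 2.6, and the only spelling in Python 3, where unbound methods no
longer exist) and that the names ``other`` and ``rebound`` are introduced
only for illustration:

```python
class B(object):
    def __repr__(self):
        return "<instance of %s>" % self.__class__.__name__

class C(B):
    pass

class D(C):
    pass

d = D()
bound = d.__repr__

# A bound method ignores a further __get__ call: it stays bound to d.
still_bound = bound.__get__(None, D)

# Going through the underlying function lets the descriptor protocol
# start from scratch, here binding the function to a second instance.
other = D()
rebound = bound.__func__.__get__(other, D)
result = rebound()
```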

.. _Method Resolution Order: http://www.python.org/download/releases/2.3/mro/
.. _new-style classes: http://www.python.org/download/releases/2.2.3/descrintro/
.. _descriptors: http://users.rcn.com/python/download/Descriptor.htm
.. _metaclasses: http://www.ibm.com/developerworks/library/l-pymeta.html
.. _comp.lang.python in May 2003: http://tinyurl.com/5ms8lk
.. _beautiful essay: http://users.rcn.com/python/download/Descriptor.htm
"""

if __name__ == '__main__':
    import doctest
    doctest.testmod()