path: root/artima/python/super4.py
"""\
Most languages supporting inheritance support cooperative inheritance,
i.e. there is a language-supported way for child methods
to dispatch to their parent method: this is usually done via a
``super`` keyword.  Things are easy when the language supports single
inheritance only, since each class has a single parent and there is a
unique concept of super method. Things are difficult when the
language supports multiple inheritance: in that case the programmer
has to understand the intricacies of the Method Resolution Order.

Why cooperative hierarchies are tricky
--------------------------------------------

This paper is intended to be very practical, so let me start with an example.
Consider the following hierarchy (in Python 3):

$$A
$$B
$$C

What is the "superclass" of ``A``? In other words, when I create an
instance of ``A``, which method will be called by
``super().__init__()``?  Notice that I am considering here generic
instances of ``A``, not only direct instances: in particular, an
instance of ``C`` is also an instance of ``A`` and instantiating ``C``
will call ``super().__init__()`` in ``A.__init__`` at some point: the
tricky point is to understand which method will be called
for *indirect* instances of ``A``.

In a single inheritance language there is a unique answer both for
direct and indirect instances (``object`` is the super class of ``A``
and ``object.__init__`` is the method called by
``super().__init__()``).  On the other hand, in a multiple inheritance
language there is no easy answer. It is better to say that there is no
super class and it is impossible to know which method will be called
by ``super().__init__()`` unless the subclass from which ``super`` is
called is known. In particular, this is what happens when we instantiate ``C``:

>>> c = C()
C.__init__
A.__init__
B.__init__

As you see the super call in ``C`` dispatches to ``A.__init__`` and then
the super call there dispatches to ``B.__init__`` which in turn dispatches to
``object.__init__``. The important point is that
*the same super call can dispatch to different methods*:
when ``super().__init__()`` is called directly by instantiating
``A`` it dispatches to ``object.__init__`` whereas when it is called indirectly
by instantiating ``C`` it dispatches to ``B.__init__``. If somebody
extends the hierarchy, adds subclasses of ``A`` and instantiates them,
then the super call in ``A.__init__``
can dispatch to an entirely different method: the super method call
depends on the instance I am starting from. The precise algorithm
specifying the order in which the methods are called is
called the Method Resolution Order algorithm, or MRO for short. It
is discussed in detail in an old essay I wrote years ago and
interested readers are referred to it (see the references below).
Here I will take the easy way and I will ask Python.

Given any class, it is possible to extract its linearization, i.e. the
ordered list of its ancestors plus the class itself: the super call
follows this list to decide which is the right method to dispatch
to. For instance, if you are considering a direct instance of ``A``,
``object`` is the only class the super call can dispatch to:

.. code-block:: python

 >>> A.mro()
 [<class '__main__.A'>, <class 'object'>]

If you are considering a direct instance of ``C``, ``super`` looks at the
linearization of ``C``:

.. code-block:: python

 >>> C.mro()
 [<class '__main__.C'>, <class '__main__.A'>, <class '__main__.B'>, <class 'object'>]

A super call in ``C`` will look first at ``A``, then at ``B`` and finally at
``object``. Finding out the linearization is non-trivial; just to give
an example suppose we add to our hierarchy three classes ``D``, ``E`` and ``F``
in this way:

.. code-block:: python

 >>> class D: pass
 >>> class E(A, D): pass
 >>> class F(E, C): pass
 >>> for c in F.mro():
 ...    print(c.__name__)
 F
 E
 C
 A
 D
 B
 object

As you see, for an instance of ``F`` a super call in ``A.__init__``
resumes the lookup at ``D`` and not directly at ``B``: as soon as ``D``
defines its own ``__init__``, that method is called before ``B.__init__``!
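
The lookup rule can be sketched in a few lines. Given the class where the
super call appears and the type of the instance, the search starts right
after that class in the linearization of the instance type and stops at the
first class that defines the method in its own namespace. The helper name
``super_target`` is hypothetical, and here I am assuming that ``D`` has its
own ``__init__`` (unlike the bare ``class D: pass`` above), so that there is
something to dispatch to:

```python
class A:
    def __init__(self):
        super().__init__()

class B:
    def __init__(self):
        super().__init__()

class C(A, B):
    def __init__(self):
        super().__init__()

class D:
    # Assumption: D gets its own __init__ for the sake of the example
    def __init__(self):
        super().__init__()

class E(A, D): pass
class F(E, C): pass

def super_target(this_class, instance_type, name):
    # Hypothetical helper: first class after this_class, in the MRO of
    # instance_type, that defines name in its own namespace
    mro = instance_type.mro()
    for cls in mro[mro.index(this_class) + 1:]:
        if name in vars(cls):
            return cls
    return None

print(super_target(A, A, '__init__').__name__)  # object
print(super_target(A, C, '__init__').__name__)  # B
print(super_target(A, F, '__init__').__name__)  # D
```

With a ``D`` that does not define ``__init__`` the same helper would skip
it and return ``B``, because only the namespace of each class is inspected.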

The problem with incompatible signatures
----------------------------------------------------

I have just shown that one cannot tell in advance
where the super call will dispatch, unless one knows the whole hierarchy:
this is quite different from the single inheritance situation and it is
also error prone and brittle.
When you design a hierarchy you will expect for instance that
``A.__init__`` will call ``B.__init__``, but adding classes (and such
classes may be added by a third party) may change the method chain. In this
case the super call in ``A.__init__`` (when invoked by an ``F`` instance)
will look at ``D`` before ``B``. This is dangerous: for instance,
if the behavior of your code depends on the ordering of the
methods you may get into trouble. Things are worse if one of the methods
in the cooperative chain does not have a compatible signature, since the
chain will break.

This problem is not theoretical and it happens even in very trivial
hierarchies.  For instance, here is an example of incompatible
signatures in the ``__init__`` method (this problem 
affects even Python 2.6, not only Python 3.X):

.. code-block:: python

 class X(object):
     def __init__(self, a):
         super().__init__()

 class Y(object):
     def __init__(self, a):
         super().__init__()

 class Z(X, Y):
     def __init__(self, a):
         super().__init__(a)

Here instantiating ``X`` and ``Y`` works fine, but as soon as you
introduce ``Z`` you get into trouble since ``super().__init__(a)`` in
``Z.__init__`` will call ``super().__init__()`` in ``X`` which in
turn will call ``Y.__init__`` with no arguments, resulting in a
``TypeError``!  In older Python versions (from 2.2 to 2.5) this
problem can be avoided by leveraging the fact that
``object.__init__`` accepts any number of arguments (ignoring them), by
replacing ``super().__init__()`` with ``super().__init__(a)``. In Python
2.6+ instead there is no real solution for this problem, except avoiding
``super`` in the constructor or avoiding multiple inheritance.
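
The failure is easy to reproduce. The following is a minimal sketch in
Python 3 with the same three classes; the names ``error`` and ``exc`` are
only there to record what happened:

```python
class X:
    def __init__(self, a):
        super().__init__()

class Y:
    def __init__(self, a):
        super().__init__()

class Z(X, Y):
    def __init__(self, a):
        super().__init__(a)

X(1)  # fine: the class after X in the MRO of X is object
Y(1)  # fine for the same reason

error = None
try:
    Z(1)
except TypeError as exc:
    # In the MRO of Z the class after X is Y, so the argument-less
    # super call in X.__init__ reaches Y.__init__ without a
    error = exc
    print('TypeError:', exc)
```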

In general if you want to support multiple inheritance you should use
``super`` only when the methods in a cooperative chain
have consistent signatures: that means that you
will not use super in ``__init__`` and ``__new__`` since likely your
constructors will have custom arguments whereas ``object.__init__``
and ``object.__new__`` take no extra arguments.  However, in practice, you may
inherit from third party classes which do not obey this rule, or
others could derive from your classes without following this rule and
breakage may occur. For instance, I have used ``super`` for years in my
``__init__`` methods and I never had problems because in older Python
versions ``object.__init__`` accepted any number of arguments: but in Python 3
all that code is fragile under multiple inheritance. I am left with 
two choices: removing ``super`` or telling people that
those classes are not intended to be used in multiple inheritance
situations, i.e. the constructors will break if they do that.
Nowadays I tend to favor the second choice. 

Luckily, usually multiple inheritance is used with mixin classes, and mixins do
not have constructors, so that in practice the problem is mitigated.

The intended usage for super
----------------------------------------------------

Even if ``super`` has its shortcomings, there are meaningful use cases for
it, assuming you think multiple inheritance is a legitimate design technique.
For instance, if you use metaclasses and you want to support multiple
inheritance, you *must* use ``super`` in the ``__new__`` and ``__init__``
methods: there is no problem, since the constructor for
metaclasses has a fixed signature *(name, bases, dictionary)*. But metaclasses
are extremely rare, so let me give a more meaningful example for an application
programmer where a design based on cooperative
multiple inheritance could be reasonable.

Suppose you have a bunch of ``Manager`` classes which
share many common methods and which are intended to manage different resources,
such as databases, FTP sites, etc. To be concrete, suppose there are
two common methods: ``getinfolist`` which returns a list of strings
describing the managed resource (containing information such as the URI, the
tables in the database or the files in the site, etc.) and ``close``
which closes the resource (the database connection or the FTP connection).
You can model the hierarchy with a ``Manager`` abstract base class

$$Manager

and two concrete classes ``DbManager`` and ``FtpManager``:

$$DbManager
$$FtpManager

Now suppose you need to manage both a database and an FTP site:
then you can define a ``MultiManager`` as follows:

$$MultiManager
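
To see the cooperative chain at work without a real database or FTP
server, here is a self-contained sketch of the same hierarchy. The
``FakeResource`` class and the shared ``log`` list are hypothetical
stand-ins for the real connections, and the constructors take the log
instead of a DSN or a URL:

```python
class FakeResource:
    # Hypothetical stand-in for a database or FTP connection
    def __init__(self, name, log):
        self.name, self.log = name, log
    def close(self):
        self.log.append(self.name + ' closed')

class Manager:
    def close(self):
        pass
    def getinfolist(self):
        return []

class DbManager(Manager):
    def __init__(self, log):
        self.conn = FakeResource('db', log)
    def close(self):
        super().close()
        self.conn.close()
    def getinfolist(self):
        return super().getinfolist() + ['db info']

class FtpManager(Manager):
    def __init__(self, log):
        self.ftp = FakeResource('ftp', log)
    def close(self):
        super().close()
        self.ftp.close()
    def getinfolist(self):
        return super().getinfolist() + ['ftp info']

class MultiManager(DbManager, FtpManager):
    def __init__(self, log):
        # No super here: each parent constructor is called explicitly
        DbManager.__init__(self, log)
        FtpManager.__init__(self, log)

log = []
mm = MultiManager(log)
infos = mm.getinfolist()
mm.close()
print(infos)  # ['ftp info', 'db info']
print(log)    # ['ftp closed', 'db closed']
```

Since each method calls ``super`` first and does its own work afterwards,
the results come out in reverse MRO order.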

Everything works: calling ``MultiManager.close`` will in turn call
``DbManager.close`` and ``FtpManager.close``. There is no risk of
running into trouble with the signature since the ``close`` and ``getinfolist``
methods all have the same signature (actually they take no arguments at all).
Notice also that I did not use ``super`` in the constructor.
You see that ``super`` is *essential* in this design: without it,
only ``DbManager.close`` would be called and your FTP connection would leak.
The ``getinfolist`` method works similarly: forgetting ``super`` would
mean losing some information. An alternative not using ``super`` would require
defining an explicit method ``close`` in the ``MultiManager``, calling
``DbManager.close`` and ``FtpManager.close`` explicitly, and an explicit
method ``getinfolist`` calling ``DbManager.getinfolist`` and
``FtpManager.getinfolist``:

$$close
$$getinfolist

This would be less elegant but probably clearer and safer so you can always
decide not to use ``super`` if you really hate it. However, if you have
``N`` common methods, there is some boilerplate to write; moreover, every time
you add a ``Manager`` class you must add it to the ``N`` common methods, which
is ugly. Here ``N`` is just 2, so not using ``super`` may work well,
but in general it is clear that the cooperative approach is more effective.
Actually, I strongly believe (and always have) that ``super`` and the
MRO are the *right* way to do multiple inheritance: but I also believe
that multiple inheritance itself is *wrong*. For instance, in the
``MultiManager`` example I would not use multiple
inheritance but composition and I would probably use a generalization
such as the following:

$$MyMultiManager
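
A usage sketch of the composition approach follows. The ``StubManager``
class is hypothetical and only records what happens to it; for brevity
this copy of ``MyMultiManager`` does not inherit from ``Manager`` and it
collects the information lists with an explicit loop:

```python
class StubManager:
    # Hypothetical manager that only records what happens to it
    def __init__(self, info):
        self.info = info
        self.closed = False
    def close(self):
        self.closed = True
    def getinfolist(self):
        return [self.info]

class MyMultiManager:
    def __init__(self, *managers):
        self.managers = managers
    def close(self):
        for mngr in self.managers:
            mngr.close()
    def getinfolist(self):
        infos = []
        for mngr in self.managers:
            infos.extend(mngr.getinfolist())
        return infos

db, ftp = StubManager('db info'), StubManager('ftp info')
multi = MyMultiManager(db, ftp)
infos = multi.getinfolist()
multi.close()
print(infos)                  # ['db info', 'ftp info']
print(db.closed, ftp.closed)  # True True
```

Here the order of the results is simply the order in which the managers
were passed, with no MRO involved.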

There are languages that do not provide inheritance (even single
inheritance!)  and are perfectly fine, so you should always question
if you should use inheritance or not. There are always many options
and the design space is rather large.  Personally, I always use
``super`` but I use single-inheritance only, so that my cooperative
hierarchies are trivial.

The magic of super in Python 3
----------------------------------------------------------------------

Deep down, ``super`` in Python 3 is the same as in Python 2.X.
However, on the surface - at the syntactic level, not at the semantic level -
there is a big difference: Python 3 super is smart enough to figure out
*the class it is invoked from and the first argument of the containing
method*. Actually it is so smart that it works also for inner classes
and even if the first argument is not called ``self``.
In Python 2.X ``super`` is dumber and you must tell the class and the
argument explicitly: for instance our first example must be written

.. code-block:: python

 class A(object):
     def __init__(self):
         print('A.__init__')
         super(A, self).__init__()

By the way, this syntax works both in Python 3 *and* in Python 2: this is
why I said that deep down ``super`` is the same. The new feature in
Python 3 is that there is a shortcut notation ``super()`` for 
``super(A, self)``. In Python 3 the (bytecode) compiler is smart enough
to recognize that the supercall is performed inside the class ``A`` so
that it inserts the reference to ``A`` automagically; moreover it inserts
the reference to the first argument of the current method too. Typically
the first argument of the current method is ``self``, but it may be
``cls`` or any identifier: ``super`` will work fine in any case. 
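
The equivalence between the two spellings can be checked directly. In
this sketch the class names ``Base``, ``Implicit`` and ``Explicit`` are
made up for the purpose, and the first argument of the instance methods
is deliberately called ``this`` instead of ``self``:

```python
class Base:
    def who(self):
        return 'Base.who'
    @classmethod
    def make(cls):
        return 'Base.make for ' + cls.__name__

class Implicit(Base):
    def who(this):  # first argument not called self
        return 'Implicit -> ' + super().who()
    @classmethod
    def make(cls):
        return super().make()

class Explicit(Base):
    def who(this):
        return 'Explicit -> ' + super(Explicit, this).who()
    @classmethod
    def make(cls):
        return super(Explicit, cls).make()

print(Implicit().who())  # Implicit -> Base.who
print(Explicit().who())  # Explicit -> Base.who
print(Implicit.make())   # Base.make for Implicit
print(Explicit.make())   # Base.make for Explicit
```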

Since ``super()`` knows the class it is invoked from and the class of
the original caller, it can walk the MRO correctly. Such information
is stored in the attributes ``.__thisclass__`` and ``.__self_class__``
and you may understand how it works from the following example:

$$Mother
$$Child

.. code-block:: python

 >>> child = Child()
 <class '__main__.Mother'>
 <class '__main__.Child'>

Here ``.__self_class__`` is just the class of the first argument (``self``)
but this is not always the case. The exception is the case of classmethods and
staticmethods taking a class as first argument, such as ``__new__``.
Specifically, ``super(cls, x)`` checks if ``x`` is an instance
of ``cls`` and then sets ``.__self_class__`` to ``x.__class__``; otherwise
(and that happens for classmethods and for ``__new__``) it checks if ``x`` 
is a subclass of ``cls`` and then sets  ``.__self_class__`` to ``x`` directly.
For instance, in the following example

$$C0
$$C1
$$C2

the attribute ``.__self_class__`` is *not* the class of the first argument
(which would be ``type`` the metaclass of all classes) but simply the first
argument:

.. code-block:: python

 >>> C2.c()
 __thisclass__ <class '__main__.C1'>
 __selfclass__ <class '__main__.C2'>
 called classmethod C0.c
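
The two branches of the check can also be observed by building the
``super`` objects by hand with the explicit two-argument form. This is
only a sketch: the classes are bare and ``Unrelated`` is a made-up class
used to show the rejected case:

```python
class Mother:
    pass

class Child(Mother):
    pass

class Unrelated:
    pass

# Second argument is an instance: __self_class__ is its class
sup = super(Mother, Child())
print(sup.__thisclass__.__name__, sup.__self_class__.__name__)  # Mother Child

# Second argument is a subclass: __self_class__ is the argument itself
sup_cls = super(Mother, Child)
print(sup_cls.__self_class__ is Child)  # True

# Neither an instance nor a subclass: super refuses with a TypeError
rejected = False
try:
    super(Mother, Unrelated())
except TypeError:
    rejected = True
print(rejected)  # True
```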

There is a lot of magic going on in Python 3 ``super``, and the magic has
its limits. For instance, this is a syntax that cannot work:

$$super_external

If you try to run this code you will get a
``SystemError: super(): __class__ cell not found`` (more recent Python 3
versions raise a ``RuntimeError`` with the same message) and the reason is
obvious: since the ``__init__`` method is external to the class the
compiler cannot infer to which class it will be attached at runtime.
On the other hand, if you are completely explicit and you use the full
syntax, by writing the external method as

$$__init__

everything will work because you are explicitly telling that the method
will be attached to the class ``C``.
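
Both behaviors can be reproduced in a single script. In this sketch the
function names ``implicit_init`` and ``explicit_init`` and the attribute
``initialized`` are made up, and the exception type is caught loosely
because it depends on the Python version:

```python
def implicit_init(self):
    # Defined outside any class body: no __class__ cell is created
    super().__init__()

class C:
    pass

def explicit_init(self):
    # The class is named explicitly, so no compiler help is needed
    self.initialized = True
    super(C, self).__init__()

C.__init__ = implicit_init
failure = None
try:
    C()
except (RuntimeError, SystemError) as exc:
    failure = exc
print(type(failure).__name__, failure)

C.__init__ = explicit_init
c = C()
print(c.initialized)  # True
```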

I will close this section by noticing a wart of ``super`` in Python 3,
pointed out by `Armin Ronacher`_ and others: the fact that ``super``
should be a keyword but it is not. Therefore horrors like the
following are possible:

$$super_horrors

DON'T DO THAT! Here the called ``__init__`` is the ``__init__`` method
of the object ``None``!!  

Of course, only an evil programmer would shadow ``super`` on purpose,
but that may happen accidentally. Consider for instance this use case:
you are refactoring an old code base written before the existence of
``super`` and using ``from mod import *`` (this is ugly but we know
that there are code bases written this way), with ``mod`` defining a
function ``super`` which has nothing to do with the ``super``
builtin. If in this code you replace ``Base.method(self, *args)`` with
``super().method(*args)`` you will introduce a bug. This is not common
(it never happened to me), but still it is a bug that could not happen if
``super`` were a keyword.

Moreover, ``super`` is special and it will not work if
you change its name as in this example:

.. code-block:: python

 # from http://lucumr.pocoo.org/2010/1/7/pros-and-cons-about-python-3
 _super = super
 class Foo(Bar):
     def foo(self):
         _super().foo()

Here the bytecode compiler will not treat specially ``_super``, only
``super``. It is unfortunate that we missed the opportunity to make ``super``
a keyword in Python 3, without good reasons (Python 3 was expected 
to break compatibility anyway).
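
The renaming problem can be checked with a few lines. This is a sketch
with made-up classes: ``Foo`` uses the alias without arguments and fails,
whereas ``Foo2`` uses the alias with the explicit two-argument form, which
does not depend on the name. The exception type is caught loosely because
it depends on the Python version:

```python
_super = super  # plain alias of the builtin

class Bar:
    def foo(self):
        return 'Bar.foo'

class Foo(Bar):
    def foo(self):
        # The compiler only looks for the name super, so this method
        # gets no __class__ cell and the call fails at runtime
        return _super().foo()

class Foo2(Bar):
    def foo(self):
        # Explicit arguments: the alias works like the builtin
        return _super(Foo2, self).foo()

failure = None
try:
    Foo().foo()
except (RuntimeError, SystemError) as exc:
    failure = exc
print(type(failure).__name__, failure)
print(Foo2().foo())  # Bar.foo
```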

References
---------------------------------------

There is plenty of material about super and multiple inheritance. You
should probably start from the `MRO paper`_, then read `Super
considered harmful`_ by James Knight. A lot of the issues with
``super``, especially in old versions of Python are covered in `Things
to know about super`_. I did spend some time thinking about ways to
avoid multiple inheritance; you may be interested in reading my series
`Mixins considered harmful`_.

.. _MRO paper: http://www.python.org/download/releases/2.3/mro/
.. _new style classes: http://www.python.org/download/releases/2.2.3/descrintro/
.. _Super considered harmful: http://fuhm.net/super-harmful/
.. _Menno Smits: http://freshfoo.com/blog/object__init__takes_no_parameters
.. _Things to know about super: http://www.phyast.pitt.edu/~micheles/python/super.pdf
.. _Mixins considered harmful: http://www.artima.com/weblogs/viewpost.jsp?thread=246341
.. _Armin Ronacher: http://lucumr.pocoo.org/2008/4/30/how-super-in-python3-works-and-why-its-retarded
.. _warts in Python 3: http://lucumr.pocoo.org/2010/1/7/pros-and-cons-about-python-3
"""

import super_external, super_horrors

class A(object):
    def __init__(self):
        print('A.__init__')
        super().__init__()

class B(object):
    def __init__(self):
        print('B.__init__')
        super().__init__()

class C(A, B):
    def __init__(self):
        print('C.__init__')
        super().__init__()

class Manager(object):
    def close(self):
        pass
    def getinfolist(self):
        return []

class DbManager(Manager):
    def __init__(self, dsn):
        self.conn = DBConn(dsn)
    def close(self):
        super().close()
        self.conn.close()
    def getinfolist(self):
        return super().getinfolist() + ['db info']

class FtpManager(Manager):
    def __init__(self, url):
        self.ftp = FtpSite(url)
    def close(self):
        super().close()
        self.ftp.close()
    def getinfolist(self):
        return super().getinfolist() + ['ftp info']


class MultiManager(DbManager, FtpManager):
    def __init__(self, dsn, url):
        # the parent constructors are called unbound, so self must be passed
        DbManager.__init__(self, dsn)
        FtpManager.__init__(self, url)

def close(self):
    DbManager.close(self)
    FtpManager.close(self)

def getinfolist(self):
    return DbManager.getinfolist(self) + FtpManager.getinfolist(self)

class MyMultiManager(Manager):
    def __init__(self, *managers):
        self.managers = managers
    def close(self):
        for mngr in self.managers:
            mngr.close()
    def getinfolist(self):
        # start from an empty list: the default start value 0 cannot be added to lists
        return sum((mngr.getinfolist() for mngr in self.managers), [])

class Mother(object):
    def __init__(self):
        sup = super()
        print(sup.__thisclass__)
        print(sup.__self_class__)
        sup.__init__()

class Child(Mother):
    pass

class C0(object):
    @classmethod
    def c(cls):
        print('called classmethod C0.c')

class C1(C0):
    @classmethod
    def c(cls):
        sup = super()
        print('__thisclass__', sup.__thisclass__)
        print('__selfclass__', sup.__self_class__)
        sup.c()

class C2(C1):
    pass

def __init__(self):
    print('calling __init__')
    super(C, self).__init__()

if __name__ == '__main__':
    import doctest; doctest.testmod()