Python Enhancement Proposals

PEP 443 – Single-dispatch generic functions

PEP
443
Title
Single-dispatch generic functions
Author
Łukasz Langa <lukasz at python.org>
Discussions-To
python-dev@python.org
Status
Final
Type
Standards Track
Created
22-May-2013
Post-History
22-May-2013, 25-May-2013, 31-May-2013
Replaces
245, 246, 3124

Contents

Abstract

This PEP proposes a new mechanism in the functools standard library module that provides a simple form of generic programming known as single-dispatch generic functions.

A generic function is composed of multiple functions implementing the same operation for different types. Which implementation should be used during a call is determined by the dispatch algorithm. When the implementation is chosen based on the type of a single argument, this is known as single dispatch.

Rationale and Goals

Python has always provided a variety of built-in and standard-library generic functions, such as len(), iter(), pprint.pprint(), copy.copy(), and most of the functions in the operator module. However, it currently:

  1. does not have a simple or straightforward way for developers to create new generic functions,
  2. does not have a standard way for methods to be added to existing generic functions (i.e., some are added using registration functions, others require defining __special__ methods, possibly by monkeypatching).

In addition, it is currently a common anti-pattern for Python code to inspect the types of received arguments, in order to decide what to do with the objects.

For example, code may wish to accept either an object of some type, or a sequence of objects of that type. Currently, the “obvious way” to do this is by type inspection, but this is brittle and closed to extension.

Abstract Base Classes make it easier to discover present behaviour, but don’t help adding new behaviour. A developer using an already-written library may be unable to change how their objects are treated by such code, especially if the objects they are using were created by a third party.

Therefore, this PEP proposes a uniform API to address dynamic overloading using decorators.

User API

To define a generic function, decorate it with the @singledispatch decorator. Note that the dispatch happens on the type of the first argument. Create your function accordingly:

>>> from functools import singledispatch
>>> @singledispatch
... def fun(arg, verbose=False):
...     if verbose:
...         print("Let me just say,", end=" ")
...     print(arg)

To add overloaded implementations to the function, use the register() attribute of the generic function. This is a decorator, taking a type parameter and decorating a function implementing the operation for that type:

>>> @fun.register(int)
... def _(arg, verbose=False):
...     if verbose:
...         print("Strength in numbers, eh?", end=" ")
...     print(arg)
...
>>> @fun.register(list)
... def _(arg, verbose=False):
...     if verbose:
...         print("Enumerate this:")
...     for i, elem in enumerate(arg):
...         print(i, elem)

To enable registering lambdas and pre-existing functions, the register() attribute can be used in a functional form:

>>> def nothing(arg, verbose=False):
...     print("Nothing.")
...
>>> fun.register(type(None), nothing)

The register() attribute returns the undecorated function. This enables decorator stacking, pickling, as well as creating unit tests for each variant independently:

>>> @fun.register(float)
... @fun.register(Decimal)
... def fun_num(arg, verbose=False):
...     if verbose:
...         print("Half of your number:", end=" ")
...     print(arg / 2)
...
>>> fun_num is fun
False

When called, the generic function dispatches on the type of the first argument:

>>> fun("Hello, world.")
Hello, world.
>>> fun("test.", verbose=True)
Let me just say, test.
>>> fun(42, verbose=True)
Strength in numbers, eh? 42
>>> fun(['spam', 'spam', 'eggs', 'spam'], verbose=True)
Enumerate this:
0 spam
1 spam
2 eggs
3 spam
>>> fun(None)
Nothing.
>>> fun(1.23)
0.615

Where there is no registered implementation for a specific type, its method resolution order is used to find a more generic implementation. The original function decorated with @singledispatch is registered for the base object type, which means it is used if no better implementation is found.

To check which implementation will the generic function choose for a given type, use the dispatch() attribute:

>>> fun.dispatch(float)
<function fun_num at 0x104319058>
>>> fun.dispatch(dict)    # note: default implementation
<function fun at 0x103fe0000>

To access all registered implementations, use the read-only registry attribute:

>>> fun.registry.keys()
dict_keys([<class 'NoneType'>, <class 'int'>, <class 'object'>,
          <class 'decimal.Decimal'>, <class 'list'>,
          <class 'float'>])
>>> fun.registry[float]
<function fun_num at 0x1035a2840>
>>> fun.registry[object]
<function fun at 0x103fe0000>

The proposed API is intentionally limited and opinionated, as to ensure it is easy to explain and use, as well as to maintain consistency with existing members in the functools module.

Implementation Notes

The functionality described in this PEP is already implemented in the pkgutil standard library module as simplegeneric. Because this implementation is mature, the goal is to move it largely as-is. The reference implementation is available on hg.python.org [1].

The dispatch type is specified as a decorator argument. An alternative form using function annotations was considered but its inclusion has been rejected. As of May 2013, this usage pattern is out of scope for the standard library [2], and the best practices for annotation usage are still debated.

Based on the current pkgutil.simplegeneric implementation, and following the convention on registering virtual subclasses on Abstract Base Classes, the dispatch registry will not be thread-safe.

Abstract Base Classes

The pkgutil.simplegeneric implementation relied on several forms of method resolution order (MRO). @singledispatch removes special handling of old-style classes and Zope’s ExtensionClasses. More importantly, it introduces support for Abstract Base Classes (ABC).

When a generic function implementation is registered for an ABC, the dispatch algorithm switches to an extended form of C3 linearization, which includes the relevant ABCs in the MRO of the provided argument. The algorithm inserts ABCs where their functionality is introduced, i.e. issubclass(cls, abc) returns True for the class itself but returns False for all its direct base classes. Implicit ABCs for a given class (either registered or inferred from the presence of a special method like __len__()) are inserted directly after the last ABC explicitly listed in the MRO of said class.

In its most basic form, this linearization returns the MRO for the given type:

>>> _compose_mro(dict, [])
[<class 'dict'>, <class 'object'>]

When the second argument contains ABCs that the specified type is a subclass of, they are inserted in a predictable order:

>>> _compose_mro(dict, [Sized, MutableMapping, str,
...                     Sequence, Iterable])
[<class 'dict'>, <class 'collections.abc.MutableMapping'>,
 <class 'collections.abc.Mapping'>, <class 'collections.abc.Sized'>,
 <class 'collections.abc.Iterable'>, <class 'collections.abc.Container'>,
 <class 'object'>]

While this mode of operation is significantly slower, all dispatch decisions are cached. The cache is invalidated on registering new implementations on the generic function or when user code calls register() on an ABC to implicitly subclass it. In the latter case, it is possible to create a situation with ambiguous dispatch, for instance:

>>> from collections import Iterable, Container
>>> class P:
...     pass
>>> Iterable.register(P)
<class '__main__.P'>
>>> Container.register(P)
<class '__main__.P'>

Faced with ambiguity, @singledispatch refuses the temptation to guess:

>>> @singledispatch
... def g(arg):
...     return "base"
...
>>> g.register(Iterable, lambda arg: "iterable")
<function <lambda> at 0x108b49110>
>>> g.register(Container, lambda arg: "container")
<function <lambda> at 0x108b491c8>
>>> g(P())
Traceback (most recent call last):
...
RuntimeError: Ambiguous dispatch: <class 'collections.abc.Container'>
or <class 'collections.abc.Iterable'>

Note that this exception would not be raised if one or more ABCs had been provided explicitly as base classes during class definition. In this case dispatch happens in the MRO order:

>>> class Ten(Iterable, Container):
...     def __iter__(self):
...         for i in range(10):
...             yield i
...     def __contains__(self, value):
...         return value in range(10)
...
>>> g(Ten())
'iterable'

A similar conflict arises when subclassing an ABC is inferred from the presence of a special method like __len__() or __contains__():

>>> class Q:
...   def __contains__(self, value):
...     return False
...
>>> issubclass(Q, Container)
True
>>> Iterable.register(Q)
>>> g(Q())
Traceback (most recent call last):
...
RuntimeError: Ambiguous dispatch: <class 'collections.abc.Container'>
or <class 'collections.abc.Iterable'>

An early version of the PEP contained a custom approach that was simpler but created a number of edge cases with surprising results [3].

Usage Patterns

This PEP proposes extending behaviour only of functions specifically marked as generic. Just as a base class method may be overridden by a subclass, so too a function may be overloaded to provide custom functionality for a given type.

Universal overloading does not equal arbitrary overloading, in the sense that we need not expect people to randomly redefine the behavior of existing functions in unpredictable ways. To the contrary, generic function usage in actual programs tends to follow very predictable patterns and registered implementations are highly-discoverable in the common case.

If a module is defining a new generic operation, it will usually also define any required implementations for existing types in the same place. Likewise, if a module is defining a new type, then it will usually define implementations there for any generic functions that it knows or cares about. As a result, the vast majority of registered implementations can be found adjacent to either the function being overloaded, or to a newly-defined type for which the implementation is adding support.

It is only in rather infrequent cases that one will have implementations registered in a module that contains neither the function nor the type(s) for which the implementation is added. In the absence of incompetence or deliberate intention to be obscure, the few implementations that are not registered adjacent to the relevant type(s) or function(s), will generally not need to be understood or known about outside the scope where those implementations are defined. (Except in the “support modules” case, where best practice suggests naming them accordingly.)

As mentioned earlier, single-dispatch generics are already prolific throughout the standard library. A clean, standard way of doing them provides a way forward to refactor those custom implementations to use a common one, opening them up for user extensibility at the same time.

Alternative approaches

In PEP 3124 Phillip J. Eby proposes a full-grown solution with overloading based on arbitrary rule sets (with the default implementation dispatching on argument types), as well as interfaces, adaptation and method combining. PEAK-Rules [4] is a reference implementation of the concepts described in PJE’s PEP.

Such a broad approach is inherently complex, which makes reaching a consensus hard. In contrast, this PEP focuses on a single piece of functionality that is simple to reason about. It’s important to note this does not preclude the use of other approaches now or in the future.

In a 2005 article on Artima [5] Guido van Rossum presents a generic function implementation that dispatches on types of all arguments on a function. The same approach was chosen in Andrey Popp’s generic package available on PyPI [6], as well as David Mertz’s gnosis.magic.multimethods [7].

While this seems desirable at first, I agree with Fredrik Lundh’s comment that “if you design APIs with pages of logic just to sort out what code a function should execute, you should probably hand over the API design to someone else”. In other words, the single argument approach proposed in this PEP is not only easier to implement but also clearly communicates that dispatching on a more complex state is an anti-pattern. It also has the virtue of corresponding directly with the familiar method dispatch mechanism in object oriented programming. The only difference is whether the custom implementation is associated more closely with the data (object-oriented methods) or the algorithm (single-dispatch overloading).

PyPy’s RPython offers extendabletype [8], a metaclass which enables classes to be externally extended. In combination with pairtype() and pair() factories, this offers a form of single-dispatch generics.

Acknowledgements

Apart from Phillip J. Eby’s work on PEP 3124 and PEAK-Rules, influences include Paul Moore’s original issue [9] that proposed exposing pkgutil.simplegeneric as part of the functools API, Guido van Rossum’s article on multimethods [5], and discussions with Raymond Hettinger on a general pprint rewrite. Huge thanks to Nick Coghlan for encouraging me to create this PEP and providing initial feedback.

References

[1]
http://hg.python.org/features/pep-443/file/tip/Lib/functools.py#l359
[2]
PEP 8 states in the “Programming Recommendations” section that “the Python standard library will not use function annotations as that would result in a premature commitment to a particular annotation style”.
[3]
http://bugs.python.org/issue18244
[4]
http://peak.telecommunity.com/DevCenter/PEAK_2dRules
[5] (1, 2)
http://www.artima.com/weblogs/viewpost.jsp?thread=101605
[6]
http://pypi.python.org/pypi/generic
[7]
http://gnosis.cx/publish/programming/charming_python_b12.html
[8]
https://bitbucket.org/pypy/pypy/raw/default/rpython/tool/pairtype.py
[9]
http://bugs.python.org/issue5135

Source: https://github.com/python-discord/peps/blob/main/pep-0443.txt

Last modified: 2022-02-27 22:46:36 GMT