PEP 252 – Making Types Look More Like Classes
- PEP: 252
- Title: Making Types Look More Like Classes
- Author: guido at python.org (Guido van Rossum)
- Status: Final
- Type: Standards Track
- Created: 19-Apr-2001
- Python-Version: 2.2
- Post-History:
Abstract
This PEP proposes changes to the introspection API for types that make them look more like classes, and their instances more like class instances. For example, type(x) will be equivalent to x.__class__ for most built-in types. When C is x.__class__, x.meth(a) will generally be equivalent to C.meth(x, a), and C.__dict__ contains x’s methods and other attributes.
This PEP also introduces a new approach to specifying attributes, using attribute descriptors, or descriptors for short. Descriptors unify and generalize several different common mechanisms used for describing attributes: a descriptor can describe a method, a typed field in the object structure, or a generalized attribute represented by getter and setter functions.
Based on the generalized descriptor API, this PEP also introduces a way to declare class methods and static methods.
[Editor’s note: the ideas described in this PEP have been incorporated into Python. The PEP no longer accurately describes the implementation.]
Introduction
One of Python’s oldest language warts is the difference between classes and types. For example, you can’t directly subclass the dictionary type, and the introspection interface for finding out what methods and instance variables an object has is different for types and for classes.
Healing the class/type split is a big effort, because it affects many aspects of how Python is implemented. This PEP concerns itself with making the introspection API for types look the same as that for classes. Other PEPs will propose making classes look more like types, and subclassing from built-in types; these topics are not on the table for this PEP.
Introspection APIs
Introspection concerns itself with finding out what attributes an object has. Python’s very general getattr/setattr API makes it impossible to guarantee that there always is a way to get a list of all attributes supported by a specific object, but in practice two conventions have appeared that together work for almost all objects. I’ll call them the class-based introspection API and the type-based introspection API; class API and type API for short.
The class-based introspection API is used primarily for class instances; it is also used by Jim Fulton’s ExtensionClasses. It assumes that all data attributes of an object x are stored in the dictionary x.__dict__, and that all methods and class variables can be found by inspection of x’s class, written as x.__class__. Classes have a __dict__ attribute, which yields a dictionary containing methods and class variables defined by the class itself, and a __bases__ attribute, which is a tuple of base classes that must be inspected recursively. Some assumptions here are:
- attributes defined in the instance dict override attributes defined by the object’s class;
- attributes defined in a derived class override attributes defined in a base class;
- attributes in an earlier base class (meaning occurring earlier in __bases__) override attributes in a later base class.
(The last two rules together are often summarized as the left-to-right, depth-first rule for attribute search. This is the classic Python attribute lookup rule. Note that PEP 253 will propose to change the attribute lookup order, and if accepted, this PEP will follow suit.)
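The three assumptions above can be sketched as a small recursive search. This is an illustration only, written for Python 3 rather than the Python 2.2 this PEP targets; the helper names classic_lookup and instance_lookup and the example classes are hypothetical, and the helpers return the raw dictionary value without binding methods.

```python
def classic_lookup(cls, name):
    """Search cls, then its bases depth-first and left to right."""
    if name in cls.__dict__:
        return cls.__dict__[name]
    for base in cls.__bases__:
        try:
            return classic_lookup(base, name)
        except AttributeError:
            pass  # not found under this base; try the next one
    raise AttributeError(name)


def instance_lookup(obj, name):
    """The instance dict overrides anything found through the class."""
    if name in obj.__dict__:
        return obj.__dict__[name]
    return classic_lookup(obj.__class__, name)


class A:
    x = 'A.x'
    y = 'A.y'

class B:
    y = 'B.y'
    z = 'B.z'

class C(A, B):
    z = 'C.z'

c = C()
c.__dict__['x'] = 'instance x'
print(instance_lookup(c, 'x'))  # instance x  (instance dict wins)
print(instance_lookup(c, 'y'))  # A.y         (earlier base wins)
print(instance_lookup(c, 'z'))  # C.z         (derived class wins)


# A diamond shows where the classic rule and a newer search order disagree.
class Base:
    v = 'Base.v'

class Left(Base):
    pass

class Right(Base):
    v = 'Right.v'

class Diamond(Left, Right):
    pass

print(classic_lookup(Diamond, 'v'))  # Base.v   (depth-first reaches Base via Left)
print(Diamond().v)                   # Right.v  (Python 3 uses the newer order)
```

The diamond case is the kind of difference the note about PEP 253 refers to: depth-first search reaches the shared base before the second parent is consulted.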
The type-based introspection API is supported in one form or another by most built-in objects. It uses two special attributes, __members__ and __methods__. The __methods__ attribute, if present, is a list of method names supported by the object. The __members__ attribute, if present, is a list of data attribute names supported by the object.
The type API is sometimes combined with a __dict__ that works the same as for instances (for example for function objects in Python 2.1, f.__dict__ contains f’s dynamic attributes, while f.__members__ lists the names of f’s statically defined attributes).
Some caution must be exercised: some objects don’t list their “intrinsic” attributes (like __dict__ and __doc__) in __members__, while others do; sometimes attribute names occur both in __members__ or __methods__ and as keys in __dict__, in which case it’s anybody’s guess whether the value found in __dict__ is used or not.
The type API has never been carefully specified. It is part of Python folklore, and most third party extensions support it because they follow examples that support it. Also, any type that uses Py_FindMethod() and/or PyMember_Get() in its tp_getattr handler supports it, because these two functions special-case the attribute names __methods__ and __members__, respectively.
Jim Fulton’s ExtensionClasses ignore the type API, and instead emulate the class API, which is more powerful. In this PEP, I propose to phase out the type API in favor of supporting the class API for all types.
One argument in favor of the class API is that it doesn’t require you to create an instance in order to find out which attributes a type supports; this in turn is useful for documentation processors. For example, the socket module exports the SocketType object, but this currently doesn’t tell us what methods are defined on socket objects. Using the class API, SocketType would show exactly what the methods for socket objects are, and we can even extract their docstrings, without creating a socket. (Since this is a C extension module, the source-scanning approach to docstring extraction isn’t feasible in this case.)
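As a rough illustration of that argument, the sketch below collects method names and the first line of each docstring from a type without ever creating an instance. It is written for Python 3, where socket.socket plays the role of SocketType; the helper name documented_methods is hypothetical, and it walks __mro__ (a later addition) instead of recursing over __bases__.

```python
import socket


def documented_methods(tp):
    """Map public method names of tp to the first line of their docstring."""
    found = {}
    # Visit the most basic class first so that overrides in subclasses win.
    for klass in reversed(tp.__mro__):
        for name, value in vars(klass).items():
            if name.startswith('_') or not callable(value):
                continue
            doc = getattr(value, '__doc__', None)
            lines = doc.strip().splitlines() if isinstance(doc, str) else []
            found[name] = lines[0] if lines else None
    return found


# No socket is created here; only the type object is inspected.
socket_methods = documented_methods(socket.socket)
for name in sorted(socket_methods)[:5]:
    print(name, '-', socket_methods[name])
```

The same helper works for any type, for example documented_methods(list), which is what makes the class API attractive to documentation tools.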
Specification of the class-based introspection API
Objects may have two kinds of attributes: static and dynamic. The names and sometimes other properties of static attributes are knowable by inspection of the object’s type or class, which is accessible through obj.__class__ or type(obj). (I’m using type and class interchangeably; a clumsy but descriptive term that fits both is “meta-object”.)
(XXX static and dynamic are not great terms to use here, because “static” attributes may actually behave quite dynamically, and because they have nothing to do with static class members in C++ or Java. Barry suggests to use immutable and mutable instead, but those words already have precise and different meanings in slightly different contexts, so I think that would still be confusing.)
Examples of dynamic attributes are instance variables of class instances, module attributes, etc. Examples of static attributes are the methods of built-in objects like lists and dictionaries, and the attributes of frame and code objects (f.f_code, c.co_filename, etc.). When an object with dynamic attributes exposes these through its __dict__ attribute, __dict__ is a static attribute.
The names and values of dynamic properties are typically stored in a dictionary, and this dictionary is typically accessible as obj.__dict__. The rest of this specification is more concerned with discovering the names and properties of static attributes than with dynamic attributes; the latter are easily discovered by inspection of obj.__dict__.
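The distinction can be observed directly in a current interpreter. The following is a small sketch for Python 3; the Point class is a made-up example, and vars(obj) is simply another spelling of obj.__dict__.

```python
import types


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


p = Point(3, -4)

# Dynamic attributes live in the instance dictionary.
dynamic_names = sorted(vars(p))
print(dynamic_names)                      # ['x', 'y']

# Static attributes are found by inspecting the class.
static_names = sorted(n for n in vars(Point) if not n.startswith('__'))
print(static_names)                       # ['norm1']

# __dict__ itself is a static attribute, declared by the class.
print('__dict__' in vars(Point))          # True

# Code objects have only static attributes: no instance dictionary at all,
# and co_filename is declared on the code type.
code = Point.norm1.__code__
print(hasattr(code, '__dict__'))          # False
print('co_filename' in vars(types.CodeType))  # True
```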
In the discussion below, I distinguish two kinds of objects: regular objects (like lists, ints, functions) and meta-objects. Types and classes are meta-objects. Meta-objects are also regular objects, but we’re mostly interested in them because they are referenced by the __class__ attribute of regular objects (or by the __bases__ attribute of other meta-objects).
The class introspection API consists of the following elements:
- the __class__ and __dict__ attributes on regular objects;
- the __bases__ and __dict__ attributes on meta-objects;
- precedence rules;
- attribute descriptors.
Together, these not only tell us about all attributes defined by a meta-object, but they also help us calculate the value of a specific attribute of a given object.
- The __dict__ attribute on regular objects

  A regular object may have a __dict__ attribute. If it does, this should be a mapping (not necessarily a dictionary) supporting at least __getitem__(), keys(), and has_key(). This gives the dynamic attributes of the object. The keys in the mapping give attribute names, and the corresponding values give their values.

  Typically, the value of an attribute with a given name is the same object as the value corresponding to that name as a key in the __dict__. In other words, obj.__dict__['spam'] is obj.spam. (But see the precedence rules below; a static attribute with the same name may override the dictionary item.)

- The __class__ attribute on regular objects

  A regular object usually has a __class__ attribute. If it does, this references a meta-object. A meta-object can define static attributes for the regular object whose __class__ it is. This is normally done through the following mechanism:

- The __dict__ attribute on meta-objects

  A meta-object may have a __dict__ attribute, of the same form as the __dict__ attribute for regular objects (a mapping but not necessarily a dictionary). If it does, the keys of the meta-object’s __dict__ are names of static attributes for the corresponding regular object. The values are attribute descriptors; we’ll explain these later. An unbound method is a special case of an attribute descriptor.

  Because a meta-object is also a regular object, the items in a meta-object’s __dict__ correspond to attributes of the meta-object; however, some transformation may be applied, and bases (see below) may define additional dynamic attributes. In other words, mobj.spam is not always mobj.__dict__['spam']. (This rule contains a loophole because for classes, if C.__dict__['spam'] is a function, C.spam is an unbound method object.)

- The __bases__ attribute on meta-objects

  A meta-object may have a __bases__ attribute. If it does, this should be a sequence (not necessarily a tuple) of other meta-objects, the bases. An absent __bases__ is equivalent to an empty sequence of bases. There must never be a cycle in the relationship between meta-objects defined by __bases__ attributes; in other words, the __bases__ attributes define a directed acyclic graph, with arcs pointing from derived meta-objects to their base meta-objects. (It is not necessarily a tree, since multiple classes can have the same base class.)

  The __dict__ attributes of a meta-object in the inheritance graph supply attribute descriptors for the regular object whose __class__ attribute points to the root of the inheritance tree (which is not the same as the root of the inheritance hierarchy – rather more the opposite, at the bottom given how inheritance trees are typically drawn). Descriptors are first searched in the dictionary of the root meta-object, then in its bases, according to a precedence rule (see the next paragraph).

- Precedence rules

  When two meta-objects in the inheritance graph for a given regular object both define an attribute descriptor with the same name, the search order is up to the meta-object. This allows different meta-objects to define different search orders. In particular, classic classes use the old left-to-right depth-first rule, while new-style classes use a more advanced rule (see the section on method resolution order in PEP 253).

  When a dynamic attribute (one defined in a regular object’s __dict__) has the same name as a static attribute (one defined by a meta-object in the inheritance graph rooted at the regular object’s __class__), the static attribute has precedence if it is a descriptor that defines a __set__ method (see below); otherwise (if there is no __set__ method) the dynamic attribute has precedence. In other words, for data attributes (those with a __set__ method), the static definition overrides the dynamic definition, but for other attributes, dynamic overrides static.

  Rationale: we can’t have a simple rule like “static overrides dynamic” or “dynamic overrides static”, because some static attributes indeed override dynamic attributes; for example, a key ‘__class__’ in an instance’s __dict__ is ignored in favor of the statically defined __class__ pointer, but on the other hand most keys in inst.__dict__ override attributes defined in inst.__class__. Presence of a __set__ method on a descriptor indicates that this is a data descriptor. (Even read-only data descriptors have a __set__ method: it always raises an exception.) Absence of a __set__ method on a descriptor indicates that the descriptor isn’t interested in intercepting assignment, and then the classic rule applies: an instance variable with the same name as a method hides the method until it is deleted.

- Attribute descriptors

  This is where it gets interesting – and messy. Attribute descriptors (descriptors for short) are stored in the meta-object’s __dict__ (or in the __dict__ of one of its ancestors), and have two uses: a descriptor can be used to get or set the corresponding attribute value on the (regular, non-meta) object, and it has an additional interface that describes the attribute for documentation and introspection purposes.

  There is little prior art in Python for designing the descriptor’s interface, neither for getting/setting the value nor for describing the attribute otherwise, except some trivial properties (it’s reasonable to assume that __name__ and __doc__ should be the attribute’s name and docstring). I will propose such an API below.

  If an object found in the meta-object’s __dict__ is not an attribute descriptor, backward compatibility dictates certain minimal semantics. This basically means that if it is a Python function or an unbound method, the attribute is a method; otherwise, it is the default value for a dynamic data attribute. Backwards compatibility also dictates that (in the absence of a __setattr__ method) it is legal to assign to an attribute corresponding to a method, and that this creates a data attribute shadowing the method for this particular instance. However, these semantics are only required for backwards compatibility with regular classes.
The introspection API is a read-only API. We don’t define the effect of assignment to any of the special attributes (__dict__, __class__ and __bases__), nor the effect of assignment to the items of a __dict__. Generally, such assignments should be considered off-limits. A future PEP may define some semantics for some such assignments. (Especially because currently instances support assignment to __class__ and __dict__, and classes support assignment to __bases__ and __dict__.)
Specification of the attribute descriptor API
Attribute descriptors may have the following attributes. In the examples, x is an object, C is x.__class__, x.meth() is a method, and x.ivar is a data attribute or instance variable. All attributes are optional – a specific attribute may or may not be present on a given descriptor. An absent attribute means that the corresponding information is not available or the corresponding functionality is not implemented.
- __name__: the attribute name. Because of aliasing and renaming, the attribute may (additionally or exclusively) be known under a different name, but this is the name under which it was born. Example: C.meth.__name__ == 'meth'.
- __doc__: the attribute’s documentation string. This may be None.
- __objclass__: the class that declared this attribute. The descriptor only applies to objects that are instances of this class (this includes instances of its subclasses). Example: C.meth.__objclass__ is C.
- __get__(): a function callable with one or two arguments that retrieves the attribute value from an object. This is also referred to as a “binding” operation, because it may return a “bound method” object in the case of method descriptors. The first argument, X, is the object from which the attribute must be retrieved or to which it must be bound. When X is None, the optional second argument, T, should be a meta-object and the binding operation may return an unbound method restricted to instances of T. When both X and T are specified, X should be an instance of T. Exactly what is returned by the binding operation depends on the semantics of the descriptor; for example, static methods and class methods (see below) ignore the instance and bind to the type instead.
- __set__(): a function of two arguments that sets the attribute value on the object. If the attribute is read-only, this method may raise a TypeError or AttributeError exception (both are allowed, because both are historically found for undefined or unsettable attributes). Example: C.ivar.__set__(x, y) is roughly equivalent to x.ivar = y.
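Built-in descriptors in a current interpreter expose most of these attributes. The sketch below is for Python 3 and uses two descriptors that the interpreter creates itself: the method descriptor for list.append and the member descriptor generated for a __slots__ entry. The Point class is a made-up example.

```python
class Point:
    __slots__ = ('x',)  # each slot becomes a member descriptor on the class


p = Point()
member = Point.__dict__['x']

member.__set__(p, 5)                 # same effect as p.x = 5
print(member.__get__(p, Point))      # 5
print(member.__name__)               # x
print(member.__objclass__ is Point)  # True

method = list.__dict__['append']
print(method.__name__)               # append
print(method.__objclass__ is list)   # True

# __get__() is the binding operation: it returns a bound method for items.
items = []
bound = method.__get__(items, list)
bound('spam')
print(items)                         # ['spam']

# A method descriptor does not intercept assignment, so it has no __set__.
print(hasattr(method, '__set__'))    # False
```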
Static methods and class methods
The descriptor API makes it possible to add static methods and class methods. Static methods are easy to describe: they behave pretty much like static methods in C++ or Java. Here’s an example:
class C:

    def foo(x, y):
        print "staticmethod", x, y
    foo = staticmethod(foo)

C.foo(1, 2)
c = C()
c.foo(1, 2)
Both the call C.foo(1, 2) and the call c.foo(1, 2) call foo() with two arguments, and print “staticmethod 1 2”. No “self” is declared in the definition of foo(), and no instance is required in the call.

The line “foo = staticmethod(foo)” in the class statement is the crucial element: this makes foo() a static method. The built-in staticmethod() wraps its function argument in a special kind of descriptor whose __get__() method returns the original function unchanged. Without this, the __get__() method of standard function objects would have created a bound method object for ‘c.foo’ and an unbound method object for ‘C.foo’.
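The wrapping described above can be imitated in a few lines of Python. This is only a sketch of the idea, written for Python 3; StaticMethod is a hypothetical stand-in and not the actual implementation of the built-in staticmethod().

```python
class StaticMethod:
    """Hypothetical pure-Python imitation of staticmethod()."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # Ignore both the instance and the class: no binding takes place.
        return self.func


class C:
    def foo(x, y):
        return ('staticmethod', x, y)
    foo = StaticMethod(foo)


c = C()
print(C.foo(1, 2))       # ('staticmethod', 1, 2)
print(c.foo(1, 2))       # ('staticmethod', 1, 2)

# Both lookups hand back the very same, unchanged function object.
print(C.foo is c.foo)    # True
```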
(XXX Barry suggests to use “sharedmethod” instead of “staticmethod”, because the word static is being overloaded in so many ways already. But I’m not sure if shared conveys the right meaning.)
Class methods use a similar pattern to declare methods that receive an implicit first argument that is the class for which they are invoked. This has no C++ or Java equivalent, and is not quite the same as what class methods are in Smalltalk, but may serve a similar purpose. According to Armin Rigo, they are similar to “virtual class methods” in Borland Pascal dialect Delphi. (Python also has real metaclasses, and perhaps methods defined in a metaclass have more right to the name “class method”; but I expect that most programmers won’t be using metaclasses.) Here’s an example:
class C:

    def foo(cls, y):
        print "classmethod", cls, y
    foo = classmethod(foo)

C.foo(1)
c = C()
c.foo(1)
Both the call C.foo(1) and the call c.foo(1) end up calling foo() with two arguments, and print “classmethod __main__.C 1”. The first argument of foo() is implied, and it is the class, even if the method was invoked via an instance. Now let’s continue the example:
class D(C):
    pass

D.foo(1)
d = D()
d.foo(1)
This prints “classmethod __main__.D 1” both times; in other words, the class passed as the first argument of foo() is the class involved in the call, not the class involved in the definition of foo().
But notice this:
class E(C):

    def foo(cls, y): # override C.foo
        print "E.foo() called"
        C.foo(y)
    foo = classmethod(foo)

E.foo(1)
e = E()
e.foo(1)
In this example, the call to C.foo() from E.foo() will see class C as its first argument, not class E. This is to be expected, since the call specifies the class C. But it stresses the difference between these class methods and methods defined in metaclasses, where an upcall to a metamethod would pass the target class as an explicit first argument. (If you don’t understand this, don’t worry, you’re not alone.) Note that calling cls.foo(y) would be a mistake – it would cause infinite recursion. Also note that you can’t specify an explicit ‘cls’ argument to a class method. If you want this (e.g. the __new__ method in PEP 253 requires this), use a static method with a class as its explicit first argument instead.
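The binding behaviour of class methods, including the upcall from E.foo(), can be reproduced with a small descriptor. This is a sketch for Python 3 under the assumption that binding to the class via types.MethodType is an acceptable approximation; ClassMethod is a hypothetical stand-in, not the built-in classmethod(), and the methods return tuples instead of printing so the results are easy to compare.

```python
import types


class ClassMethod:
    """Hypothetical pure-Python imitation of classmethod()."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if objtype is None:
            objtype = type(obj)
        # Bind to the class through which the attribute was looked up.
        return types.MethodType(self.func, objtype)


class C:
    def foo(cls, y):
        return (cls.__name__, y)
    foo = ClassMethod(foo)


class D(C):
    pass


class E(C):
    def foo(cls, y):  # override C.foo
        # The upcall names C explicitly, so C.foo() receives C, not E.
        return ('E.foo', C.foo(y))
    foo = ClassMethod(foo)


print(C.foo(1), C().foo(1))   # ('C', 1) ('C', 1)
print(D.foo(1), D().foo(1))   # ('D', 1) ('D', 1)
print(E.foo(1))               # ('E.foo', ('C', 1))
```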
C API
XXX The following is VERY rough text that I wrote with a different audience in mind; I’ll have to go through this to edit it more. XXX It also doesn’t go into enough detail for the C API.
A built-in type can declare special data attributes in two ways: using a struct memberlist (defined in structmember.h) or a struct getsetlist (defined in descrobject.h). The struct memberlist is an old mechanism put to new use: each attribute has a descriptor record including its name, an enum giving its type (various C types are supported as well as PyObject *), an offset from the start of the instance, and a read-only flag.
The struct getsetlist mechanism is new, and intended for cases that don’t fit in that mold, because they either require additional checking, or are plain calculated attributes. Each attribute here has a name, a getter C function pointer, a setter C function pointer, and a context pointer. The function pointers are optional, so that for example setting the setter function pointer to NULL makes a read-only attribute. The context pointer is intended to pass auxiliary information to generic getter/setter functions, but I haven’t found a need for this yet.
Note that there is also a similar mechanism to declare built-in methods: these are PyMethodDef structures, which contain a name and a C function pointer (and some flags for the calling convention).
Traditionally, built-in types have had to define their own tp_getattro and tp_setattro slot functions to make these attribute definitions work (PyMethodDef and struct memberlist are quite old). There are convenience functions that take an array of PyMethodDef or memberlist structures, an object, and an attribute name, and return or set the attribute if found in the list, or raise an exception if not found. But these convenience functions had to be explicitly called by the tp_getattro or tp_setattro method of the specific type, and they did a linear search of the array using strcmp() to find the array element describing the requested attribute.
I now have a brand spanking new generic mechanism that improves this situation substantially.
- Pointers to arrays of PyMethodDef, memberlist, getsetlist structures are part of the new type object (tp_methods, tp_members, tp_getset).
- At type initialization time (in PyType_InitDict()), for each entry in those three arrays, a descriptor object is created and placed in a dictionary that belongs to the type (tp_dict).
- Descriptors are very lean objects that mostly point to the corresponding structure. An implementation detail is that all descriptors share the same object type, and a discriminator field tells what kind of descriptor it is (method, member, or getset).
- As explained in PEP 252, descriptors have a get() method that takes an object argument and returns that object’s attribute; descriptors for writable attributes also have a set() method that takes an object and a value and sets that object’s attribute. Note that the get() method also serves as a bind() operation for methods, binding the unbound method implementation to the object.
- Instead of providing their own tp_getattro and tp_setattro implementation, almost all built-in objects now place PyObject_GenericGetAttr and (if they have any writable attributes) PyObject_GenericSetAttr in their tp_getattro and tp_setattro slots. (Or, they can leave these NULL, and inherit them from the default base object, if they arrange for an explicit call to PyType_InitDict() for the type before the first instance is created.)
- In the simplest case, PyObject_GenericGetAttr() does exactly one dictionary lookup: it looks up the attribute name in the type’s dictionary (obj->ob_type->tp_dict). Upon success, there are two possibilities: the descriptor has a get method, or it doesn’t. For speed, the get and set methods are type slots: tp_descr_get and tp_descr_set. If the tp_descr_get slot is non-NULL, it is called, passing the object as its only argument, and the return value from this call is the result of the getattr operation. If the tp_descr_get slot is NULL, as a fallback the descriptor itself is returned (compare class attributes that are not methods but simple values).

  PyObject_GenericSetAttr() works very similarly but uses the tp_descr_set slot and calls it with the object and the new attribute value; if the tp_descr_set slot is NULL, an AttributeError is raised.
- But now for a more complicated case. The approach described above is suitable for most built-in objects such as lists, strings, numbers. However, some object types have a dictionary in each instance that can store arbitrary attributes. In fact, when you use a class statement to subtype an existing built-in type, you automatically get such a dictionary (unless you explicitly turn it off, using another advanced feature, __slots__). Let’s call this the instance dict, to distinguish it from the type dict.
- In the more complicated case, there’s a conflict between names stored in the instance dict and names stored in the type dict. If both dicts have an entry with the same key, which one should we return? Looking at classic Python for guidance, I find conflicting rules: for class instances, the instance dict overrides the class dict, except for the special attributes (like __dict__ and __class__), which have priority over the instance dict.
- I resolved this with the following set of rules, implemented in PyObject_GenericGetAttr():

  1. Look in the type dict. If you find a data descriptor, use its get() method to produce the result. This takes care of special attributes like __dict__ and __class__.
  2. Look in the instance dict. If you find anything, that’s it. (This takes care of the requirement that normally the instance dict overrides the class dict.)
  3. Look in the type dict again (in reality this uses the saved result from step 1, of course). If you find a descriptor, use its get() method; if you find something else, that’s it; if it’s not there, raise AttributeError.

  This requires a classification of descriptors as data and nondata descriptors. The current implementation quite sensibly classifies member and getset descriptors as data (even if they are read-only!) and method descriptors as nondata. Non-descriptors (like function pointers or plain values) are also classified as non-data (!).
- This scheme has one drawback: in what I assume to be the most common case, referencing an instance variable stored in the instance dict, it does two dictionary lookups, whereas the classic scheme did a quick test for attributes starting with two underscores plus a single dictionary lookup. (Although the implementation is sadly structured as instance_getattr() calling instance_getattr1() calling instance_getattr2() which finally calls PyDict_GetItem(), and the underscore test calls PyString_AsString() rather than inlining this. I wonder if optimizing the snot out of this might not be a good idea to speed up Python 2.2, if we weren’t going to rip it all out. :-)
- A benchmark verifies that in fact this is as fast as classic instance variable lookup, so I’m no longer worried.
- Modification for dynamic types: steps 1 and 3 look in the dictionary of the type and all its base classes (in MRO sequence, of course).
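The three lookup rules, together with the modification for dynamic types, can be approximated in Python. This is a sketch for Python 3 and not a transcription of the C code: the names generic_getattr and lookup_in_type are hypothetical, a descriptor is classified as data when its type defines __set__ (the rule stated in this PEP), and the instance dict is fetched with getattr() for brevity.

```python
_MISSING = object()


def lookup_in_type(tp, name):
    """Search the type dict and those of its bases in MRO order."""
    for klass in tp.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def generic_getattr(obj, name):
    tp = type(obj)
    found = lookup_in_type(tp, name)
    found_type = type(found)
    has_get = found is not _MISSING and hasattr(found_type, '__get__')

    # Step 1: a data descriptor in the type dict takes priority.
    if has_get and hasattr(found_type, '__set__'):
        return found_type.__get__(found, obj, tp)

    # Step 2: otherwise anything in the instance dict is the answer.
    instance_dict = getattr(obj, '__dict__', None)
    if instance_dict is not None and name in instance_dict:
        return instance_dict[name]

    # Step 3: reuse the saved result from step 1.
    if has_get:
        return found_type.__get__(found, obj, tp)
    if found is not _MISSING:
        return found  # a plain value stored in the type dict
    raise AttributeError(name)


class C:
    kind = 'plain value'

    def meth(self):
        return 'method'

    @property
    def prop(self):
        return 'property'


c = C()
c.ivar = 1
c.__dict__['prop'] = 'shadow'   # loses to the data descriptor
c.__dict__['meth'] = 'shadow'   # hides the method

print(generic_getattr(c, 'prop'))   # property
print(generic_getattr(c, 'meth'))   # shadow
print(generic_getattr(c, 'kind'))   # plain value
print(generic_getattr(c, 'ivar'))   # 1
```

In each case the result matches what ordinary attribute access on c returns, which is a quick way to check the sketch against the interpreter.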
Discussion
XXX
Examples
Let’s look at lists. In classic Python, the method names of lists were available as the __methods__ attribute of list objects:
>>> [].__methods__
['append', 'count', 'extend', 'index', 'insert', 'pop',
'remove', 'reverse', 'sort']
>>>
Under the new proposal, the __methods__ attribute no longer exists:
>>> [].__methods__
Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: 'list' object has no attribute '__methods__'
>>>
Instead, you can get the same information from the list type:
>>> T = [].__class__
>>> T
<type 'list'>
>>> dir(T) # like T.__dict__.keys(), but sorted
['__add__', '__class__', '__contains__', '__eq__', '__ge__',
'__getattr__', '__getitem__', '__getslice__', '__gt__',
'__iadd__', '__imul__', '__init__', '__le__', '__len__',
'__lt__', '__mul__', '__ne__', '__new__', '__radd__',
'__repr__', '__rmul__', '__setitem__', '__setslice__', 'append',
'count', 'extend', 'index', 'insert', 'pop', 'remove',
'reverse', 'sort']
>>>
The new introspection API gives more information than the old one: in addition to the regular methods, it also shows the methods that are normally invoked through special notations, e.g. __iadd__ (+=), __len__ (len), __ne__ (!=).
You can invoke any method from this list directly:
>>> a = ['tic', 'tac']
>>> T.__len__(a) # same as len(a)
2
>>> T.append(a, 'toe') # same as a.append('toe')
>>> a
['tic', 'tac', 'toe']
>>>
This is just like it is for user-defined classes.
Notice a familiar yet surprising name in the list: __init__. This is the domain of PEP 253.
Backwards compatibility
XXX
Warnings and Errors
XXX
Implementation
A partial implementation of this PEP is available from CVS as a branch named “descr-branch”. To experiment with this implementation, proceed to check out Python from CVS according to the instructions at http://sourceforge.net/cvs/?group_id=5470 but add the arguments “-r descr-branch” to the cvs checkout command. (You can also start with an existing checkout and do “cvs update -r descr-branch”.) For some examples of the features described here, see the file Lib/test/test_descr.py.
Note: the code in this branch goes way beyond this PEP; it is also the experimentation area for PEP 253 (Subtyping Built-in Types).
References
XXX
Copyright
This document has been placed in the public domain.
Source: https://github.com/python-discord/peps/blob/main/pep-0252.txt
Last modified: 2022-01-21 11:03:51 GMT