PEP 681 – Data Class Transforms
PEP: 681
Title: Data Class Transforms
Author: Erik De Bonte <erikd at microsoft.com>, Eric Traut <erictr at microsoft.com>
Sponsor: Jelle Zijlstra <jelle.zijlstra at gmail.com>
Discussions-To: Typing-SIG
Status: Draft
Type: Standards Track
Created: 02-Dec-2021
Python-Version: 3.11
Post-History:
Abstract
PEP 557 introduced the dataclass to the Python stdlib. Several popular libraries have behaviors that are similar to dataclasses, but these behaviors cannot be described using standard type annotations. Such projects include attrs, pydantic, and object relational mapper (ORM) packages such as SQLAlchemy and Django.
Most type checkers, linters and language servers have full support for dataclasses. This proposal aims to generalize this functionality and provide a way for third-party libraries to indicate that certain decorator functions, classes, and metaclasses provide behaviors similar to dataclasses.
These behaviors include:
- Synthesizing an __init__ method based on declared data fields.
- Optionally synthesizing __eq__, __ne__, __lt__, __le__, __gt__ and __ge__ methods.
- Supporting “frozen” classes, a way to enforce immutability during static type checking.
- Supporting “field descriptors”, which describe attributes of individual fields that a static type checker must be aware of, such as whether a default value is provided for the field.
The full behavior of the stdlib dataclass is described in the Python documentation.
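For instance, a minimal stdlib dataclass already exhibits these behaviors (a short illustration using only the standard library; the Point class and its fields are invented for this example):
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class Point:
    x: int
    y: int = 0               # default value supplied via a simple initializer

p = Point(1, 2)              # synthesized __init__
assert p == Point(1, 2)      # synthesized __eq__
assert p < Point(2, 0)       # synthesized __lt__ (from order=True)
# p.x = 5                    # error: "frozen" instance; flagged statically and at runtime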
This proposal does not affect CPython directly except for the addition of a dataclass_transform decorator in typing.py.
Motivation
There is no existing, standard way for libraries with dataclass-like semantics to declare their behavior to type checkers. To work around this limitation, Mypy custom plugins have been developed for many of these libraries, but these plugins don’t work with other type checkers, linters or language servers. They are also costly to maintain for library authors, and they require that Python developers know about the existence of these plugins and download and configure them within their environment.
Rationale
The intent of this proposal is not to support every feature of every library with dataclass-like semantics, but rather to make it possible to use the most common features of these libraries in a way that is compatible with static type checking. If a user values these libraries and also values static type checking, they may need to avoid using certain features or make small adjustments to the way they use them. That’s already true for the Mypy custom plugins, which don’t support every feature of every dataclass-like library.
As new features are added to dataclass in the future, we intend, when appropriate, to add support for those features on dataclass_transform as well. Keeping these two feature sets in sync will make it easier for dataclass users to understand and use dataclass_transform and will simplify the maintenance of dataclass support in type checkers.
Specification
The dataclass_transform decorator
This specification introduces a new decorator function in the typing module named dataclass_transform. This decorator can be applied to either a function that is itself a decorator, a class, or a metaclass. The presence of dataclass_transform tells a static type checker that the decorated function, class, or metaclass performs runtime “magic” that transforms a class, endowing it with dataclass-like behaviors.
If dataclass_transform is applied to a function, using the decorated function as a decorator is assumed to apply dataclass-like semantics. If dataclass_transform is applied to a class, dataclass-like semantics will be assumed for any class that derives from the decorated class or uses the decorated class as a metaclass.
Examples of each approach are shown in the following sections. Each example creates a CustomerModel class with dataclass-like semantics. The implementation of the decorated objects is omitted for brevity, but we assume that they modify classes in the following ways:
- They synthesize an __init__ method using data fields declared within the class and its parent classes.
- They synthesize __eq__ and __ne__ methods.
Type checkers supporting this PEP will recognize that the CustomerModel class can be instantiated using the synthesized __init__ method:
# Using positional arguments
c1 = CustomerModel(327, "John Smith")
# Using keyword arguments
c2 = CustomerModel(id=327, name="John Smith")
# These calls will generate runtime errors and should be flagged as
# errors by a static type checker.
c3 = CustomerModel()
c4 = CustomerModel(327, first_name="John")
c5 = CustomerModel(327, "John Smith", 0)
Decorator function example
_T = TypeVar("_T")
# The ``create_model`` decorator is defined by a library.
# This could be in a type stub or inline.
@typing.dataclass_transform()
def create_model(cls: Type[_T]) -> Type[_T]:
    cls.__init__ = ...
    cls.__eq__ = ...
    cls.__ne__ = ...
    return cls

# The ``create_model`` decorator can now be used to create new model
# classes, like this:

@create_model
class CustomerModel:
    id: int
    name: str
Class example
# The ``ModelBase`` class is defined by a library. This could be in
# a type stub or inline.
@typing.dataclass_transform()
class ModelBase: ...
# The ``ModelBase`` class can now be used to create new model
# subclasses, like this:
class CustomerModel(ModelBase):
    id: int
    name: str
Metaclass example
# The ``ModelMeta`` metaclass and ``ModelBase`` class are defined by
# a library. This could be in a type stub or inline.
@typing.dataclass_transform()
class ModelMeta(type): ...
class ModelBase(metaclass=ModelMeta): ...
# The ``ModelBase`` class can now be used to create new model
# subclasses, like this:
class CustomerModel(ModelBase):
    id: int
    name: str
Decorator function and class/metaclass parameters
A decorator function, class, or metaclass that provides dataclass-like functionality may accept parameters that modify certain behaviors. This specification defines the following parameters that static type checkers must honor if they are used by a dataclass transform. Each of these parameters accepts a bool argument, and it must be possible for the bool value (True or False) to be statically evaluated.
- eq, order, frozen, init and unsafe_hash are parameters supported in the stdlib dataclass, with meanings defined in PEP 557.
- kw_only, match_args and slots are parameters supported in the stdlib dataclass, first introduced in Python 3.10.
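For example, a type checker can honor a literal bool argument but not one computed at runtime (a brief sketch assuming the create_model decorator defined in the decorator function example below):
import random

# OK: frozen is a literal, so its value can be statically evaluated.
@create_model(frozen=True)
class OrderModel:
    id: int

# Not OK: a static type checker cannot determine the value of frozen
# here, so the resulting dataclass-like behavior is not statically known.
should_freeze = random.random() > 0.5

@create_model(frozen=should_freeze)
class AuditModel:
    id: int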
dataclass_transform parameters
Parameters to dataclass_transform allow for some basic customization of default behaviors:
_T = TypeVar("_T")
def dataclass_transform(
    *,
    eq_default: bool = True,
    order_default: bool = False,
    kw_only_default: bool = False,
    transform_descriptor_types: bool = False,
    field_descriptors: tuple[type | Callable[..., Any], ...] = (),
) -> Callable[[_T], _T]: ...
- eq_default indicates whether the eq parameter is assumed to be True or False if it is omitted by the caller. If not specified, eq_default will default to True (the default assumption for dataclass).
- order_default indicates whether the order parameter is assumed to be True or False if it is omitted by the caller. If not specified, order_default will default to False (the default assumption for dataclass).
- kw_only_default indicates whether the kw_only parameter is assumed to be True or False if it is omitted by the caller. If not specified, kw_only_default will default to False (the default assumption for dataclass).
- transform_descriptor_types affects fields annotated with descriptor types that define a __set__ method. If True, the type of each parameter on the synthesized __init__ method corresponding to such a field will be the type of the value parameter to the descriptor’s __set__ method. If False, the descriptor type will be used. If not specified, transform_descriptor_types will default to False (the default behavior of dataclass).
- field_descriptors specifies a static list of supported classes that describe fields. Some libraries also supply functions to allocate instances of field descriptors, and those functions may also be specified in this tuple. If not specified, field_descriptors will default to an empty tuple (no field descriptors supported). The standard dataclass behavior supports only one type of field descriptor called Field plus a helper function (field) that instantiates this class, so if we were describing the stdlib dataclass behavior, we would provide the tuple argument (dataclasses.Field, dataclasses.field), as sketched below.
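To make that concrete, here is a rough, hypothetical stub (not the actual typeshed declaration) showing how the stdlib dataclass decorator itself could be described using these parameters; the simplified signature is an assumption for illustration only:
import dataclasses
import typing
from typing import Callable, Type, TypeVar

_T = TypeVar("_T")

# Hypothetical, simplified stub for illustration. The defaults for
# eq_default and order_default already match the stdlib dataclass, so
# only the field descriptors need to be supplied.
@typing.dataclass_transform(field_descriptors=(dataclasses.Field, dataclasses.field))
def dataclass(
    *,
    init: bool = True,
    eq: bool = True,
    order: bool = False,
    frozen: bool = False,
    kw_only: bool = False,
) -> Callable[[Type[_T]], Type[_T]]: ...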
The following sections provide additional examples showing how these parameters are used.
Decorator function example
# Indicate that the ``create_model`` function assumes keyword-only
# parameters for the synthesized ``__init__`` method unless it is
# invoked with ``kw_only=False``. It always synthesizes order-related
# methods and provides no way to override this behavior.
@typing.dataclass_transform(kw_only_default=True, order_default=True)
def create_model(
    *,
    frozen: bool = False,
    kw_only: bool = True,
) -> Callable[[Type[_T]], Type[_T]]: ...
# Example of how this decorator would be used by code that imports
# from this library:
@create_model(frozen=True, kw_only=False)
class CustomerModel:
    id: int
    name: str
Class example
# Indicate that classes that derive from this class default to
# synthesizing comparison methods.
@typing.dataclass_transform(eq_default=True, order_default=True)
class ModelBase:
    def __init_subclass__(
        cls,
        *,
        init: bool = True,
        frozen: bool = False,
        eq: bool = True,
        order: bool = True,
    ):
        ...
# Example of how this class would be used by code that imports
# from this library:
class CustomerModel(
    ModelBase,
    init=False,
    frozen=True,
    eq=False,
    order=False,
):
    id: int
    name: str
Metaclass example
# Indicate that classes that use this metaclass default to
# synthesizing comparison methods.
@typing.dataclass_transform(eq_default=True, order_default=True)
class ModelMeta(type):
    def __new__(
        cls,
        name,
        bases,
        namespace,
        *,
        init: bool = True,
        frozen: bool = False,
        eq: bool = True,
        order: bool = True,
    ):
        ...

class ModelBase(metaclass=ModelMeta):
    ...
# Example of how this class would be used by code that imports
# from this library:
class CustomerModel(
    ModelBase,
    init=False,
    frozen=True,
    eq=False,
    order=False,
):
    id: int
    name: str
transform_descriptor_types example
Because transform_descriptor_types is set to True, the target parameter on the synthesized __init__ method will be of type float (the type of __set__’s value parameter) instead of Descriptor.
@typing.dataclass_transform(transform_descriptor_types=True)
def create_model() -> Callable[[Type[_T]], Type[_T]]: ...
# We anticipate that most descriptor classes used with
# transform_descriptor_types will be generic with __set__ functions
# whose value parameters are based on the generic's type vars.
# However, this is not required.
class Descriptor:
    def __get__(self, instance: object, owner: Any) -> int:
        ...

    # The setter and getter can have different types (asymmetric).
    # The setter's value type is used for the __init__ parameter.
    # The getter's return type is ignored.
    def __set__(self, instance: object, value: float):
        ...

@create_model()
class CustomerModel:
    target: Descriptor
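From a static type checker’s perspective, the example above behaves as follows (a usage sketch; the runtime behavior depends on the library’s actual implementation, which is elided here):
c = CustomerModel(target=1.5)   # __init__ accepts float, the type of __set__'s value parameter
value: int = c.target           # attribute access uses __get__'s return type, int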
Field descriptors
Most libraries that support dataclass-like semantics provide one or more “field descriptor” types that allow a class definition to provide additional metadata about each field in the class. This metadata can describe, for example, default values, or indicate whether the field should be included in the synthesized __init__ method.
Field descriptors can be omitted in cases where additional metadata is not required:
@dataclass
class Employee:
    # Field with no descriptor
    name: str

    # Field that uses field descriptor class instance
    age: Optional[int] = field(default=None, init=False)

    # Field with type annotation and simple initializer to
    # describe default value
    is_paid_hourly: bool = True

    # Not a field (but rather a class variable) because type
    # annotation is not provided.
    office_number = "unassigned"
Field descriptor parameters
Libraries that support dataclass-like semantics and support field descriptor classes typically use common parameter names to construct these field descriptors. This specification formalizes the names and meanings of the parameters that must be understood by static type checkers. These standardized parameters must be keyword-only.
These parameters are a superset of those supported by dataclasses.field, excluding those that do not have an impact on type checking, such as compare and hash.
Field descriptor classes are allowed to use other parameters in their constructors, and those parameters can be positional and may use other names.
- init is an optional bool parameter that indicates whether the field should be included in the synthesized __init__ method. If unspecified, init defaults to True. Field descriptor functions can use overloads that implicitly specify the value of init using a literal bool value type (Literal[False] or Literal[True]).
- default is an optional parameter that provides the default value for the field.
- default_factory is an optional parameter that provides a runtime callback that returns the default value for the field. If neither default nor default_factory are specified, the field is assumed to have no default value and must be provided a value when the class is instantiated.
- factory is an alias for default_factory. Stdlib dataclasses use the name default_factory, but attrs uses the name factory in many scenarios, so this alias is necessary for supporting attrs.
- kw_only is an optional bool parameter that indicates whether the field should be marked as keyword-only. If true, the field will be keyword-only. If false, it will not be keyword-only. If unspecified, the value of the kw_only parameter on the object decorated with dataclass_transform will be used, or if that is unspecified, the value of kw_only_default on dataclass_transform will be used.
- alias is an optional str parameter that provides an alternative name for the field. This alternative name is used in the synthesized __init__ method.
It is an error to specify more than one of default, default_factory and factory.
This example demonstrates the above:
# Library code (within type stub or inline)
# In this library, passing a resolver means that init must be False,
# and the overload with Literal[False] enforces that.
@overload
def model_field(
    *,
    default: Optional[Any] = ...,
    resolver: Callable[[], Any],
    init: Literal[False] = False,
) -> Any: ...

@overload
def model_field(
    *,
    default: Optional[Any] = ...,
    resolver: None = None,
    init: bool = True,
) -> Any: ...

@typing.dataclass_transform(
    kw_only_default=True,
    field_descriptors=(model_field, ))
def create_model(
    *,
    init: bool = True,
) -> Callable[[Type[_T]], Type[_T]]: ...

# Code that imports this library:

@create_model(init=False)
class CustomerModel:
    id: int = model_field(resolver=lambda : 0)
    name: str
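For comparison, the stdlib dataclasses.field function already supports the default_factory and kw_only parameters described above; a small self-contained example:
from dataclasses import dataclass, field

@dataclass
class InventoryItem:
    name: str
    # default_factory supplies a per-instance default; the field still
    # appears in the synthesized __init__ with that default.
    tags: list[str] = field(default_factory=list)
    # kw_only makes the corresponding __init__ parameter keyword-only
    # (supported by the stdlib since Python 3.10).
    unit_price: float = field(default=0.0, kw_only=True)

item = InventoryItem("widget", unit_price=1.25)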
Runtime behavior
At runtime, the dataclass_transform decorator’s only effect is to set an attribute named __dataclass_transform__ on the decorated function or class to support introspection. The value of the attribute should be a dict mapping the names of the dataclass_transform parameters to their values.
For example:
{
    "eq_default": True,
    "order_default": False,
    "kw_only_default": False,
    "transform_descriptor_types": False,
    "field_descriptors": (),
}
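A minimal sketch of that runtime behavior (an illustrative implementation assumption, not the actual typing.py source):
from typing import Any, Callable, TypeVar

_T = TypeVar("_T")

def dataclass_transform(
    *,
    eq_default: bool = True,
    order_default: bool = False,
    kw_only_default: bool = False,
    transform_descriptor_types: bool = False,
    field_descriptors: tuple[type | Callable[..., Any], ...] = (),
) -> Callable[[_T], _T]:
    def decorator(cls_or_fn):
        # Record the arguments for runtime introspection; static type
        # checkers interpret the decorator directly and ignore this.
        cls_or_fn.__dataclass_transform__ = {
            "eq_default": eq_default,
            "order_default": order_default,
            "kw_only_default": kw_only_default,
            "transform_descriptor_types": transform_descriptor_types,
            "field_descriptors": field_descriptors,
        }
        return cls_or_fn
    return decorator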
Dataclass semantics
The following dataclass semantics are implied when a function or class decorated with dataclass_transform is in use.
- Frozen dataclasses cannot inherit from non-frozen dataclasses. A class that has been decorated with dataclass_transform is considered neither frozen nor non-frozen, thus allowing frozen classes to inherit from it. Similarly, a class that directly specifies a metaclass that is decorated with dataclass_transform is considered neither frozen nor non-frozen. Consider these class examples:
# ModelBase is not considered either "frozen" or "non-frozen"
# because it is decorated with ``dataclass_transform``
@typing.dataclass_transform()
class ModelBase(): ...

# Vehicle is considered non-frozen because it does not specify
# "frozen=True".
class Vehicle(ModelBase):
    name: str

# Car is a frozen class that derives from Vehicle, which is a
# non-frozen class. This is an error.
class Car(Vehicle, frozen=True):
    wheel_count: int
And these similar metaclass examples:
@typing.dataclass_transform()
class ModelMeta(type): ...

# ModelBase is not considered either "frozen" or "non-frozen"
# because it directly specifies ModelMeta as its metaclass.
class ModelBase(metaclass=ModelMeta): ...

# Vehicle is considered non-frozen because it does not specify
# "frozen=True".
class Vehicle(ModelBase):
    name: str

# Car is a frozen class that derives from Vehicle, which is a
# non-frozen class. This is an error.
class Car(Vehicle, frozen=True):
    wheel_count: int
- Field ordering and inheritance are assumed to follow the rules specified in PEP 557. This includes the effects of overrides (redefining a field in a child class that has already been defined in a parent class).
- PEP 557 indicates that all fields without default values must appear before fields with default values. Although not explicitly stated in PEP 557, this rule is ignored when init=False, and this specification likewise ignores this requirement in that situation. Likewise, there is no need to enforce this ordering when keyword-only parameters are used for __init__, so the rule is not enforced if kw_only semantics are in effect.
- As with dataclass, method synthesis is skipped if it would overwrite a method that is explicitly declared within the class. For example, if a class declares an __init__ method explicitly, an __init__ method will not be synthesized for that class.
- KW_ONLY sentinel values are supported as described in the Python docs and bpo-43532.
- ClassVar attributes are not considered dataclass fields and are ignored by dataclass mechanisms.
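For example, reusing the create_model decorator from the first decorator function example (which synthesizes __init__, __eq__ and __ne__), an explicitly declared __init__ suppresses synthesis of that method while the comparison methods are still synthesized:
@create_model
class PointModel:
    x: int
    y: int

    # Declared explicitly, so no __init__ is synthesized; type checkers
    # should honor this signature instead.
    def __init__(self, coords: tuple[int, int]) -> None:
        self.x, self.y = coords

p1 = PointModel((1, 2))      # OK: matches the declared __init__
p2 = PointModel(x=1, y=2)    # error: the declared __init__ has no such parameters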
Undefined behavior
If multiple dataclass_transform decorators are found, either on a single function/class or within a class hierarchy, the resulting behavior is undefined. Library authors should avoid these scenarios.
The __set__ method on descriptors is not expected to be overloaded. If such overloads are found when transform_descriptor_types is True, the resulting behavior is undefined.
Reference Implementation
Pyright contains the reference implementation of type checker support for dataclass_transform. Pyright’s dataClasses.ts source file would be a good starting point for understanding the implementation.
The attrs and pydantic libraries are using dataclass_transform and serve as real-world examples of its usage.
Rejected Ideas
auto_attribs parameter
The attrs library supports an auto_attribs parameter that indicates whether class members decorated with PEP 526 variable annotations but with no assignment should be treated as data fields.
We considered supporting auto_attribs and a corresponding auto_attribs_default parameter, but decided against this because it is specific to attrs and appears to be a legacy behavior. Instead of supporting this in the new standard, we recommend that the maintainers of attrs move away from the legacy semantics and adopt auto_attribs behaviors by default.
Django does not support declaring fields using type annotations only, so Django users who leverage dataclass_transform should be aware that they should always supply assigned values.
cmp parameter
The attrs library supports a bool parameter cmp that is equivalent to setting both eq and order to True. We chose not to support a cmp parameter, since it only applies to attrs. Attrs users should use the dataclass-standard eq and order parameter names instead.
Automatic field name aliasing
The attrs library performs automatic aliasing of field names that start with a single underscore, stripping the underscore from the name of the corresponding __init__ parameter. This proposal omits that behavior since it is specific to attrs. Users can manually alias these fields using the alias parameter.
Alternate field ordering algorithms
The attrs library currently supports two approaches to ordering the fields within a class:
- Dataclass order: The same ordering used by dataclasses. This is the default behavior of the older APIs (e.g. attr.s).
- Method Resolution Order (MRO): This is the default behavior of the newer APIs (e.g. define, mutable, frozen). Older APIs (e.g. attr.s) can opt into this behavior by specifying collect_by_mro=True.
The resulting field orderings can differ in certain diamond-shaped multiple inheritance scenarios.
For simplicity, this proposal does not support any field ordering other than that used by dataclasses.
Fields redeclared in subclasses
The attrs library differs from stdlib dataclasses in how it handles inherited fields that are redeclared in subclasses. The dataclass specification preserves the original order, but attrs defines a new order based on subclasses.
For simplicity, we chose to only support the dataclass behavior. Users of attrs who rely on the attrs-specific ordering will not see the expected order of parameters in the synthesized __init__ method.
Django primary and foreign keys
Django applies additional logic for primary and foreign keys. For example, it automatically adds an id field (and __init__ parameter) if there is no field designated as a primary key.
As this is not broadly applicable to dataclass libraries, this additional logic is not accommodated with this proposal, so users of Django would need to explicitly declare the id field. This limitation may make it impractical to use the dataclass_transform mechanism with Django.
Class-wide default values
SQLAlchemy requested that we expose a way to specify that the default value of all fields in the transformed class is None. It is typical that all of their fields are optional, and None indicates that the field is not set.
We chose not to support this feature, since it is specific to SQLAlchemy. Users can manually set default=None on these fields instead.
Open Issues
converter field descriptor parameter
The attrs library supports a converter field descriptor parameter, which is a callable that is called by the generated __init__ method to convert the supplied value to some other desired value. This is tricky to support since the parameter type in the synthesized __init__ method needs to accept unconverted values, but the resulting field is typed according to the output of the converter.
There may be no good way to support this because there’s not enough information to derive the type of the input parameter. We currently have two ideas:
- Add support for a converter field descriptor parameter but then use the Any type for the corresponding parameter in the __init__ method.
- Say that converters are unsupported and recommend that attrs users avoid them.
Some aspects of this issue are detailed in a Pyright discussion.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.