PEP 642 – Explicit Pattern Syntax for Structural Pattern Matching
PEP: 642
Title: Explicit Pattern Syntax for Structural Pattern Matching
Author: Nick Coghlan <ncoghlan at gmail.com>
BDFL-Delegate:
Discussions-To: python-dev@python.org
Status: Draft
Type: Standards Track
Requires: 634
Created: 26-Sep-2020
Python-Version: 3.10
Post-History: 31-Oct-2020, 08-Nov-2020, 03-Jan-2021
Resolution:
Contents
- Abstract
- Relationship with other PEPs
- Motivation
- Specification
- Design Discussion
- Requiring explicit qualification of simple names in match patterns
- Resisting the temptation to guess
- Interaction with caching of attribute lookups in local variables
- Using existing comparison operators as the value constraint prefix
- Using __ as the wildcard pattern marker
- Representing patterns explicitly in the Abstract Syntax Tree
- Changes to sequence patterns
- Changes to mapping patterns
- Changes to class patterns
- Deferred Ideas
- Inferred value constraints
- Making some required parentheses optional
- Accepting complex literals as closed expressions
- Allowing negated constraints in match patterns
- Allowing membership checks in match patterns
- Inferring a default type for instance attribute constraints
- Avoiding special cases in sequence patterns
- Expression syntax to retrieve multiple attributes from an instance
- Rejected Ideas
- Reference Implementation
- Acknowledgments
- References
- Appendix A – Full Grammar
- Appendix B: Summary of Abstract Syntax Tree changes
- Appendix C: Summary of changes relative to PEP 634
- Appendix D: History of changes to this proposal
- Copyright
Abstract
This PEP covers an alternative syntax proposal for PEP 634’s structural pattern matching that requires explicit prefixes on all capture patterns and value constraints. It also proposes a new dedicated syntax for instance attribute patterns that aligns more closely with the proposed mapping pattern syntax.
While the result is necessarily more verbose than the proposed syntax in PEP 634, it is still significantly less verbose than the status quo.
As an example, the following match statement would extract “host” and “port” details from a 2-item sequence, a mapping with “host” and “port” keys, any object with “host” and “port” attributes, or a “host:port” string, treating the “port” as optional in the latter three cases:
port = DEFAULT_PORT
match expr:
    case [as host, as port]:
        pass
    case {"host" as host, "port" as port}:
        pass
    case {"host" as host}:
        pass
    case object{.host as host, .port as port}:
        pass
    case object{.host as host}:
        pass
    case str{} as addr:
        host, __, optional_port = addr.partition(":")
        if optional_port:
            port = optional_port
    case __ as m:
        raise TypeError(f"Unknown address format: {m!r:.200}")
port = int(port)
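For comparison with the status quo, the following is a rough sketch of the same extraction in current Python, without a match statement. It is an approximation under stated assumptions: the function name parse_address and the DEFAULT_PORT value of 80 are hypothetical, and the sequence test mirrors the sequence constraint rules given in the Specification (any collections.abc.Sequence except str, bytes and bytearray).

```python
from collections.abc import Mapping, Sequence

DEFAULT_PORT = 80  # hypothetical default; the example above leaves it unspecified


def parse_address(expr):
    """Return (host, port), checking the forms in the same order as the cases above."""
    port = DEFAULT_PORT
    if (isinstance(expr, Sequence)
            and not isinstance(expr, (str, bytes, bytearray))
            and len(expr) == 2):
        # case [as host, as port]
        host, port = expr
    elif isinstance(expr, Mapping) and "host" in expr:
        # case {"host" as host, "port" as port} / case {"host" as host}
        host = expr["host"]
        port = expr.get("port", port)
    elif hasattr(expr, "host"):
        # case object{.host as host, .port as port} / case object{.host as host}
        host = expr.host
        port = getattr(expr, "port", port)
    elif isinstance(expr, str):
        # case str{} as addr
        host, _, optional_port = expr.partition(":")
        if optional_port:
            port = optional_port
    else:
        # case __ as m
        raise TypeError(f"Unknown address format: {expr!r:.200}")
    return host, int(port)
```

Each branch corresponds to one or two case clauses in the match statement above, checked in the same order.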
At a high level, this PEP proposes to categorise the different available pattern types as follows:
- wildcard pattern: __
- group patterns: (PTRN)
- value constraint patterns:
  - equality constraints: == EXPR
  - identity constraints: is EXPR
- structural constraint patterns:
  - sequence constraint patterns: [PTRN, as NAME, PTRN as NAME]
  - mapping constraint patterns: {EXPR: PTRN, EXPR as NAME}
  - instance attribute constraint patterns: CLS{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME}
  - class defined constraint patterns: CLS(PTRN, PTRN, **{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME})
- OR patterns: PTRN | PTRN | PTRN
- AS patterns: PTRN as NAME (omitting the pattern implies __)
The intent of this approach is to:
- allow an initial form of pattern matching to be developed and released without needing to decide up front on the best default options for handling bare names, attribute lookups, and literal values
- ensure that pattern matching is defined explicitly at the Abstract Syntax Tree level, allowing the specifications of the semantics and the surface syntax for pattern matching to be clearly separated
- define a clear and concise “ducktyping” syntax that could potentially be adopted in ordinary expressions as a way to more easily retrieve a tuple containing multiple attributes from the same object
Relative to PEP 634, the proposal also deliberately eliminates any syntax that “binds to the right” without using the as keyword (using capture patterns in PEP 634’s mapping patterns and class patterns) or binds to both the left and the right in the same pattern (using PEP 634’s capture patterns with AS patterns).
Relationship with other PEPs
This PEP both depends on and competes with PEP 634 - the PEP author agrees that match statements would be a sufficiently valuable addition to the language to be worth the additional complexity that they add to the learning process, but disagrees with the idea that “simple name vs literal or attribute lookup” really offers an adequate syntactic distinction between name binding and value lookup operations in match patterns (at least for Python).
This PEP agrees with the spirit of PEP 640 (that the chosen wildcard pattern to skip a name binding should be supported everywhere, not just in match patterns), but is now proposing a different spelling for the wildcard syntax (__ rather than ?). As such, it competes with PEP 640 as written, but would complement a proposal to deprecate the use of __ as an ordinary identifier and instead turn it into a general purpose wildcard marker that always skips making a new local variable binding.
While it has not yet been put forward as a PEP, Mark Shannon has a pre-PEP draft [8] expressing several concerns about the runtime semantics of the pattern matching proposal in PEP 634. This PEP is somewhat complementary to that one, as even though this PEP is mostly about surface syntax changes rather than major semantic changes, it does propose that the Abstract Syntax Tree definition be made more explicit to better separate the details of the surface syntax from the semantics of the code generation step. There is one specific idea in that pre-PEP draft that this PEP explicitly rejects: the idea that the different kinds of matching are mutually exclusive. It’s entirely possible for the same value to match different kinds of structural pattern, and which one takes precedence will intentionally be governed by the order of the cases in the match statement.
Motivation
The original PEP 622 (which was later split into PEP 634, PEP 635, and PEP 636) incorporated an unstated but essential assumption in its syntax design: that neither ordinary expressions nor the existing assignment target syntax provide an adequate foundation for the syntax used in match patterns.
While the PEP didn’t explicitly state this assumption, one of the PEP authors explained it clearly on python-dev [1]:
The actual problem that I see is that we have different cultures/intuitions fundamentally clashing here. In particular, so many programmers welcome pattern matching as an “extended switch statement” and find it therefore strange that names are binding and not expressions for comparison. Others argue that it is at odds with current assignment statements, say, and question why dotted names are _/not/_ binding. What all groups seem to have in common, though, is that they refer to _/their/_ understanding and interpretation of the new match statement as ‘consistent’ or ‘intuitive’ — naturally pointing out where we as PEP authors went wrong with our design.
But here is the catch: at least in the Python world, pattern matching as proposed by this PEP is an unprecedented and new way of approaching a common problem. It is not simply an extension of something already there. Even worse: while designing the PEP we found that no matter from which angle you approach it, you will run into issues of seeming ‘inconsistencies’ (which is to say that pattern matching cannot be reduced to a ‘linear’ extension of existing features in a meaningful way): there is always something that goes fundamentally beyond what is already there in Python. That’s why I argue that arguments based on what is ‘intuitive’ or ‘consistent’ just do not make sense _/in this case/_.
The first iteration of this PEP was then born out of an attempt to show that the second assertion was not accurate, and that match patterns could be treated as a variation on assignment targets without leading to inherent contradictions. (An earlier PR submitted to list this option in the “Rejected Ideas” section of the original PEP 622 had previously been declined [2]).
However, the review process for this PEP strongly suggested that not only did the contradictions that Tobias mentioned in his email exist, but they were also concerning enough to cast doubts on the syntax proposal presented in PEP 634. Accordingly, this PEP was changed to go even further than PEP 634, and largely abandon alignment between the sequence matching syntax and the existing iterable unpacking syntax (effectively answering “Not really, at least as far as the exact syntax is concerned” to the first question raised in the DLS’20 paper [9]: “Can we extend a feature like iterable unpacking to work for more general object and data layouts?”).
This resulted in a complete reversal of the goals of the PEP: rather than attempting to emphasise the similarities between assignment and pattern matching, the PEP now attempts to make sure that assignment target syntax isn’t being reused at all, reducing the likelihood of incorrect inferences being drawn about the new construct based on experience with existing ones.
Finally, before completing the 3rd iteration of the proposal (which dropped inferred patterns entirely), the PEP author spent quite a bit of time reflecting on the following entries in PEP 20:
- Explicit is better than implicit.
- Special cases aren’t special enough to break the rules.
- In the face of ambiguity, refuse the temptation to guess.
If we start with an explicit syntax, we can always add syntactic shortcuts later (e.g. consider the recent proposals to add shortcuts for Union and Optional type hints only after years of experience with the original more verbose forms), while if we start out with only the abbreviated forms, then we don’t have any real way to revisit those decisions in a future release.
Specification
This PEP retains the overall match/case statement structure and semantics from PEP 634, but proposes multiple changes that mean that user intent is explicitly specified in the concrete syntax rather than needing to be inferred from the pattern matching context.
In the proposed Abstract Syntax Tree, the semantics are also always explicit, with no inference required.
The Match Statement
Surface syntax:
match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
subject_expr:
    | star_named_expression ',' star_named_expressions?
    | named_expression
case_block: "case" (guarded_pattern | open_pattern) ':' block
guarded_pattern: closed_pattern 'if' named_expression
open_pattern:
    | as_pattern
    | or_pattern
closed_pattern:
    | wildcard_pattern
    | group_pattern
    | structural_constraint
Abstract syntax:
Match(expr subject, match_case* cases)
match_case = (pattern pattern, expr? guard, stmt* body)
The rules star_named_expression, star_named_expressions, named_expression and block are part of the standard Python grammar.
Open patterns are patterns which consist of multiple tokens, and aren’t necessarily terminated by a closing delimiter (for example, __ as x, int() | bool()). To avoid ambiguity for human readers, their usage is restricted to top level patterns and to group patterns (which are patterns surrounded by parentheses).
Closed patterns are patterns which either consist of a single token (i.e. __), or else have a closing delimiter as a required part of their syntax (e.g. [as x, as y], object{.x as x, .y as y}).
As in PEP 634, the match and case keywords are soft keywords, i.e. they are not reserved words in other grammatical contexts (including at the start of a line if there is no colon where expected). This means that they are recognized as keywords when part of a match statement or case block only, and are allowed to be used in all other contexts as variable or argument names.
Unlike PEP 634, patterns are explicitly defined as a new kind of node in the abstract syntax tree - even when surface syntax is shared with existing expression nodes, a distinct abstract node is emitted by the parser.
For context, match_stmt is a new alternative for compound_statement in the surface syntax and Match is a new alternative for stmt in the abstract syntax.
Match Semantics
This PEP largely retains the overall pattern matching semantics proposed in PEP 634.
The proposed syntax for patterns changes significantly, and is discussed in detail below.
There are also some proposed changes to the semantics of class defined constraints (class patterns in PEP 634) to eliminate the need to special case any builtin types (instead, the introduction of dedicated syntax for instance attribute constraints allows the behaviour needed by those builtin types to be specified as applying to any type that sets __match_args__ to None).
Guards
This PEP retains the guard clause semantics proposed in PEP 634.
However, the syntax is changed slightly to require that when a guard clause is present, the case pattern must be a closed pattern.
This makes it clearer to the reader where the pattern ends and the guard clause begins. (This is mainly a potential problem with OR patterns, where the guard clause looks kind of like the start of a conditional expression in the final pattern. Actually doing that isn’t legal syntax, so there’s no ambiguity as far as the compiler is concerned, but the distinction may not be as clear to a human reader.)
Irrefutable case blocks
The definition of irrefutable case blocks changes slightly in this PEP relative to PEP 634, as capture patterns no longer exist as a separate concept from AS patterns.
Aside from that caveat, the handling of irrefutable cases is the same as in PEP 634:
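- the following patterns are irrefutable:
  - wildcard patterns
  - AS patterns whose left-hand side is irrefutable (including AS patterns that omit the left-hand side, as the wildcard pattern is then implied)
  - OR patterns containing at least one irrefutable pattern
  - parenthesized irrefutable patterns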
- a case block is considered irrefutable if it has no guard and its pattern is irrefutable.
- a match statement may have at most one irrefutable case block, and it must be last.
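The irrefutability rules above can be sketched as a small recursive check. The dataclasses below are hypothetical Python stand-ins that borrow some node names from the abstract syntax proposed later in this PEP; group patterns need no handling because they are not represented in the AST.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


# Hypothetical Python stand-ins for some of the proposed AST pattern nodes.
@dataclass
class MatchAlways:        # __
    pass


@dataclass
class MatchValue:         # == EXPR / is EXPR
    op: str
    value: Any


@dataclass
class MatchAs:            # [PTRN] as NAME; pattern=None means __ is implied
    pattern: Optional[Any]
    target: str


@dataclass
class MatchOr:            # PTRN | PTRN | ...
    patterns: List[Any]


@dataclass
class MatchCase:          # simplified case block: pattern and guard only
    pattern: Any
    guard: Optional[Any] = None


def is_irrefutable(pattern) -> bool:
    # Group patterns need no branch: they are not represented in the AST.
    if isinstance(pattern, MatchAlways):
        return True
    if isinstance(pattern, MatchAs):
        return pattern.pattern is None or is_irrefutable(pattern.pattern)
    if isinstance(pattern, MatchOr):
        return any(is_irrefutable(p) for p in pattern.patterns)
    return False  # value and structural constraints can always fail


def check_cases(cases: List[MatchCase]) -> None:
    """Reject any irrefutable case block that is not the last one."""
    last = len(cases) - 1
    for index, case in enumerate(cases):
        if case.guard is None and is_irrefutable(case.pattern) and index != last:
            raise SyntaxError(f"irrefutable case block {index} must be the last case")
```

Since only the final case block may be irrefutable, the "at most one" rule follows automatically.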
Patterns
The top-level surface syntax for patterns is as follows:
open_pattern: # Pattern may use multiple tokens with no closing delimiter
    | as_pattern
    | or_pattern
as_pattern: [closed_pattern] pattern_as_clause
or_pattern: '|'.simple_pattern+
simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
    | closed_pattern
    | value_constraint
closed_pattern: # Require a single token or a closing delimiter in pattern
    | wildcard_pattern
    | group_pattern
    | structural_constraint
As described above, the usage of open patterns is limited to top level case clauses and when parenthesised in a group pattern.
The abstract syntax for patterns explicitly indicates which elements are subpatterns and which elements are subexpressions or identifiers:
pattern = MatchAlways
        | MatchValue(matchop op, expr value)
        | MatchSequence(pattern* patterns)
        | MatchMapping(expr* keys, pattern* patterns)
        | MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
        | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)
        | MatchRestOfSequence(identifier? target)
        -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys
        | MatchAs(pattern? pattern, identifier target)
        | MatchOr(pattern* patterns)
AS Patterns
Surface syntax:
as_pattern: [closed_pattern] pattern_as_clause
pattern_as_clause: 'as' pattern_capture_target
pattern_capture_target: !"__" NAME !('.' | '(' | '=')
(Note: the name on the right may not be __.)
Abstract syntax:
MatchAs(pattern? pattern, identifier target)
An AS pattern matches the closed pattern on the left of the as keyword against the subject. If this fails, the AS pattern fails. Otherwise, the AS pattern binds the subject to the name on the right of the as keyword and succeeds.
If no pattern to match is given, the wildcard pattern (__) is implied.
To avoid confusion with the wildcard pattern, the double underscore (__) is not permitted as a capture target (this is what !"__" expresses).
The name binding step itself always succeeds: it binds the subject value to the name using the scoping rules for name binding established for named expressions in PEP 572. (Summary: the name becomes a local variable in the closest containing function scope unless there’s an applicable nonlocal or global statement.)
In a given pattern, a given name may be bound only once. This disallows for example case [as x, as x]: ... but allows case [as x] | (as x): ...
As an open pattern, the usage of AS patterns is limited to top level case clauses and when parenthesised in a group pattern. However, several of the structural constraints allow the use of pattern_as_clause in relevant locations to bind extracted elements of the matched subject to local variables. These are mostly represented in the abstract syntax tree as MatchAs nodes, aside from the dedicated MatchRestOfSequence node in sequence patterns.
OR Patterns
Surface syntax:
or_pattern: '|'.simple_pattern+
simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
    | closed_pattern
    | value_constraint
Abstract syntax:
MatchOr(pattern* patterns)
When two or more patterns are separated by vertical bars (|), this is called an OR pattern. (A single simple pattern is just that.)
Only the final subpattern may be irrefutable.
Each subpattern must bind the same set of names.
An OR pattern matches each of its subpatterns in turn to the subject, until one succeeds. The OR pattern is then deemed to succeed. If none of the subpatterns succeed the OR pattern fails.
Subpatterns are mostly required to be closed patterns, but the parentheses may be omitted for value constraints.
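As a rough model of the AS and OR semantics, patterns can be treated as functions from a subject to a dict of new bindings (or None on failure). This representation and the helper names wildcard, eq, as_ and or_ are hypothetical, chosen only for illustration; the compile-time rules (every alternative binding the same names, only the final alternative being irrefutable) are not modeled.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical representation: a pattern is a callable that takes the subject
# and returns a dict of new name bindings on success, or None on failure.
Bindings = Optional[Dict[str, Any]]
Pattern = Callable[[Any], Bindings]


def wildcard(subject: Any) -> Bindings:
    return {}  # __ always succeeds and binds nothing


def eq(value: Any) -> Pattern:
    """Equality constraint: == value."""
    return lambda subject: {} if subject == value else None


def as_(target: str, pattern: Pattern = wildcard) -> Pattern:
    """AS pattern: PTRN as target (PTRN defaults to the implied wildcard)."""
    def match(subject: Any) -> Bindings:
        bindings = pattern(subject)
        if bindings is None:
            return None  # the left-hand pattern failed
        return {**bindings, target: subject}
    return match


def or_(*patterns: Pattern) -> Pattern:
    """OR pattern: try each subpattern in turn until one succeeds."""
    def match(subject: Any) -> Bindings:
        for pattern in patterns:
            bindings = pattern(subject)
            if bindings is not None:
                return bindings  # the first successful alternative wins
        return None  # every alternative failed
    return match
```

For example, as_("x", or_(eq(1), eq(2))) corresponds to the pattern (== 1 | == 2) as x, and as_("y") corresponds to as y.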
Value constraints
Surface syntax:
value_constraint:
    | eq_constraint
    | id_constraint
eq_constraint: '==' closed_expr
id_constraint: 'is' closed_expr
closed_expr: # Require a single token or a closing delimiter in expression
    | primary
    | closed_factor
closed_factor: # "factor" is the main grammar node for these unary ops
    | '+' primary
    | '-' primary
    | '~' primary
Abstract syntax:
MatchValue(matchop op, expr value)
matchop = EqCheck | IdCheck
The rule primary is defined in the standard Python grammar, and only allows expressions that either consist of a single token, or else are required to end with a closing delimiter.
Value constraints replace PEP 634’s literal patterns and value patterns.
Equality constraints are written as == EXPR, while identity constraints are written as is EXPR.
An equality constraint succeeds if the subject value compares equal to the value given on the right, while an identity constraint succeeds only if they are the exact same object.
The expressions to be compared against are largely restricted to either single tokens (e.g. names, strings, numbers, builtin constants), or else to expressions that are required to end with a closing delimiter.
The use of the high precedence unary operators is also permitted, as the risk of perceived ambiguity is low, and being able to specify negative numbers without parentheses is desirable.
When the same constraint expression occurs multiple times in the same match statement, the interpreter may cache the first value calculated and reuse it, rather than repeat the expression evaluation. (As for PEP 634 value patterns, this cache is strictly tied to a given execution of a given match statement.)
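The two kinds of value constraint differ only in the comparison applied, which can be sketched with the operator module. ConstraintCache and check_constraint are hypothetical helpers; the cache key is assumed to identify the constraint expression (for instance, its source text), and a fresh cache would be created for each execution of the match statement, matching the restriction described above.

```python
import operator
from typing import Any, Callable, Dict

# Hypothetical mapping from the proposed matchop values to comparisons.
MATCH_OPS = {
    "EqCheck": operator.eq,   # == EXPR
    "IdCheck": operator.is_,  # is EXPR
}


class ConstraintCache:
    """Hypothetical per-execution cache for constraint expression values."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}

    def value(self, key: str, evaluate: Callable[[], Any]) -> Any:
        # Evaluate each distinct expression at most once per cache instance.
        if key not in self._values:
            self._values[key] = evaluate()
        return self._values[key]


def check_constraint(op: str, subject: Any, key: str,
                     evaluate: Callable[[], Any], cache: ConstraintCache) -> bool:
    return bool(MATCH_OPS[op](subject, cache.value(key, evaluate)))
```

With this sketch, checking True against an expression evaluating to 1 succeeds for EqCheck but fails for IdCheck, since True == 1 holds while True is 1 does not.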
Unlike literal patterns in PEP 634, this PEP requires that complex literals be parenthesised to be accepted by the parser. See the Deferred Ideas section for discussion on that point.
If this PEP were to be adopted in preference to PEP 634, then all literal and value patterns would instead be written more explicitly as value constraints:
# Literal patterns
match number:
    case == 0:
        print("Nothing")
    case == 1:
        print("Just one")
    case == 2:
        print("A couple")
    case == -1:
        print("One less than nothing")
    case == (1-1j):
        print("Good luck with that...")

# Additional literal patterns
match value:
    case == True:
        print("True or 1")
    case == False:
        print("False or 0")
    case == None:
        print("None")
    case == "Hello":
        print("Text 'Hello'")
    case == b"World!":
        print("Binary 'World!'")

# Matching by identity rather than equality
SENTINEL = object()
match value:
    case is True:
        print("True, not 1")
    case is False:
        print("False, not 0")
    case is None:
        print("None, following PEP 8 comparison guidelines")
    case is ...:
        print("May be useful when writing __getitem__ methods?")
    case is SENTINEL:
        print("Matches the sentinel by identity, not just value")

# Matching against variables and attributes
from enum import Enum
class Sides(str, Enum):
    SPAM = "Spam"
    EGGS = "eggs"
    ...

preferred_side = Sides.EGGS
match entree[-1]:
    case == Sides.SPAM:  # Compares entree[-1] == Sides.SPAM.
        response = "Have you got anything without Spam?"
    case == preferred_side:  # Compares entree[-1] == preferred_side
        response = f"Oh, I love {preferred_side}!"
    case as side:  # Assigns side = entree[-1].
        response = f"Well, could I have their Spam instead of the {side} then?"
Note the == preferred_side example: using an explicit prefix marker on constraint expressions removes the restriction to only working with attributes or literals for value lookups.
The == (1-1j) example illustrates the use of parentheses to turn any subexpression into a closed one.
Wildcard Pattern
Surface syntax:
wildcard_pattern: "__"
Abstract syntax:
MatchAlways
A wildcard pattern always succeeds. As in PEP 634, it binds no name.
Where PEP 634 chooses the single underscore as its wildcard pattern for consistency with other languages, this PEP chooses the double underscore as that has a clearer path towards potentially being made consistent across the entire language, whereas that path is blocked for "_" by i18n related use cases.
Example usage:
match sequence:
    case [__]:  # any sequence with a single element
        return True
    case [as start, *__, as end]:  # a sequence with at least two elements
        return start == end
    case __:  # anything
        return False
Group Patterns
Surface syntax:
group_pattern: '(' open_pattern ')'
For the syntax of open_pattern, see Patterns above.
A parenthesized pattern has no additional syntax and is not represented in the abstract syntax tree. It allows users to add parentheses around patterns to emphasize the intended grouping, and to allow nesting of open patterns when the grammar requires a closed pattern.
Unlike PEP 634, there is no potential ambiguity with sequence patterns, as this PEP requires that all sequence patterns be written with square brackets.
Structural constraints
Surface syntax:
structural_constraint:
    | sequence_constraint
    | mapping_constraint
    | attrs_constraint
    | class_constraint
Note: the separate “structural constraint” subcategory isn’t used in the abstract syntax tree, it’s merely used as a convenient grouping node in the surface syntax definition.
Structural constraints are patterns used to both make assertions about complex objects and to extract values from them.
These patterns may all bind multiple values, either through the use of nested AS patterns, or else through the use of pattern_as_clause elements included in the definition of the pattern.
Sequence constraints
Surface syntax:
sequence_constraint: '[' [sequence_constraint_elements] ']'
sequence_constraint_elements: ','.sequence_constraint_element+ ','?
sequence_constraint_element:
    | star_pattern
    | simple_pattern
    | pattern_as_clause
star_pattern: '*' (pattern_as_clause | wildcard_pattern)
simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
    | closed_pattern
    | value_constraint
pattern_as_clause: 'as' pattern_capture_target
Abstract syntax:
MatchSequence(pattern* patterns)
MatchRestOfSequence(identifier? target)
Sequence constraints allow items within a sequence to be checked and optionally extracted.
A sequence pattern fails if the subject value is not an instance of collections.abc.Sequence. It also fails if the subject value is an instance of str, bytes or bytearray (see Deferred Ideas for a discussion on potentially removing the need for this special casing).
A sequence pattern may contain at most one star subpattern. The star subpattern may occur in any position and is represented in the AST using the MatchRestOfSequence node.
If no star subpattern is present, the sequence pattern is a fixed-length sequence pattern; otherwise it is a variable-length sequence pattern.
A fixed-length sequence pattern fails if the length of the subject sequence is not equal to the number of subpatterns.
A variable-length sequence pattern fails if the length of the subject sequence is less than the number of non-star subpatterns.
The length of the subject sequence is obtained using the builtin len() function (i.e., via the __len__ protocol). However, the interpreter may cache this value in a similar manner as described for value constraint expressions.
A fixed-length sequence pattern matches the subpatterns to corresponding items of the subject sequence, from left to right. Matching stops (with a failure) as soon as a subpattern fails. If all subpatterns succeed in matching their corresponding item, the sequence pattern succeeds.
A variable-length sequence pattern first matches the leading non-star subpatterns to the corresponding items of the subject sequence, as for a fixed-length sequence. If this succeeds, the star subpattern matches a list formed of the remaining subject items, with items removed from the end corresponding to the non-star subpatterns following the star subpattern. The remaining non-star subpatterns are then matched to the corresponding subject items, as for a fixed-length sequence.
Subpatterns are mostly required to be closed patterns, but the parentheses may be omitted for value constraints. Sequence elements may also be captured unconditionally without parentheses.
Note: where PEP 634 allows all the same syntactic flexibility as iterable unpacking in assignment statements, this PEP restricts sequence patterns specifically to the square bracket form. Given that the open and parenthesised forms are far more popular than square brackets for iterable unpacking, this helps emphasise that iterable unpacking and sequence matching are not the same operation. It also avoids the parenthesised form’s ambiguity problem between single element sequence patterns and group patterns.
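The fixed-length and variable-length matching rules above can be sketched as follows. The callable-based subpattern representation, the NO_STAR marker and the helper names are hypothetical; star=None stands for *__ and a string stands for *as NAME, loosely mirroring MatchRestOfSequence(identifier? target). Items are accessed by index rather than by slicing, since collections.abc.Sequence does not require slice support.

```python
from collections.abc import Sequence
from typing import Any, Callable, Dict, List, Optional

# Hypothetical representation: a subpattern takes an item and returns a dict
# of bindings on success or None on failure.
Pattern = Callable[[Any], Optional[Dict[str, Any]]]
NO_STAR = object()  # hypothetical marker: the pattern has no star subpattern


def capture(name: str) -> Pattern:
    return lambda item: {name: item}  # as NAME


def eq(value: Any) -> Pattern:
    return lambda item: {} if item == value else None  # == EXPR


def match_sequence(subject: Any, before: List[Pattern], star: Any = NO_STAR,
                   after=()) -> Optional[Dict[str, Any]]:
    """Sketch of a sequence constraint.

    before: subpatterns ahead of the star (or all of them, if there is no star)
    star:   NO_STAR (fixed length), None (*__) or a capture name (*as NAME)
    after:  subpatterns following the star (ignored when star is NO_STAR)
    """
    if not isinstance(subject, Sequence) or isinstance(subject, (str, bytes, bytearray)):
        return None
    n = len(subject)  # an interpreter may cache this value
    if star is NO_STAR:
        if n != len(before):
            return None
    elif n < len(before) + len(after):
        return None

    bindings: Dict[str, Any] = {}

    def apply(pattern: Pattern, item: Any) -> bool:
        result = pattern(item)
        if result is None:
            return False
        bindings.update(result)
        return True

    for i, pattern in enumerate(before):  # leading items, left to right
        if not apply(pattern, subject[i]):
            return None  # stop at the first failing subpattern
    if star is NO_STAR:
        return bindings
    tail_start = n - len(after)
    if star is not None:  # *as NAME binds a list of the middle items
        bindings[star] = [subject[i] for i in range(len(before), tail_start)]
    for offset, pattern in enumerate(after):  # trailing items
        if not apply(pattern, subject[tail_start + offset]):
            return None
    return bindings
```

For example, [as first, *as middle, as last] corresponds to match_sequence(subject, [capture("first")], "middle", [capture("last")]), which binds middle to [2, 3] when matching [1, 2, 3, 4].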
Mapping constraints
Surface syntax:
mapping_constraint: '{' [mapping_constraint_elements] '}'
mapping_constraint_elements: ','.key_value_constraint+ ','?
key_value_constraint:
    | closed_expr pattern_as_clause
    | closed_expr ':' simple_pattern
    | double_star_capture
double_star_capture: '**' pattern_as_clause
(Note that **__ is deliberately disallowed by this syntax, as additional mapping entries are ignored by default.)
closed_expr is defined above, under value constraints.
Abstract syntax:
MatchMapping(expr* keys, pattern* patterns)
Mapping constraints allow keys and values within a mapping to be checked and values to optionally be extracted.
A mapping pattern fails if the subject value is not an instance of collections.abc.Mapping.
A mapping pattern succeeds if every key given in the mapping pattern is present in the subject mapping, and the pattern for each key matches the corresponding item of the subject mapping.
The presence of keys is checked using the two argument form of the get method and a unique sentinel value, which offers the following benefits:
- no exceptions need to be created in the lookup process
- mappings that implement __missing__ (such as collections.defaultdict) only match on keys that they already contain; they don’t implicitly add keys
A mapping pattern may not contain duplicate key values. If duplicate keys are detected when checking the mapping pattern, the pattern is considered invalid, and a ValueError is raised. While it would theoretically be possible to check for duplicate constant keys at compile time, no such check is currently defined or implemented.
(Note: This semantic description is derived from the PEP 634 reference implementation, which differs from the PEP 634 specification text at time of writing. The implementation seems reasonable, so amending the PEP text seems like the best way to resolve the discrepancy)
If a '**' as NAME double star pattern is present, that name is bound to a dict containing any remaining key-value pairs from the subject mapping (the dict will be empty if there are no additional key-value pairs).
A mapping pattern may contain at most one double star pattern, and it must be last.
Value subpatterns are mostly required to be closed patterns, but the parentheses may be omitted for value constraints (the : key/value separator is still required to ensure the entry doesn’t look like an ordinary comparison operation).
Mapping values may also be captured unconditionally using the KEY as NAME form, without either parentheses or the : key/value separator.
Instance attribute constraints
Surface syntax:
attrs_constraint:
    | name_or_attr '{' [attrs_constraint_elements] '}'
attrs_constraint_elements: ','.attr_value_pattern+ ','?
attr_value_pattern:
    | '.' NAME pattern_as_clause
    | '.' NAME value_constraint
    | '.' NAME ':' simple_pattern
    | '.' NAME
Abstract syntax:
MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
Instance attribute constraints allow an instance’s type to be checked and attributes to optionally be extracted.
An instance attribute constraint may not repeat the same attribute name multiple times. Attempting to do so will result in a syntax error.
An instance attribute pattern fails if the subject is not an instance of name_or_attr. This is tested using isinstance().
If name_or_attr is not an instance of the builtin type, TypeError is raised.
If no attribute subpatterns are present, the constraint succeeds if the isinstance() check succeeds. Otherwise:
- Each given attribute name is looked up as an attribute on the subject.
  - If this raises an exception other than AttributeError, the exception bubbles up.
  - If this raises AttributeError, the constraint fails.
  - Otherwise, the subpattern associated with that attribute name is matched against the attribute value. If no subpattern is specified, the wildcard pattern is assumed. If this fails, the constraint fails. If it succeeds, the match proceeds to the next attribute.
- If all attribute subpatterns succeed, the constraint as a whole succeeds.
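The instance attribute rules above translate fairly directly into Python. In this sketch, match_attrs and capture are hypothetical helpers, a subpattern of None stands for a bare .NAME entry, and only AttributeError is treated as a failed match.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a subpattern takes a value and returns a dict
# of bindings on success or None on failure.
Pattern = Callable[[Any], Optional[Dict[str, Any]]]


def capture(name: str) -> Pattern:
    return lambda value: {name: value}  # .ATTR as NAME


def match_attrs(subject: Any, cls: Any,
                attr_patterns: List[Tuple[str, Optional[Pattern]]]
                ) -> Optional[Dict[str, Any]]:
    """Sketch of CLS{.ATTR: PTRN, ...}; a None subpattern means a bare .ATTR."""
    if not isinstance(cls, type):
        raise TypeError(f"{cls!r} is not a type")
    if not isinstance(subject, cls):
        return None
    bindings: Dict[str, Any] = {}
    for name, pattern in attr_patterns:
        try:
            value = getattr(subject, name)  # other exceptions propagate
        except AttributeError:
            return None  # a missing attribute makes the constraint fail
        if pattern is None:  # bare .ATTR: the wildcard pattern is implied
            continue
        result = pattern(value)
        if result is None:
            return None
        bindings.update(result)
    return bindings
```

Under these assumptions, object{.host as host, .port as port} corresponds to match_attrs(subject, object, [("host", capture("host")), ("port", capture("port"))]).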
Instance attribute constraints allow ducktyping checks to be implemented by using object as the required instance type (e.g. case object{.host as host, .port as port}:).
The syntax being proposed here could potentially also be used as the basis for a new syntax for retrieving multiple attributes from an object instance in one assignment statement (e.g. host, port = addr{.host, .port}). See the Deferred Ideas section for further discussion of this point.
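For comparison, the closest existing spelling in current Python is operator.attrgetter, which returns a tuple when given several attribute names. The SimpleNamespace object here is just a stand-in for any instance; note that attrgetter raises AttributeError for a missing attribute, whereas an instance attribute constraint would simply fail to match.

```python
from operator import attrgetter
from types import SimpleNamespace

addr = SimpleNamespace(host="example.com", port=8080)  # stand-in object

# With several names, attrgetter returns a tuple of the attribute values,
# roughly what the deferred `host, port = addr{.host, .port}` would produce.
host, port = attrgetter("host", "port")(addr)
```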
Class defined constraints
Surface syntax:
class_constraint:
    | name_or_attr '(' ')'
    | name_or_attr '(' positional_patterns ','? ')'
    | name_or_attr '(' class_constraint_attrs ')'
    | name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
positional_patterns: ','.positional_pattern+
positional_pattern:
    | simple_pattern
    | pattern_as_clause
class_constraint_attrs:
    | '**' '{' [attrs_constraint_elements] '}'
Abstract syntax:
MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)
Class defined constraints allow a sequence of common attributes to be specified on a class and checked positionally, rather than needing to specify the attribute names in every related match pattern.
As for instance attribute patterns:
- a class defined pattern fails if the subject is not an instance of name_or_attr. This is tested using isinstance().
- if name_or_attr is not an instance of the builtin type, TypeError is raised.
Regardless of whether or not any arguments are present, the class named by name_or_attr is checked for a __match_args__ attribute using the equivalent of getattr(cls, "__match_args__", _SENTINEL). If this raises an exception, the exception bubbles up. If the returned value is not a list, tuple, or None, the conversion fails and TypeError is raised at runtime.
This means that only types that actually define __match_args__ will be usable in class defined patterns. Types that don’t define __match_args__ will still be usable in instance attribute patterns.
If __match_args__ is None, then only a single positional subpattern is permitted. Attempting to specify additional attribute patterns either positionally or using the double star syntax will cause TypeError to be raised at runtime.
This positional subpattern is then matched against the entire subject, allowing a type check to be combined with another match pattern (e.g. checking both the type and contents of a container, or the type and value of a number).
If __match_args__ is a list or tuple, then the class defined constraint is converted to an instance attributes constraint as follows:
- if only the double star attribute constraints subpattern is present, matching proceeds as if for the equivalent instance attributes constraint.
- if there are more positional subpatterns than the length of __match_args__ (as obtained using len()), TypeError is raised.
- otherwise, positional pattern i is converted to an attribute pattern using __match_args__[i] as the attribute name.
- if any element in __match_args__ is not a string, TypeError is raised.
- once the positional patterns have been converted to attribute patterns, they are combined with any attribute constraints given in the double star attribute constraints subpattern, and matching proceeds as if for the equivalent instance attributes constraint.
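The conversion steps above can be sketched as a plain Python function. This is a hypothetical model, not the reference implementation: convert_class_constraint is an invented name, subpatterns are represented by arbitrary placeholder objects, and rejecting an attribute given both positionally and in the double star part is an assumption borrowed from PEP 634, since the rules above only say the two sets are "combined".

```python
_SENTINEL = object()


def convert_class_constraint(cls, positional, extra_attrs):
    """Hypothetical model of turning cls(p0, p1, **{.attr: p}) into
    instance attribute constraints.

    positional: list of placeholders for the positional subpatterns
    extra_attrs: dict of attribute name -> subpattern from the **{...} part
    Returns (whole_subject_pattern, attr_patterns); the first item is only
    used when __match_args__ is None.
    """
    match_args = getattr(cls, "__match_args__", _SENTINEL)
    if match_args is None:
        # At most one positional subpattern, matched against the whole subject
        if len(positional) > 1 or extra_attrs:
            raise TypeError(f"{cls.__name__}() accepts a single positional subpattern")
        return (positional[0] if positional else None), {}
    if not isinstance(match_args, (list, tuple)):
        # Also covers classes that don't define __match_args__ at all
        raise TypeError(f"{cls.__name__}.__match_args__ must be a list, tuple or None")
    if len(positional) > len(match_args):
        raise TypeError(f"{cls.__name__}() accepts {len(match_args)} positional subpatterns")
    if not all(isinstance(name, str) for name in match_args):
        raise TypeError(f"{cls.__name__}.__match_args__ elements must be strings")
    # Positional pattern i becomes an attribute pattern for __match_args__[i]
    attr_patterns = dict(zip(match_args, positional))
    for name, subpattern in extra_attrs.items():
        if name in attr_patterns:
            # Assumption: duplicates are rejected, as they are in PEP 634
            raise TypeError(f"multiple subpatterns for attribute {name!r}")
        attr_patterns[name] = subpattern
    return None, attr_patterns


class Point:
    __match_args__ = ("x", "y")


print(convert_class_constraint(Point, ["px"], {"label": "pl"}))
# (None, {'x': 'px', 'label': 'pl'})
```

The resulting attribute patterns would then be checked exactly as for an instance attribute constraint.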
Note: the __match_args__ is None handling in this PEP replaces the special casing of bool, bytearray, bytes, dict, float, frozenset, int, list, set, str, and tuple in PEP 634. However, the optimised fast path for those types is retained in the implementation.
Design Discussion
Requiring explicit qualification of simple names in match patterns
The first iteration of this PEP accepted the basic premise of PEP 634 that iterable unpacking syntax would provide a good foundation for defining a new syntax for pattern matching.
During the review process, however, two major and one minor ambiguity problems were highlighted that arise directly from that core assumption:
- most problematically, when binding simple names by default is extended to PEP 634’s proposed class pattern syntax, the ATTR=TARGET_NAME construct binds to the right without using the as keyword, and uses the normal assignment-to-the-left sigil (=) to do it!
- when binding simple names by default is extended to PEP 634’s proposed mapping pattern syntax, the KEY: TARGET_NAME construct binds to the right without using the as keyword
- using a PEP 634 capture pattern together with an AS pattern (TARGET_NAME_1 as TARGET_NAME_2) gives an odd “binds to both the left and right” behaviour
The third revision of this PEP accounted for this problem by abandoning the alignment with iterable unpacking syntax, and instead requiring that all uses of bare simple names for anything other than a variable lookup be qualified by a preceding sigil or keyword:
- as NAME: local variable binding
- .NAME: attribute lookup
- == NAME: variable lookup
- is NAME: variable lookup
- any other usage: variable lookup
The key benefit of this approach is that it makes interpretation of simple names in patterns a local activity: a leading as indicates a name binding, a leading . indicates an attribute lookup, and anything else is a variable lookup (regardless of whether we’re reading a subpattern or a subexpression).
With the syntax now proposed in this PEP, the problematic cases identified above no longer read poorly:
- .ATTR as TARGET_NAME is more obviously a binding than ATTR=TARGET_NAME
- KEY as TARGET_NAME is more obviously a binding than KEY: TARGET_NAME
- (as TARGET_NAME_1) as TARGET_NAME_2 is more obviously two bindings than TARGET_NAME_1 as TARGET_NAME_2
Resisting the temptation to guess
PEP 635 looks at the way pattern matching is used in other languages, and attempts to use that information to make plausible predictions about the way pattern matching will be used in Python:
- wanting to extract values to local names will probably be more common than wanting to match against values stored in local names
- wanting comparison by equality will probably be more common than wanting comparison by identity
- users will probably be able to at least remember that bare names bind values and attribute references look up values, even if they can’t figure that out for themselves without reading the documentation or having someone tell them
To be clear, I think these predictions actually are plausible. However, I also don’t think we need to guess about this up front: I think we can start out with a more explicit syntax that requires users to state their intent using a prefix marker (either as, ==, or is), and then reassess the situation in a few years based on how pattern matching is actually being used in Python.
At that point, we’ll be able to choose amongst at least the following options:
- deciding the explicit syntax is concise enough, and not changing anything
- adding inferred identity constraints for one or more of None, ..., True and False
- adding inferred equality constraints for other literals (potentially including complex literals)
- adding inferred equality constraints for attribute lookups
- adding either inferred equality constraints or inferred capture patterns for bare names
All of those ideas could be considered independently on their own merits, rather than being a potential barrier to introducing pattern matching in the first place.
If any of these syntactic shortcuts were to eventually be introduced, they’d also be straightforward to explain in terms of the underlying more explicit syntax (the leading as, ==, or is would just be getting inferred by the parser, without the user needing to provide it explicitly). At the implementation level, only the parser should need to be changed, as the existing AST nodes could be reused.
Interaction with caching of attribute lookups in local variables
One of the major changes between this PEP and PEP 634 is to use == EXPR for equality constraint lookups, rather than only offering NAME.ATTR. The original motivation for this was to avoid the semantic conflict with regular assignment targets, where NAME.ATTR is already used in assignment statements to set attributes: if NAME.ATTR were the only syntax for symbolic value matching, then we would be pre-emptively ruling out any future attempts to allow matching against single patterns using the existing assignment statement syntax.
The current motivation is more about the general desire to avoid guessing about users’ intent, and instead requiring them to state it explicitly in the syntax.
However, even within match statements themselves, the name.attr syntax for value patterns has an undesirable interaction with local variable assignment, where routine refactorings that would be semantically neutral for any other Python statement introduce a major semantic change when applied to a PEP 634 style match statement.
Consider the following code:
while value < self.limit:
    ...  # Some code that adjusts "value"
The attribute lookup can be safely lifted out of the loop and only performed once:
_limit = self.limit
while value < _limit:
    ...  # Some code that adjusts "value"
With the marker prefix based syntax proposal in this PEP, value constraints would be similarly tolerant of match patterns being refactored to use a local variable instead of an attribute lookup, with the following two statements being functionally equivalent:
match expr:
    case {"key": == self.target}:
        ...  # Handle the case where 'expr["key"] == self.target'
    case __:
        ...  # Handle the non-matching case

_target = self.target
match expr:
    case {"key": == _target}:
        ...  # Handle the case where 'expr["key"] == self.target'
    case __:
        ...  # Handle the non-matching case
By contrast, when using PEP 634’s value and capture pattern syntaxes that omit the marker prefix, the following two statements wouldn’t be equivalent at all:
# PEP 634's value pattern syntax
match expr:
    case {"key": self.target}:
        ...  # Handle the case where 'expr["key"] == self.target'
    case _:
        ...  # Handle the non-matching case

# PEP 634's capture pattern syntax
_target = self.target
match expr:
    case {"key": _target}:
        ...  # Matches any mapping with "key", binding its value to _target
    case _:
        ...  # Handle the non-matching case
This PEP ensures the original semantics are retained under this style of simplistic refactoring: use == name to force interpretation of the result as a value constraint, use as name for a name binding.
PEP 634’s proposal to offer only the shorthand syntax, with no explicitly prefixed form, means that the primary answer on offer is “Well, don’t do that, then, only compare against attributes in namespaces, don’t compare against simple names”.
PEP 622’s walrus pattern syntax had another odd interaction where it might not bind the same object as the exact same walrus expression in the body of the case clause, but PEP 634 fixed that discrepancy by replacing walrus patterns with AS patterns (where the fact that the value bound to the name on the RHS might not be the same value as returned by the LHS is a standard feature common to all uses of the “as” keyword).
Using existing comparison operators as the value constraint prefix
If the benefit of a dedicated value constraint prefix is accepted, then the next question is to ask exactly what that prefix should be.
The initially published version of this PEP proposed using the previously unused ? symbol as the prefix for equality constraints, and ?is as the prefix for identity constraints. When reviewing the PEP, Steven D’Aprano presented a compelling counterproposal [5] to use the existing comparison operators (== and is) instead.
There were a few concerns with == as a prefix that kept it from being chosen as the prefix in the initial iteration of the PEP:
- for common use cases, it’s even more visually noisy than ?, as a lot of folks with PEP 8 trained aesthetic sensibilities are going to want to put a space between it and the following expression, effectively making it a 3 character prefix instead of 1
- when used in a mapping pattern, there needs to be a space between the : key/value separator and the == prefix, or the tokeniser will split them up incorrectly (getting := and = instead of : and ==)
- when used in an OR pattern, there needs to be a space between the | pattern separator and the == prefix, or the tokeniser will split them up incorrectly (getting |= and = instead of | and ==)
- if used in a PEP 634 style class pattern, there needs to be a space between the = keyword separator and the == prefix, or the tokeniser will split them up incorrectly (getting == and = instead of = and ==)
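The tokeniser splits described in these points can be checked with the standard library's tokenize module. The op_tokens helper below is hypothetical; it only tokenises the fragments, so it does not matter that they are not complete Python statements:

```python
import io
import tokenize


def op_tokens(source):
    """Return the strings of the operator tokens found in source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.OP]


print(op_tokens('{"key":== 0}'))  # ['{', ':=', '=', '}']
print(op_tokens('{"key": == 0}'))  # ['{', ':', '==', '}']
print(op_tokens("0 |== 1"))  # ['|=', '=']
print(op_tokens("C(attr=== 0)"))  # ['(', '==', '=', ')']
```

Only the spaced form keeps : and == as separate tokens, which is why the space is required.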
Rather than introducing a completely new symbol, Steven’s proposed resolution to this verbosity problem was to retain the ability to omit the prefix marker in syntactically unambiguous cases.
While the idea of omitting the prefix marker was accepted for the second revision of the proposal, it was dropped again in the third revision due to ambiguity concerns. Instead, the following points apply:
- for class patterns, other syntax changes allow equality constraints to be written as .ATTR == EXPR, and identity constraints to be written as .ATTR is EXPR, both of which are quite easy to read
- for mapping patterns, the extra syntactic noise is just tolerated (at least for now)
- for OR patterns, the extra syntactic noise is just tolerated (at least for now). However, membership constraints may offer a future path to reducing the need to combine OR patterns with equality constraints (instead, the values to be checked against would be collected as a set, list, or tuple).
Given that perspective, PEP 635’s arguments against using ? as part of the pattern matching syntax held for this proposal as well, and so the PEP was amended accordingly.
Using __ as the wildcard pattern marker
PEP 635 makes a solid case that introducing ? solely as a wildcard pattern marker would be a bad idea. With the syntax for value constraints changed to use existing comparison operations rather than ? and ?is, that argument holds for this PEP as well.
However, as noted by Thomas Wouters in [6], PEP 634’s choice of _ remains problematic as it would likely mean that match patterns would have a permanent difference from all other parts of Python: the use of _ in software internationalisation and at the interactive prompt means that there isn’t really a plausible path towards using it as a general purpose “skipped binding” marker.
__ is an alternative “this value is not needed” marker drawn from a Stack Overflow answer [7] (originally posted by the author of this PEP) on the various meanings of _ in existing Python code.
This PEP also proposes adopting an implementation technique that limits the scope of the associated special casing of __ to the parser: defining a new AST node type (MatchAlways) specifically for wildcard markers, rather than passing it through to the AST as a Name node.
Within the parser, __ still means either a regular name or a wildcard marker in a match pattern depending on where you are in the parse tree, but within the rest of the compiler, Name("__") is still a normal variable name, while MatchAlways() is always a wildcard marker in a match pattern.
Unlike _, the lack of other use cases for __ means that there would be a plausible path towards restoring identifier handling consistency with the rest of the language by making __ mean “skip this name binding” everywhere in Python:
- in the interpreter itself, deprecate loading variables with the name __. This would make reading from __ emit a deprecation warning, while writing to it would initially be unchanged. To avoid slowing down all name loads, this could be handled by having the compiler emit additional code for the deprecated name, rather than using a runtime check in the standard name loading opcodes.
- after a suitable number of releases, change the parser to emit a new SkippedBinding AST node for all uses of __ as an assignment target, and update the rest of the compiler accordingly
- consider making __ a true hard keyword rather than a soft keyword
This deprecation path couldn’t be followed for _, as there’s no way for the interpreter to distinguish between attempts to read back _ when nominally used as a “don’t care” marker, and legitimate reads of _ as either an i18n text translation function or as the last statement result at the interactive prompt.
Names starting with double-underscores are also already reserved for use by the language, whether that is for compile time constants (i.e. __debug__), special methods, or class attribute name mangling, so using __ here would be consistent with that existing approach.
Representing patterns explicitly in the Abstract Syntax Tree
PEP 634 doesn’t explicitly discuss how match statements should be represented in the Abstract Syntax Tree, instead leaving that detail to be defined as part of the implementation.
As a result, while the reference implementation of PEP 634 definitely works (and formed the basis of the reference implementation of this PEP), it does contain a significant design flaw: despite the notes in PEP 635 that patterns should be considered as distinct from expressions, the reference implementation goes ahead and represents them in the AST as expression nodes.
The result is an AST that isn’t very abstract at all: nodes that should be compiled completely differently (because they’re patterns rather than expressions) are represented the same way, and the type system of the implementation language (e.g. C for CPython) can’t offer any assistance in keeping track of which subnodes should be ordinary expressions and which should be subpatterns.
Rather than continuing with that approach, this PEP has instead defined a new explicit “pattern” node in the AST, which allows the patterns and their permitted subnodes to be defined explicitly in the AST itself, making the code implementing the new feature clearer, and allowing the C compiler to provide more assistance in keeping track of when the code generator is dealing with patterns or expressions.
This change in implementation approach is actually orthogonal to the surface syntax changes proposed in this PEP, so it could still be adopted even if the rest of the PEP were to be rejected.
Changes to sequence patterns
This PEP makes one notable change to sequence patterns relative to PEP 634:
- only the square bracket form of sequence pattern is supported. Neither open (no delimiters) nor tuple style (parentheses as delimiters) sequence patterns are supported.
Relative to PEP 634, sequence patterns are also significantly affected by the change to require explicit qualification of capture patterns and value constraints, as it means case [a, b, c]: must instead be written as case [as a, as b, as c]: and case [0, 1]: must instead be written as case [== 0, == 1]:.
With the syntax for sequence patterns no longer being derived directly from the syntax for iterable unpacking, it no longer made sense to keep the syntactic flexibility that had been included in the original syntax proposal purely for consistency with iterable unpacking.
Allowing open and tuple style sequence patterns didn’t increase expressivity, only ambiguity of intent (especially relative to group patterns), and encouraged readers down the path of viewing pattern matching syntax as intrinsically linked to assignment target syntax (which the PEP 634 authors have stated multiple times is not a desirable path to have readers take, and a view the author of this PEP now shares, despite disagreeing with it originally).
Changes to mapping patterns
This PEP makes two notable changes to mapping patterns relative to PEP 634:
- value capturing is written as KEY as NAME rather than as KEY: NAME
- a wider range of keys are permitted: any “closed expression”, rather than only literals and attribute references
As discussed above, the first change is part of ensuring that all binding operations with the target name to the right of a subexpression or pattern use the as keyword.
The second change is mostly a matter of simplifying the parser and code generator code by reusing the existing expression handling machinery. The restriction to closed expressions is designed to help reduce ambiguity as to where the key expression ends and the match pattern begins. This mostly allows a superset of what PEP 634 allows, except that complex literals must be written in parentheses (at least for now).
Adapting PEP 635’s mapping pattern examples to the syntax proposed in this PEP:
match json_pet:
    case {"type": == "cat", "name" as name, "pattern" as pattern}:
        return Cat(name, pattern)
    case {"type": == "dog", "name" as name, "breed" as breed}:
        return Dog(name, breed)
    case __:
        raise ValueError("Not a suitable pet")

def change_red_to_blue(json_obj):
    match json_obj:
        case { 'color': (== 'red' | == '#FF0000') }:
            json_obj['color'] = 'blue'
        case { 'children' as children }:
            for child in children:
                change_red_to_blue(child)
For reference, the equivalent PEP 634 syntax:
match json_pet:
    case {"type": "cat", "name": name, "pattern": pattern}:
        return Cat(name, pattern)
    case {"type": "dog", "name": name, "breed": breed}:
        return Dog(name, breed)
    case _:
        raise ValueError("Not a suitable pet")

def change_red_to_blue(json_obj):
    match json_obj:
        case { 'color': ('red' | '#FF0000') }:
            json_obj['color'] = 'blue'
        case { 'children': children }:
            for child in children:
                change_red_to_blue(child)
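To make the semantics of the first json_pet case concrete, here is a hand-written plain Python sketch of what {"type": == "cat", "name" as name, "pattern" as pattern} checks. It assumes the mapping pattern rules carried over from PEP 634 (the subject must be a collections.abc.Mapping, a missing key makes the pattern fail, and extra keys are ignored); match_cat is a hypothetical name, and the captured names are returned as a dict:

```python
from collections.abc import Mapping

_MISSING = object()


def match_cat(json_pet):
    """Hypothetical equivalent of the pattern
    {"type": == "cat", "name" as name, "pattern" as pattern}.

    Returns the bindings as a dict, or None if the pattern does not match.
    """
    if not isinstance(json_pet, Mapping):
        return None
    # "type": == "cat" is an equality constraint on the looked-up value
    value = json_pet.get("type", _MISSING)
    if value is _MISSING or not value == "cat":
        return None
    bindings = {}
    # "name" as name and "pattern" as pattern only require the keys to exist
    for key in ("name", "pattern"):
        value = json_pet.get(key, _MISSING)
        if value is _MISSING:
            return None
        bindings[key] = value
    return bindings


print(match_cat({"type": "cat", "name": "Tom", "pattern": "tabby", "age": 3}))
# {'name': 'Tom', 'pattern': 'tabby'}
print(match_cat({"type": "dog", "name": "Rex", "breed": "collie"}))  # None
```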
Changes to class patterns
This PEP makes several notable changes to class patterns relative to PEP 634:
- the syntactic alignment with class instantiation is abandoned as being actively misleading and unhelpful. Instead, a new dedicated syntax for checking additional attributes is introduced that draws inspiration from mapping patterns rather than class instantiation
- a new dedicated syntax for simple ducktyping that will work for any class is introduced
- the special casing of various builtin and standard library types is supplemented by a general check for the existence of a __match_args__ attribute with the value of None
As discussed above, the first change has two purposes:
- it’s part of ensuring that all binding operations with the target name to the right of a subexpression or pattern use the as keyword. Using = to assign to the right is particularly problematic.
- it’s part of ensuring that all uses of simple names in patterns have a prefix that indicates their purpose (in this case, a leading . to indicate an attribute lookup)
The syntactic alignment with class instantiation was also judged to be unhelpful in general, as class patterns are about matching patterns against attributes, while class instantiation is about matching call arguments to parameters in class constructors, which may not bear much resemblance to the resulting instance attributes at all.
The second change is intended to make it easier to use pattern matching for the “ducktyping” style checks that are already common in Python.
The concrete syntax proposal for these patterns then arose from viewing instances as mappings of attribute names to values, and combining the attribute lookup syntax (.ATTR) with the mapping pattern syntax {KEY: PATTERN} to give cls{.ATTR: PATTERN}.
Allowing cls{.ATTR} to mean the same thing as cls{.ATTR: __} was a matter of considering the leading . sufficient to render the name usage unambiguous (it’s clearly an attribute reference, whereas matching against a variable key in a mapping pattern would be arguably ambiguous).
The final change just supplements a CPython-internal-only check in the PEP 634 reference implementation by making it the behaviour that any class gets if it sets __match_args__ to None (the optimised fast path for the builtin and standard library types named in PEP 634 is retained).
Adapting the class matching example linked from PEP 635 shows that for purely positional class matching, the main impact comes from the changes to value constraints and name binding, not from the class matching changes:
match expr:
    case BinaryOp(== '+', as left, as right):
        return eval_expr(left) + eval_expr(right)
    case BinaryOp(== '-', as left, as right):
        return eval_expr(left) - eval_expr(right)
    case BinaryOp(== '*', as left, as right):
        return eval_expr(left) * eval_expr(right)
    case BinaryOp(== '/', as left, as right):
        return eval_expr(left) / eval_expr(right)
    case UnaryOp(== '+', as arg):
        return eval_expr(arg)
    case UnaryOp(== '-', as arg):
        return -eval_expr(arg)
    case VarExpr(as name):
        raise ValueError(f"Unknown value of: {name}")
    case float() | int():
        return expr
    case __:
        raise ValueError(f"Invalid expression value: {repr(expr)}")
For reference, the equivalent PEP 634 syntax:
match expr:
    case BinaryOp('+', left, right):
        return eval_expr(left) + eval_expr(right)
    case BinaryOp('-', left, right):
        return eval_expr(left) - eval_expr(right)
    case BinaryOp('*', left, right):
        return eval_expr(left) * eval_expr(right)
    case BinaryOp('/', left, right):
        return eval_expr(left) / eval_expr(right)
    case UnaryOp('+', arg):
        return eval_expr(arg)
    case UnaryOp('-', arg):
        return -eval_expr(arg)
    case VarExpr(name):
        raise ValueError(f"Unknown value of: {name}")
    case float() | int():
        return expr
    case _:
        raise ValueError(f"Invalid expression value: {repr(expr)}")
The changes to the class pattern syntax itself are more relevant when checking for named attributes and extracting their values without relying on __match_args__:
match expr:
    case object{.host as host, .port as port}:
        pass
    case object{.host as host}:
        pass
Compare this to the PEP 634 equivalent, where it really isn’t clear which names are referring to attributes of the match subject and which names are referring to local variables:
match expr:
    case object(host=host, port=port):
        pass
    case object(host=host):
        pass
In this specific case, that ambiguity doesn’t matter (since the attribute and variable names are the same), but in the general case, knowing which is which will be critical to reasoning correctly about the code being read.
Deferred Ideas
Inferred value constraints
As discussed above, this PEP doesn’t rule out the possibility of adding inferred equality and identity constraints in the future.
These could be particularly valuable for literals, as it is quite likely that many “magic” strings and numbers with self-evident meanings will be written directly into match patterns, rather than being stored in named variables. (Think constants like None, or obviously special numbers like 0 and 1, or strings where their contents are as descriptive as any variable name, rather than cryptic checks against opaque numbers like 739452.)
Making some required parentheses optional
The PEP currently errs heavily on the side of requiring parentheses in the face of potential ambiguity.
However, there are a number of cases where it at least arguably goes too far, mostly involving AS patterns with an explicit pattern.
In any position that requires a closed pattern, AS patterns may end up starting with doubled parentheses, as the nested pattern is also required to be a closed pattern: ((OPEN PTRN) as NAME)
Due to the requirement that the subpattern be closed, it should be reasonable in many of these cases (e.g. sequence pattern subpatterns) to accept CLOSED_PTRN as NAME directly.
Further consideration of this point has been deferred, as making required parentheses optional is a backwards compatible change, and hence relaxing the restrictions later can be considered on a case-by-case basis.
Accepting complex literals as closed expressions
PEP 634’s reference implementation includes a lot of special casing of binary operations in both the parser and the rest of the compiler in order to accept complex literals without accepting arbitrary binary numeric operations on literal values.
Ideally, this problem would be dealt with at the parser layer, with the parser directly emitting a Constant AST node prepopulated with a complex number. If that was the way things worked, then complex literals could be accepted through a similar mechanism to any other literal.
This isn’t how complex literals are handled, however. Instead, they’re passed through to the AST as regular BinOp nodes, and then the constant folding pass on the AST resolves them down to Constant nodes with a complex value.
For the parser to resolve complex literals directly, the compiler would need to be able to tell the tokenizer to generate a distinct token type for imaginary numbers (e.g. INUMBER), which would then allow the parser to handle NUMBER + INUMBER and NUMBER - INUMBER separately from other binary operations.
Alternatively, a new ComplexNumber AST node type could be defined, which would allow the parser to notify the subsequent compiler stages that a particular node should specifically be a complex literal, rather than an arbitrary binary operation. Then the parser could accept NUMBER + NUMBER and NUMBER - NUMBER for that node, while letting the AST validation for ComplexNumber take care of ensuring that the real and imaginary parts of the literal were real and imaginary numbers as expected.
For now, this PEP has postponed dealing with this question, and instead just requires that complex literals be parenthesised in order to be used in value constraints and as mapping pattern keys.
Allowing negated constraints in match patterns
With the syntax proposed in this PEP, it isn’t permitted to write != expr or is not expr as a match pattern.
Both of these forms have clear potential interpretations as a negated equality constraint (i.e. x != expr) and a negated identity constraint (i.e. x is not expr).
However, it’s far from clear either form would come up often enough to justify the dedicated syntax, so the possible extension has been deferred pending further community experience with match statements.
Allowing membership checks in match patterns
The syntax used for equality and identity constraints would be straightforward to extend to membership checks: in container.
One downside of the proposals in both this PEP and PEP 634 is that checking for multiple values in the same case doesn’t look like any existing container membership check in Python:
# PEP 634's literal patterns
match value:
    case 0 | 1 | 2 | 3:
        ...

# This PEP's equality constraints
match value:
    case == 0 | == 1 | == 2 | == 3:
        ...
Allowing inferred equality constraints under this PEP would only make it look like the PEP 634 example; it still wouldn’t look like the equivalent if statement header (if value in {0, 1, 2, 3}:).
Membership constraints would provide a more explicit, but still concise, way to check if the match subject was present in a container, and it would look the same as an ordinary containment check:
match value:
    case in {0, 1, 2, 3}:
        ...
    case in {one, two, three, four}:
        ...
    case in range(4):  # It would accept any container, not just literal sets
        ...
Such a feature would also be readily extensible to allow all kinds of case clauses without any further syntax updates, simply by defining __contains__ appropriately on a custom class definition.
However, while this does seem like a useful extension, and a good way to resolve this PEP’s verbosity problem when combining multiple equality checks in an OR pattern, it isn’t essential to making match statements a valuable addition to the language, so it seems more appropriate to defer it to a separate proposal, rather than including it here.
Inferring a default type for instance attribute constraints
The dedicated syntax for instance attribute constraints means that object could be omitted from object{.ATTR} to give {.ATTR} without introducing any syntactic ambiguity (if no class was given, object would be implied, just as it is for the base class list in class definitions).
However, it’s far from clear saving six characters is worth making it harder to visually distinguish mapping patterns from instance attribute patterns, so allowing this has been deferred as a topic for possible future consideration.
Avoiding special cases in sequence patterns
Sequence patterns in both this PEP and PEP 634 currently special case str, bytes, and bytearray as specifically never matching a sequence pattern.
This special casing could potentially be removed if we were to define a new collections.abc.AtomicSequence abstract base class for types like these, where they’re conceptually a single item, but still implement the sequence protocol to allow random access to their component parts.
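A minimal sketch of that idea, assuming a locally defined AtomicSequence class (no such ABC exists in collections.abc today) and a hypothetical can_match_sequence_pattern() helper standing in for the type check a sequence pattern performs:

```python
from collections.abc import Sequence


class AtomicSequence(Sequence):
    """Hypothetical ABC: sequences that are conceptually a single item."""


# Register the types that sequence patterns currently special case
for _cls in (str, bytes, bytearray):
    AtomicSequence.register(_cls)


def can_match_sequence_pattern(subject):
    # Any sequence qualifies unless it has opted in as "atomic"
    return isinstance(subject, Sequence) and not isinstance(subject, AtomicSequence)


samples = ([1, 2], "ab", b"ab", bytearray(b"ab"), range(3), 42)
print([can_match_sequence_pattern(s) for s in samples])
# [True, False, False, False, True, False]
```

With this design, third-party types could opt out of sequence matching by registering with the ABC, instead of relying on a hard-coded list.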
Expression syntax to retrieve multiple attributes from an instance
The instance attribute pattern syntax has been designed such that it could be used as the basis for a general purpose syntax for retrieving multiple attributes from an object in a single expression:
host, port = obj{.host, .port}
Similar to slice syntax only being allowed inside bracket subscripts, the .attr syntax for naming attributes would only be allowed inside brace subscripts.
This idea isn’t required for pattern matching to be useful, so it isn’t part of this PEP. However, it’s mentioned as a possible path towards making pattern matching feel more integrated into the rest of the language, rather than existing forever in its own completely separated world.
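For comparison, the closest existing spelling is operator.attrgetter(), which, when given several attribute names, returns a callable that fetches them all as a tuple. A short illustration (the SimpleNamespace object is just sample data):

```python
from operator import attrgetter
from types import SimpleNamespace

obj = SimpleNamespace(host="example.org", port=443)

# Roughly what the hypothetical host, port = obj{.host, .port} would do
host, port = attrgetter("host", "port")(obj)
print(host, port)  # example.org 443
```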
Expression syntax to retrieve multiple items from a container
If the brace subscript syntax were to be accepted for instance attribute pattern matching, and then subsequently extended to offer general purpose extraction of multiple attributes, then it could be extended even further to allow for retrieval of multiple items from containers based on the syntax used for mapping pattern matching:
host, port = obj{"host", "port"}
first, last = obj{0, -1}
Again, this idea isn’t required for pattern matching to be useful, so it isn’t part of this PEP. As with retrieving multiple attributes, however, it is included as an example of the proposed pattern matching syntax inspiring ideas for making object deconstruction easier in general.
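The item-retrieval variant similarly corresponds to what operator.itemgetter already does when given several keys or indices. The config and items objects below are illustrative stand-ins for obj in the examples above.

```python
from operator import itemgetter

config = {"host": "example.com", "port": 8080, "debug": False}
# Proposed (not valid today):  host, port = config{"host", "port"}
host, port = itemgetter("host", "port")(config)

items = [3, 1, 4, 1, 5]
# Proposed (not valid today):  first, last = items{0, -1}
first, last = itemgetter(0, -1)(items)
```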
Rejected Ideas
Restricting permitted expressions in value constraints and mapping pattern keys
While it’s entirely technically possible to restrict the kinds of expressions permitted in value constraints and mapping pattern keys to just attribute lookups and constant literals (as PEP 634 does), there isn’t any clear runtime value in doing so, so this PEP proposes allowing any kind of primary expression (primary expressions are an existing node type in the grammar that includes things like literals, names, attribute lookups, function calls, container subscripts, parenthesised groups, etc.), as well as high precedence unary operations (+, -, ~) on primary expressions.
While PEP 635 does emphasise several times that literal patterns and value patterns are not full expressions, it doesn’t ever articulate a concrete benefit that is obtained from that restriction (just a theoretical appeal to it being useful to separate static checks from dynamic checks, which a code style tool could still enforce, even if the compiler itself is more permissive).
The last time we imposed such a restriction was for decorator expressions and the primary outcome of that was that users had to put up with years of awkward syntactic workarounds (like nesting arbitrary expressions inside function calls that just returned their argument) to express the behaviour they wanted before the language definition was finally updated to allow arbitrary expressions and let users make their own decisions about readability.
The situation in PEP 634 that resembles the situation with decorator expressions is that arbitrary expressions are technically supported in value patterns; they just require awkward workarounds, where either all the values to match need to be specified in a helper class that is placed before the match statement:
# Allowing arbitrary match targets with PEP 634's value pattern syntax
class mt:
    value = func()
match expr:
    case (_, mt.value):
        ... # Handle the case where 'expr[1] == func()'
Or else they need to be written as a combination of a capture pattern and a guard expression:
# Allowing arbitrary match targets with PEP 634's guard expressions
match expr:
    case (_, _matched) if _matched == func():
        ... # Handle the case where 'expr[1] == func()'
This PEP proposes skipping requiring any such workarounds, and instead supporting arbitrary value constraints from the start:
match expr:
    case [__, == func()]:
        ... # Handle the case where 'expr[1] == func()'
Whether actually writing that kind of code is a good idea would be a topic for style guides and code linters, not the language compiler.
In particular, if static analysers can’t follow certain kinds of dynamic checks, then they can limit the permitted expressions at analysis time, rather than the compiler restricting them at compile time.
There are also some kinds of expressions that are almost certain to give nonsensical results (e.g. yield, yield from, await) due to the pattern caching rule, where the number of times the constraint expression actually gets evaluated will be implementation dependent. Even here, the PEP takes the view of letting users write nonsense if they really want to.
Aside from the recently updated decorator expressions, another situation where Python’s formal syntax offers full freedom of expression that is almost never used in practice is in except clauses: the exceptions to match against almost always take the form of a simple name, a dotted name, or a tuple of those, but the language grammar permits arbitrary expressions at that point. This is a good indication that Python’s user base can be trusted to take responsibility for finding readable ways to use permissive language features, by avoiding writing hard to read constructs even when they’re permitted by the compiler.
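As a small illustration of that permissiveness, the following is valid Python today: the except clause calls a function to obtain the exception types to catch. The helper names lookup_errors and get_or_none are illustrative only.

```python
def lookup_errors():
    """Illustrative helper returning the exception types to catch."""
    return (KeyError, IndexError)


def get_or_none(container, key):
    try:
        return container[key]
    # Any expression is permitted here, including a function call.
    except lookup_errors():
        return None
```

Few style guides would recommend writing it this way, which is exactly the point: the grammar allows it, and readability is left to the people writing the code.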
This permissiveness comes with a real concrete benefit on the implementation side: dozens of lines of match statement specific code in the compiler is replaced by simple calls to the existing code for compiling expressions (including in the AST validation pass, the AST optimization pass, the symbol table analysis pass, and the code generation pass). This implementation benefit would accrue not just to CPython, but to every other Python implementation looking to add match statement support.
Requiring the use of constraint prefix markers for mapping pattern keys
The initial (unpublished) draft of this proposal suggested requiring mapping pattern keys be value constraints, just as PEP 634 requires that they be valid literal or value patterns:
import constants
match config:
    case {== "route": route}:
        process_route(route)
    case {== constants.DEFAULT_PORT: sub_config, **rest}:
        process_config(sub_config, rest)
However, the extra characters were syntactically noisy and unlike its use in value constraints (where it distinguishes them from non-pattern expressions), the prefix doesn’t provide any additional information here that isn’t already conveyed by the expression’s position as a key within a mapping pattern.
Accordingly, the proposal was simplified to omit the marker prefix from mapping pattern keys.
This omission also aligns with the fact that containers may incorporate both identity and equality checks into their lookup process - they don’t purely rely on equality checks, as would be incorrectly implied by the use of the equality constraint prefix.
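Floating point NaN values give a concrete example of the difference: a NaN never compares equal to itself, yet the built-in dict still finds it as a key, because dict lookup checks for identity before falling back to an equality comparison.

```python
nan = float("nan")
lookup = {nan: "found"}

equal_to_itself = (nan == nan)  # False: NaN is unequal to everything, itself included
found = lookup[nan]             # "found": the key is the very same object
```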
Allowing the key/value separator to be omitted for mapping value constraints
Instance attribute patterns allow the : separator to be omitted when writing attribute value constraints like case object{.attr == expr}.
Offering a similar shorthand for mapping value constraints was considered, but permitting it allows thoroughly baffling constructs like case {0 == 0}: where the compiler knows this is the key 0 with the value constraint == 0, but a human reader sees the tautological comparison operation 0 == 0. With the key/value separator included, the intent is more obvious to a human reader as well: case {0: == 0}:
Reference Implementation
A draft reference implementation for this PEP [3] has been derived from Brandt Bucher’s reference implementation for PEP 634 [4].
Relative to the text of this PEP, the draft reference implementation has not yet complemented the special casing of several builtin and standard library types in MATCH_CLASS with the more general check for __match_args__ being set to None. Class defined patterns also currently still accept classes that don’t define __match_args__.
All other modified patterns have been updated to follow this PEP rather than PEP 634.
Unparsing for match patterns has not yet been migrated to the updated v3 AST.
The AST validator for match patterns has not yet been implemented.
The AST validator in general has not yet been reviewed to ensure that it is checking that only expression nodes are being passed in where expression nodes are expected.
The examples in this PEP have not yet been converted to test cases, so could plausibly contain typos and other errors.
Several of the old PEP 634 tests are still to be converted to new SyntaxError tests.
The documentation has not yet been updated.
Acknowledgments
The PEP 622 and PEP 634/PEP 635/PEP 636 authors, as the proposal in this PEP is merely an attempt to improve the readability of an already well-constructed idea by proposing that starting with a more explicit syntax and potentially introducing syntactic shortcuts for particularly common operations later is a better option than attempting to only define the shortcut version. For areas of the specification where the two PEPs are the same (or at least very similar), the text describing the intended behaviour in this PEP is often derived directly from the PEP 634 text.
Steven D’Aprano, who made a compelling case that the key goals of this PEP could be achieved by using existing comparison tokens to add the ability to override the compiler when our guesses as to “what most users will want most of the time” are inevitably incorrect for at least some users some of the time, and retaining some of PEP 634’s syntactic sugar (with a slightly different semantic definition) to obtain the same level of brevity as PEP 634 in most situations. (Paul Sokolovsky also independently suggested using == instead of ? as a more easily understood prefix for equality constraints).
Thomas Wouters, whose publication of PEP 640 and public review of the structured pattern matching proposals persuaded the author of this PEP to continue advocating for a wildcard pattern syntax that a future PEP could plausibly turn into a hard keyword that always skips binding a reference in any location a simple name is expected, rather than continuing indefinitely as the match pattern specific soft keyword that is proposed here.
Joao Bueno and Jim Jewett for nudging the PEP author to take a closer look at the proposed syntax for subelement capturing within class patterns and mapping patterns (particularly the problems with “capturing to the right”). This review is what prompted the significant changes between v2 and v3 of the proposal.
References
- [1]
- Post explaining the syntactic novelties in PEP 622 https://mail.python.org/archives/list/python-dev@python.org/message/2VRPDW4EE243QT3QNNCO7XFZYZGIY6N3/
- [2]
- Declined pull request proposing to list this as a Rejected Idea in PEP 622 https://github.com/python/peps/pull/1564
- [3]
- In-progress reference implementation for this PEP https://github.com/ncoghlan/cpython/tree/pep-642-constraint-patterns
- [4]
- PEP 634 reference implementation https://github.com/python/cpython/pull/22917
- [5]
- Steven D’Aprano’s cogent criticism of the first published iteration of this PEP https://mail.python.org/archives/list/python-dev@python.org/message/BTHFWG6MWLHALOD6CHTUFPHAR65YN6BP/
- [6]
- Thomas Wouters’ initial review of the structured pattern matching proposals https://mail.python.org/archives/list/python-dev@python.org/thread/4SBR3J5IQUYE752KR7C6432HNBSYKC5X/
- [7]
- Stack Overflow answer regarding the use cases for _ as an identifier https://stackoverflow.com/questions/5893163/what-is-the-purpose-of-the-single-underscore-variable-in-python/5893946#5893946
- [8]
- Pre-publication draft of “Precise Semantics for Pattern Matching” https://github.com/markshannon/pattern-matching/blob/master/precise_semantics.rst
- [9]
- Kohn et al., Dynamic Pattern Matching with Python https://gvanrossum.github.io/docs/PyPatternMatching.pdf
Appendix A – Full Grammar
Here is the full modified grammar for match_stmt, replacing Appendix A in PEP 634.
Notation used beyond standard EBNF is as per PEP 617:
- 'KWD' denotes a hard keyword
- "KWD" denotes a soft keyword
- SEP.RULE+ is shorthand for RULE (SEP RULE)*
- !RULE is a negative lookahead assertion
match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
subject_expr:
| star_named_expression ',' [star_named_expressions]
| named_expression
case_block: "case" (guarded_pattern | open_pattern) ':' block
guarded_pattern: closed_pattern 'if' named_expression
open_pattern: # Pattern may use multiple tokens with no closing delimiter
| as_pattern
| or_pattern
as_pattern: [closed_pattern] pattern_as_clause
as_pattern_with_inferred_wildcard: pattern_as_clause
pattern_as_clause: 'as' pattern_capture_target
pattern_capture_target: !"__" NAME !('.' | '(' | '=')
or_pattern: '|'.simple_pattern+
simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
| closed_pattern
| value_constraint
value_constraint:
| eq_constraint
| id_constraint
eq_constraint: '==' closed_expr
id_constraint: 'is' closed_expr
closed_expr: # Require a single token or a closing delimiter in expression
| primary
| closed_factor
closed_factor: # "factor" is the main grammar node for these unary ops
| '+' primary
| '-' primary
| '~' primary
closed_pattern: # Require a single token or a closing delimiter in pattern
| wildcard_pattern
| group_pattern
| structural_constraint
wildcard_pattern: "__"
group_pattern: '(' open_pattern ')'
structural_constraint:
| sequence_constraint
| mapping_constraint
| attrs_constraint
| class_constraint
sequence_constraint: '[' [sequence_constraint_elements] ']'
sequence_constraint_elements: ','.sequence_constraint_element+ ','?
sequence_constraint_element:
| star_pattern
| simple_pattern
| as_pattern_with_inferred_wildcard
star_pattern: '*' (pattern_as_clause | wildcard_pattern)
mapping_constraint: '{' [mapping_constraint_elements] '}'
mapping_constraint_elements: ','.key_value_constraint+ ','?
key_value_constraint:
| closed_expr pattern_as_clause
| closed_expr ':' simple_pattern
| double_star_capture
double_star_capture: '**' pattern_as_clause
attrs_constraint:
| name_or_attr '{' [attrs_constraint_elements] '}'
name_or_attr: attr | NAME
attr: name_or_attr '.' NAME
attrs_constraint_elements: ','.attr_value_constraint+ ','?
attr_value_constraint:
| '.' NAME pattern_as_clause
| '.' NAME value_constraint
| '.' NAME ':' simple_pattern
| '.' NAME
class_constraint:
| name_or_attr '(' ')'
| name_or_attr '(' positional_patterns ','? ')'
| name_or_attr '(' class_constraint_attrs ')'
| name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
positional_patterns: ','.positional_pattern+
positional_pattern:
| simple_pattern
| as_pattern_with_inferred_wildcard
class_constraint_attrs:
| '**' '{' [attrs_constraint_elements] '}'
Appendix B: Summary of Abstract Syntax Tree changes
The following new nodes are added to the AST by this PEP:
stmt = ...
      | ...
      | Match(expr subject, match_case* cases)
      | ...
      ...
match_case = (pattern pattern, expr? guard, stmt* body)
pattern = MatchAlways
     | MatchValue(matchop op, expr value)
     | MatchSequence(pattern* patterns)
     | MatchMapping(expr* keys, pattern* patterns)
     | MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
     | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)
     | MatchRestOfSequence(identifier? target)
     -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys
     | MatchAs(pattern? pattern, identifier target)
     | MatchOr(pattern* patterns)
      attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)
matchop = EqCheck | IdCheck
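To illustrate how some of these dedicated pattern nodes are intended to fit together, the following is a toy evaluator for a subset of them. It is a sketch, not CPython’s ast module or compiler: the node classes are local dataclasses that only mirror the names above, constraint values are assumed to be already evaluated, the matchop is represented as the string "EqCheck" or "IdCheck", and MatchMapping, MatchAttrs, MatchClass and MatchRestOfSequence are omitted. The function name match_pattern is hypothetical.

```python
from collections.abc import Sequence
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


class Pattern:
    """Base class for the toy pattern nodes."""


@dataclass
class MatchAlways(Pattern):
    """The ``__`` wildcard: matches anything and binds nothing."""


@dataclass
class MatchValue(Pattern):
    op: str     # "EqCheck" or "IdCheck", mirroring ``matchop``
    value: Any  # the already-evaluated constraint expression


@dataclass
class MatchSequence(Pattern):
    patterns: List[Pattern]


@dataclass
class MatchAs(Pattern):
    pattern: Optional[Pattern]  # None models the bare ``as NAME`` form
    target: str


@dataclass
class MatchOr(Pattern):
    patterns: List[Pattern]


def match_pattern(pattern: Pattern, subject: Any, bindings: Dict[str, Any]) -> bool:
    """Return True on a match; ``bindings`` is only meaningful in that case."""
    if isinstance(pattern, MatchAlways):
        return True
    if isinstance(pattern, MatchValue):
        if pattern.op == "EqCheck":
            return bool(subject == pattern.value)
        return subject is pattern.value
    if isinstance(pattern, MatchAs):
        if pattern.pattern is not None and not match_pattern(pattern.pattern, subject, bindings):
            return False
        bindings[pattern.target] = subject
        return True
    if isinstance(pattern, MatchOr):
        for alternative in pattern.patterns:
            trial = dict(bindings)  # keep failed alternatives from leaking bindings
            if match_pattern(alternative, subject, trial):
                bindings.update(trial)
                return True
        return False
    if isinstance(pattern, MatchSequence):
        # str, bytes and bytearray never match sequence patterns
        if not isinstance(subject, Sequence) or isinstance(subject, (str, bytes, bytearray)):
            return False
        if len(subject) != len(pattern.patterns):
            return False
        return all(match_pattern(p, item, bindings) for p, item in zip(pattern.patterns, subject))
    raise TypeError(f"unsupported pattern node: {pattern!r}")
```

For instance, MatchSequence([MatchAlways(), MatchAs(MatchValue("EqCheck", 2), "second")]) matches [1, 2] and binds second to 2, while rejecting [1, 3] and the string "ab".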
Appendix C: Summary of changes relative to PEP 634
The overall match/case statement syntax and the guard expression syntax remain the same as they are in PEP 634.
Relative to PEP 634 this PEP makes the following key changes:
- a new pattern type is defined in the AST, rather than reusing the expr type for patterns
- the new MatchAs and MatchOr AST nodes are moved from the expr type to the pattern type
- the wildcard pattern changes from _ (single underscore) to __ (double underscore), and gains a dedicated MatchAlways node in the AST
- due to ambiguity of intent, value patterns and literal patterns are removed
- a new expression category is introduced: “closed expressions”
- closed expressions are either primary expressions, or a primary expression preceded by one of the high precedence unary operators (+, -, ~)
- a new pattern type is introduced: “value constraint patterns”
- value constraints have a dedicated MatchValue AST node rather than allowing a combination of Constant (literals), UnaryOp (negative numbers), BinOp (complex numbers), and Attribute (attribute lookups)
- value constraint patterns are either equality constraints or identity constraints
- equality constraints use == as a prefix marker on an otherwise arbitrary closed expression: == EXPR
- identity constraints use is as a prefix marker on an otherwise arbitrary closed expression: is EXPR
- due to ambiguity of intent, capture patterns are removed. All capture operations use the as keyword (even in sequence matching) and are represented in the AST as either MatchAs or MatchRestOfSequence nodes.
- to reduce verbosity in AS patterns, as NAME is permitted, with the same meaning as __ as NAME
- sequence patterns change to require the use of square brackets, rather than offering the same syntactic flexibility as assignment targets (assignment statements allow iterable unpacking to be indicated by any use of a tuple separated target, with or without surrounding parentheses or square brackets)
- sequence patterns gain a dedicated MatchSequence AST node rather than reusing List
- mapping patterns change to allow arbitrary closed expressions as keys
- mapping patterns gain a dedicated MatchMapping AST node rather than reusing Dict
- to reduce verbosity in mapping patterns, KEY : __ as NAME may be shortened to KEY as NAME
- class patterns no longer use individual keyword argument syntax for attribute matching. Instead they use double-star syntax, along with a variant on mapping pattern syntax with a dot prefix on the attribute names
- class patterns gain a dedicated MatchClass AST node rather than reusing Call
- to reduce verbosity, class attribute matching allows : to be omitted when the pattern to be matched starts with ==, is, or as
- class patterns treat any class that sets __match_args__ to None as accepting a single positional pattern that is matched against the entire object (avoiding the special casing required in PEP 634)
- class patterns raise TypeError when used with an object that does not define __match_args__
- dedicated syntax for ducktyping is added, such that case cls{...}: is roughly equivalent to case cls(**{...}):, but skips the check for the existence of __match_args__. This pattern also has a dedicated AST node, MatchAttrs
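The sketch below approximates the class pattern rules listed above in ordinary Python: positional_subjects works out what each positional subpattern of cls(...) would be matched against, and attribute_subjects does the same for the duck-typed cls{...} form. Both function names and the error messages are hypothetical, subpattern matching and the ** attribute syntax are left out, and treating a missing __match_args__ as an error even when no positional patterns are given is an assumption rather than something stated above.

```python
from typing import Any, Iterable, List, Optional

_MISSING = object()  # sentinel for "attribute not present"


def positional_subjects(subject: Any, cls: type, count: int) -> Optional[List[Any]]:
    """Values that ``count`` positional subpatterns of ``cls(...)`` would be matched against.

    Returns None when the pattern simply fails to match.
    """
    if not isinstance(subject, cls):
        return None
    match_args = getattr(cls, "__match_args__", _MISSING)
    if match_args is _MISSING:
        # Assumption: the TypeError applies even with no positional patterns.
        raise TypeError(f"{cls.__name__} does not define __match_args__")
    if match_args is None:
        # The whole object is matched against a single positional pattern.
        if count > 1:
            raise TypeError(f"{cls.__name__}() accepts at most 1 positional pattern")
        return [subject] * count
    if count > len(match_args):
        raise TypeError(f"{cls.__name__}() accepts at most {len(match_args)} positional patterns")
    try:
        return [getattr(subject, name) for name in match_args[:count]]
    except AttributeError:
        # A missing attribute makes the pattern fail rather than raise.
        return None


def attribute_subjects(subject: Any, cls: type, attrs: Iterable[str]) -> Optional[List[Any]]:
    """Rough analogue of ``cls{.a, .b}``: no __match_args__ check at all."""
    if not isinstance(subject, cls):
        return None
    values = []
    for name in attrs:
        value = getattr(subject, name, _MISSING)
        if value is _MISSING:
            return None
        values.append(value)
    return values
```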
Note that postponing literal patterns also makes it possible to postpone the question of whether we need an “INUMBER” token in the tokeniser for imaginary literals. Without it, the parser can’t distinguish complex literals from other binary addition and subtraction operations on constants, so proposals like PEP 634 have to do work in later compilation steps to check for correct usage.
Appendix D: History of changes to this proposal
The first published iteration of this proposal mostly followed PEP 634, but suggested using ?EXPR for equality constraints and ?is EXPR for identity constraints rather than PEP 634’s value patterns and literal patterns.
The second published iteration mostly adopted a counter-proposal from Steven D’Aprano that kept the PEP 634 style inferred constraints in many situations, but also allowed the use of == EXPR for explicit equality constraints, and is EXPR for explicit identity constraints.
The third published (and current) iteration dropped inferred patterns entirely, in an attempt to resolve the concerns with the fact that the patterns case {key: NAME}: and case cls(attr=NAME): would both bind NAME despite it appearing to the right of another subexpression without using the as keyword. The revised proposal also eliminates the possibility of writing case TARGET1 as TARGET2:, which would bind to both of the given names. Of those concerns, the most significant was the pattern case cls(attr=TARGET_NAME):, since it involved the use of = with the binding target on the right, the exact opposite of what happens in assignment statements, function calls, and function signature declarations.
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python-discord/peps/blob/main/pep-0642.rst
Last modified: 2022-03-09 16:04:44 GMT