PEP 634 – Structural Pattern Matching: Specification
- PEP
- 634
- Title
- Structural Pattern Matching: Specification
- Author
- Brandt Bucher <brandt at python.org>, Guido van Rossum <guido at python.org>
- BDFL-Delegate
- Discussions-To
- python-dev@python.org
- Status
- Accepted
- Type
- Standards Track
- Created
- 12-Sep-2020
- Python-Version
- 3.10
- Post-History
- 22-Oct-2020, 08-Feb-2021
- Replaces
- 622
- Resolution
- Python-Committers
Abstract
This PEP provides the technical specification for the match statement. It replaces PEP 622, which is hereby split in three parts:
This PEP is intentionally devoid of commentary; the motivation and all explanations of our design choices are in PEP 635. First-time readers are encouraged to start with PEP 636, which provides a gentler introduction to the concepts, syntax and semantics of patterns.
Syntax and Semantics
See Appendix A for the complete grammar.
Overview and Terminology
The pattern matching process takes as input a pattern (following
case
) and a subject value (following match
). Phrases to
describe the process include “the pattern is matched with (or against)
the subject value” and “we match the pattern against (or with) the
subject value”.
The primary outcome of pattern matching is success or failure. In case of success we may say “the pattern succeeds”, “the match succeeds”, or “the pattern matches the subject value”.
In many cases a pattern contains subpatterns, and success or failure is determined by the success or failure of matching those subpatterns against the value (e.g., for OR patterns) or against parts of the value (e.g., for sequence patterns). This process typically processes the subpatterns from left to right until the overall outcome is determined. E.g., an OR pattern succeeds at the first succeeding subpattern, while a sequence patterns fails at the first failing subpattern.
A secondary outcome of pattern matching may be one or more name bindings. We may say “the pattern binds a value to a name”. When subpatterns tried until the first success, only the bindings due to the successful subpattern are valid; when trying until the first failure, the bindings are merged. Several more rules, explained below, apply to these cases.
The Match Statement
Syntax:
match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
subject_expr:
| star_named_expression ',' star_named_expressions?
| named_expression
case_block: "case" patterns [guard] ':' block
guard: 'if' named_expression
The rules star_named_expression
, star_named_expressions
,
named_expression
and block
are part of the standard Python
grammar.
The rule patterns
is specified below.
For context, match_stmt
is a new alternative for
compound_statement
:
compound_statement:
| if_stmt
...
| match_stmt
The match
and case
keywords are soft keywords, i.e. they are
not reserved words in other grammatical contexts (including at the
start of a line if there is no colon where expected). This implies
that they are recognized as keywords when part of a match
statement or case block only, and are allowed to be used in all
other contexts as variable or argument names.
Match Semantics
The match statement first evaluates the subject expression. If a comma is present a tuple is constructed using the standard rules.
The resulting subject value is then used to select the first case
block whose patterns succeeds matching it and whose guard condition
(if present) is “truthy”. If no case blocks qualify the match
statement is complete; otherwise, the block of the selected case block
is executed. The usual rules for executing a block nested inside a
compound statement apply (e.g. an if
statement).
Name bindings made during a successful pattern match outlive the executed block and can be used after the match statement.
During failed pattern matches, some subpatterns may succeed. For
example, while matching the pattern (0, x, 1)
with the value [0,
1, 2]
, the subpattern x
may succeed if the list elements are
matched from left to right. The implementation may choose to either
make persistent bindings for those partial matches or not. User code
including a match statement should not rely on the bindings being
made for a failed match, but also shouldn’t assume that variables are
unchanged by a failed match. This part of the behavior is left
intentionally unspecified so different implementations can add
optimizations, and to prevent introducing semantic restrictions that
could limit the extensibility of this feature.
The precise pattern binding rules vary per pattern type and are specified below.
Guards
If a guard is present on a case block, once the pattern or patterns in the case block succeed, the expression in the guard is evaluated. If this raises an exception, the exception bubbles up. Otherwise, if the condition is “truthy” the case block is selected; if it is “falsy” the case block is not selected.
Since guards are expressions they are allowed to have side effects. Guard evaluation must proceed from the first to the last case block, one at a time, skipping case blocks whose pattern(s) don’t all succeed. (I.e., even if determining whether those patterns succeed may happen out of order, guard evaluation must happen in order.) Guard evaluation must stop once a case block is selected.
Irrefutable case blocks
A pattern is considered irrefutable if we can prove from its syntax alone that it will always succeed. In particular, capture patterns and wildcard patterns are irrefutable, and so are AS patterns whose left-hand side is irrefutable, OR patterns containing at least one irrefutable pattern, and parenthesized irrefutable patterns.
A case block is considered irrefutable if it has no guard and its pattern is irrefutable.
A match statement may have at most one irrefutable case block, and it must be last.
Patterns
The top-level syntax for patterns is as follows:
patterns: open_sequence_pattern | pattern
pattern: as_pattern | or_pattern
as_pattern: or_pattern 'as' capture_pattern
or_pattern: '|'.closed_pattern+
closed_pattern:
| literal_pattern
| capture_pattern
| wildcard_pattern
| value_pattern
| group_pattern
| sequence_pattern
| mapping_pattern
| class_pattern
AS Patterns
Syntax:
as_pattern: or_pattern 'as' capture_pattern
(Note: the name on the right may not be _
.)
An AS pattern matches the OR pattern on the left of the as
keyword against the subject. If this fails, the AS pattern fails.
Otherwise, the AS pattern binds the subject to the name on the right
of the as
keyword and succeeds.
OR Patterns
Syntax:
or_pattern: '|'.closed_pattern+
When two or more patterns are separated by vertical bars (|
),
this is called an OR pattern. (A single closed pattern is just that.)
Only the final subpattern may be irrefutable.
Each subpattern must bind the same set of names.
An OR pattern matches each of its subpatterns in turn to the subject, until one succeeds. The OR pattern is then deemed to succeed. If none of the subpatterns succeed the OR pattern fails.
Literal Patterns
Syntax:
literal_pattern:
| signed_number
| signed_number '+' NUMBER
| signed_number '-' NUMBER
| strings
| 'None'
| 'True'
| 'False'
signed_number: NUMBER | '-' NUMBER
The rule strings
and the token NUMBER
are defined in the
standard Python grammar.
Triple-quoted strings are supported. Raw strings and byte strings are supported. F-strings are not supported.
The forms signed_number '+' NUMBER
and signed_number '-'
NUMBER
are only permitted to express complex numbers; they require a
real number on the left and an imaginary number on the right.
A literal pattern succeeds if the subject value compares equal to the value expressed by the literal, using the following comparisons rules:
- Numbers and strings are compared using the
==
operator. - The singleton literals
None
,True
andFalse
are compared using theis
operator.
Capture Patterns
Syntax:
capture_pattern: !"_" NAME
The single underscore (_
) is not a capture pattern (this is what
!"_"
expresses). It is treated as a wildcard pattern.
A capture pattern always succeeds. It binds the subject value to the
name using the scoping rules for name binding established for the
walrus operator in PEP 572. (Summary: the name becomes a local
variable in the closest containing function scope unless there’s an
applicable nonlocal
or global
statement.)
In a given pattern, a given name may be bound only once. This
disallows for example case x, x: ...
but allows case [x] | x:
...
.
Wildcard Pattern
Syntax:
wildcard_pattern: "_"
A wildcard pattern always succeeds. It binds no name.
Value Patterns
Syntax:
value_pattern: attr
attr: name_or_attr '.' NAME
name_or_attr: attr | NAME
The dotted name in the pattern is looked up using the standard Python name resolution rules. However, when the same value pattern occurs multiple times in the same match statement, the interpreter may cache the first value found and reuse it, rather than repeat the same lookup. (To clarify, this cache is strictly tied to a given execution of a given match statement.)
The pattern succeeds if the value found thus compares equal to the
subject value (using the ==
operator).
Group Patterns
Syntax:
group_pattern: '(' pattern ')'
(For the syntax of pattern
, see Patterns above. Note that it
contains no comma – a parenthesized series of items with at least one
comma is a sequence pattern, as is ()
.)
A parenthesized pattern has no additional syntax. It allows users to add parentheses around patterns to emphasize the intended grouping.
Sequence Patterns
Syntax:
sequence_pattern:
| '[' [maybe_sequence_pattern] ']'
| '(' [open_sequence_pattern] ')'
open_sequence_pattern: maybe_star_pattern ',' [maybe_sequence_pattern]
maybe_sequence_pattern: ','.maybe_star_pattern+ ','?
maybe_star_pattern: star_pattern | pattern
star_pattern: '*' (capture_pattern | wildcard_pattern)
(Note that a single parenthesized pattern without a trailing comma is
a group pattern, not a sequence pattern. However a single pattern
enclosed in [...]
is still a sequence pattern.)
There is no semantic difference between a sequence pattern using
[...]
, a sequence pattern using (...)
, and an open sequence
pattern.
A sequence pattern may contain at most one star subpattern. The star subpattern may occur in any position. If no star subpattern is present, the sequence pattern is a fixed-length sequence pattern; otherwise it is a variable-length sequence pattern.
For a sequence pattern to succeed the subject must be a sequence, where being a sequence is defined as its class being one of the following:
- a class that inherits from
collections.abc.Sequence
- a Python class that has been registered as a
collections.abc.Sequence
- a builtin class that has its
Py_TPFLAGS_SEQUENCE
bit set - a class that inherits from any of the above (including classes defined before a
parent’s
Sequence
registration)
The following standard library classes will have their Py_TPFLAGS_SEQUENCE
bit set:
array.array
collections.deque
list
memoryview
range
tuple
Note
Although str
, bytes
, and bytearray
are usually
considered sequences, they are not included in the above list and do
not match sequence patterns.
A fixed-length sequence pattern fails if the length of the subject sequence is not equal to the number of subpatterns.
A variable-length sequence pattern fails if the length of the subject sequence is less than the number of non-star subpatterns.
The length of the subject sequence is obtained using the builtin
len()
function (i.e., via the __len__
protocol). However, the
interpreter may cache this value in a similar manner as described for
value patterns.
A fixed-length sequence pattern matches the subpatterns to corresponding items of the subject sequence, from left to right. Matching stops (with a failure) as soon as a subpattern fails. If all subpatterns succeed in matching their corresponding item, the sequence pattern succeeds.
A variable-length sequence pattern first matches the leading non-star subpatterns to the corresponding items of the subject sequence, as for a fixed-length sequence. If this succeeds, the star subpattern matches a list formed of the remaining subject items, with items removed from the end corresponding to the non-star subpatterns following the star subpattern. The remaining non-star subpatterns are then matched to the corresponding subject items, as for a fixed-length sequence.
Mapping Patterns
Syntax:
mapping_pattern: '{' [items_pattern] '}'
items_pattern: ','.key_value_pattern+ ','?
key_value_pattern:
| (literal_pattern | value_pattern) ':' pattern
| double_star_pattern
double_star_pattern: '**' capture_pattern
(Note that **_
is disallowed by this syntax.)
A mapping pattern may contain at most one double star pattern, and it must be last.
A mapping pattern may not contain duplicate key values.
(If all key patterns are literal patterns this is considered a
syntax error; otherwise this is a runtime error and will
raise ValueError
.)
For a mapping pattern to succeed the subject must be a mapping, where being a mapping is defined as its class being one of the following:
- a class that inherits from
collections.abc.Mapping
- a Python class that has been registered as a
collections.abc.Mapping
- a builtin class that has its
Py_TPFLAGS_MAPPING
bit set - a class that inherits from any of the above (including classes defined before a
parent’s
Mapping
registration)
The standard library classes dict
and mappingproxy
will have their Py_TPFLAGS_MAPPING
bit set.
A mapping pattern succeeds if every key given in the mapping pattern
is present in the subject mapping, and the pattern for
each key matches the corresponding item of the subject mapping. Keys
are always compared with the ==
operator. If a '**'
NAME
form is present, that name is bound to a dict
containing
remaining key-value pairs from the subject mapping.
If duplicate keys are detected in the mapping pattern, the pattern is
considered invalid, and a ValueError
is raised.
Key-value pairs are matched using the two-argument form of the
subject’s get()
method. As a consequence, matched key-value pairs
must already be present in the mapping, and not created on-the-fly by
__missing__
or __getitem__
. For example,
collections.defaultdict
instances will only be matched by patterns
with keys that were already present when the match statement was
entered.
Class Patterns
Syntax:
class_pattern:
| name_or_attr '(' [pattern_arguments ','?] ')'
pattern_arguments:
| positional_patterns [',' keyword_patterns]
| keyword_patterns
positional_patterns: ','.pattern+
keyword_patterns: ','.keyword_pattern+
keyword_pattern: NAME '=' pattern
A class pattern may not repeat the same keyword multiple times.
If name_or_attr
is not an instance of the builtin type
,
TypeError
is raised.
A class pattern fails if the subject is not an instance of name_or_attr
.
This is tested using isinstance()
.
If no arguments are present, the pattern succeeds if the isinstance()
check succeeds. Otherwise:
- If only keyword patterns are present, they are processed as follows,
one by one:
- The keyword is looked up as an attribute on the subject.
- If this raises an exception other than
AttributeError
, the exception bubbles up. - If this raises
AttributeError
the class pattern fails. - Otherwise, the subpattern associated with the keyword is matched against the attribute value. If this fails, the class pattern fails. If it succeeds, the match proceeds to the next keyword.
- If this raises an exception other than
- If all keyword patterns succeed, the class pattern as a whole succeeds.
- The keyword is looked up as an attribute on the subject.
- If any positional patterns are present, they are converted to keyword patterns (see below) and treated as additional keyword patterns, preceding the syntactic keyword patterns (if any).
Positional patterns are converted to keyword patterns using the
__match_args__
attribute on the class designated by name_or_attr
,
as follows:
- For a number of built-in types (specified below), a single positional subpattern is accepted which will match the entire subject. (Keyword patterns work as for other types here.)
- The equivalent of
getattr(cls, "__match_args__", ()))
is called. - If this raises an exception the exception bubbles up.
- If the returned value is not a tuple, the conversion fails
and
TypeError
is raised. - If there are more positional patterns than the length of
__match_args__
(as obtained usinglen()
),TypeError
is raised. - Otherwise, positional pattern
i
is converted to a keyword pattern using__match_args__[i]
as the keyword, provided it the latter is a string; if it is not,TypeError
is raised. - For duplicate keywords,
TypeError
is raised.
Once the positional patterns have been converted to keyword patterns, the match proceeds as if there were only keyword patterns.
As mentioned above, for the following built-in types the handling of
positional subpatterns is different:
bool
, bytearray
, bytes
, dict
, float
,
frozenset
, int
, list
, set
, str
, and tuple
.
This behavior is roughly equivalent to the following:
class C:
__match_args__ = ("__match_self_prop__",)
@property
def __match_self_prop__(self):
return self
Side Effects and Undefined Behavior
The only side-effect produced explicitly by the matching process is
the binding of names. However, the process relies on attribute
access, instance checks, len()
, equality and item access on the
subject and some of its components. It also evaluates value
patterns and the class name of class patterns. While none of those
typically create any side-effects, in theory they could. This
proposal intentionally leaves out any specification of what methods
are called or how many times. This behavior is therefore undefined
and user code should not rely on it.
Another undefined behavior is the binding of variables by capture
patterns that are followed (in the same case block) by another pattern
that fails. These may happen earlier or later depending on the
implementation strategy, the only constraint being that capture
variables must be set before guards that use them explicitly are
evaluated. If a guard consists of an and
clause, evaluation of
the operands may even be interspersed with pattern matching, as long
as left-to-right evaluation order is maintained.
The Standard Library
To facilitate the use of pattern matching, several changes will be made to the standard library:
- Namedtuples and dataclasses will have auto-generated
__match_args__
. - For dataclasses the order of attributes in the generated
__match_args__
will be the same as the order of corresponding arguments in the generated__init__()
method. This includes the situations where attributes are inherited from a superclass. Fields withinit=False
are excluded from__match_args__
.
In addition, a systematic effort will be put into going through
existing standard library classes and adding __match_args__
where
it looks beneficial.
Appendix A – Full Grammar
Here is the full grammar for match_stmt
. This is an additional
alternative for compound_stmt
. Remember that match
and
case
are soft keywords, i.e. they are not reserved words in other
grammatical contexts (including at the start of a line if there is no
colon where expected). By convention, hard keywords use single quotes
while soft keywords use double quotes.
Other notation used beyond standard EBNF:
SEP.RULE+
is shorthand forRULE (SEP RULE)*
!RULE
is a negative lookahead assertion
match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
subject_expr:
| star_named_expression ',' [star_named_expressions]
| named_expression
case_block: "case" patterns [guard] ':' block
guard: 'if' named_expression
patterns: open_sequence_pattern | pattern
pattern: as_pattern | or_pattern
as_pattern: or_pattern 'as' capture_pattern
or_pattern: '|'.closed_pattern+
closed_pattern:
| literal_pattern
| capture_pattern
| wildcard_pattern
| value_pattern
| group_pattern
| sequence_pattern
| mapping_pattern
| class_pattern
literal_pattern:
| signed_number !('+' | '-')
| signed_number '+' NUMBER
| signed_number '-' NUMBER
| strings
| 'None'
| 'True'
| 'False'
signed_number: NUMBER | '-' NUMBER
capture_pattern: !"_" NAME !('.' | '(' | '=')
wildcard_pattern: "_"
value_pattern: attr !('.' | '(' | '=')
attr: name_or_attr '.' NAME
name_or_attr: attr | NAME
group_pattern: '(' pattern ')'
sequence_pattern:
| '[' [maybe_sequence_pattern] ']'
| '(' [open_sequence_pattern] ')'
open_sequence_pattern: maybe_star_pattern ',' [maybe_sequence_pattern]
maybe_sequence_pattern: ','.maybe_star_pattern+ ','?
maybe_star_pattern: star_pattern | pattern
star_pattern: '*' (capture_pattern | wildcard_pattern)
mapping_pattern: '{' [items_pattern] '}'
items_pattern: ','.key_value_pattern+ ','?
key_value_pattern:
| (literal_pattern | value_pattern) ':' pattern
| double_star_pattern
double_star_pattern: '**' capture_pattern
class_pattern:
| name_or_attr '(' [pattern_arguments ','?] ')'
pattern_arguments:
| positional_patterns [',' keyword_patterns]
| keyword_patterns
positional_patterns: ','.pattern+
keyword_patterns: ','.keyword_pattern+
keyword_pattern: NAME '=' pattern
Copyright
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python-discord/peps/blob/main/pep-0634.rst
Last modified: 2022-02-27 22:46:36 GMT