PEP 479 – Change StopIteration handling inside generators
- PEP
- 479
- Title
- Change StopIteration handling inside generators
- Author
- Chris Angelico <rosuav at gmail.com>, Guido van Rossum <guido at python.org>
- Status
- Final
- Type
- Standards Track
- Created
- 15-Nov-2014
- Python-Version
- 3.5
- Post-History
- 15-Nov-2014, 19-Nov-2014, 05-Dec-2014
Abstract
This PEP proposes a change to generators: when StopIteration
is
raised inside a generator, it is replaced with RuntimeError
.
(More precisely, this happens when the exception is about to bubble
out of the generator’s stack frame.) Because the change is backwards
incompatible, the feature is initially introduced using a
__future__
statement.
Acceptance
This PEP was accepted by the BDFL on November 22. Because of the exceptionally short period from first draft to acceptance, the main objections brought up after acceptance were carefully considered and have been reflected in the “Alternate proposals” section below. However, none of the discussion changed the BDFL’s mind and the PEP’s acceptance is now final. (Suggestions for clarifying edits are still welcome – unlike IETF RFCs, the text of a PEP is not cast in stone after its acceptance, although the core design/plan/specification should not change after acceptance.)
Rationale
The interaction of generators and StopIteration
is currently
somewhat surprising, and can conceal obscure bugs. An unexpected
exception should not result in subtly altered behaviour, but should
cause a noisy and easily-debugged traceback. Currently,
StopIteration
raised accidentally inside a generator function will
be interpreted as the end of the iteration by the loop construct
driving the generator.
The main goal of the proposal is to ease debugging in the situation
where an unguarded next()
call (perhaps several stack frames deep)
raises StopIteration
and causes the iteration controlled by the
generator to terminate silently. (Whereas, when some other exception
is raised, a traceback is printed pinpointing the cause of the
problem.)
This is particularly pernicious in combination with the yield from
construct of PEP 380, as it breaks the abstraction that a
subgenerator may be factored out of a generator. That PEP notes this
limitation, but notes that “use cases for these [are] rare to non-
existent”. Unfortunately while intentional use is rare, it is easy to
stumble on these cases by accident:
import contextlib
@contextlib.contextmanager
def transaction():
print('begin')
try:
yield from do_it()
except:
print('rollback')
raise
else:
print('commit')
def do_it():
print('Refactored initial setup')
yield # Body of with-statement is executed here
print('Refactored finalization of successful transaction')
def gene():
for i in range(2):
with transaction():
yield i
# return
raise StopIteration # This is wrong
print('Should not be reached')
for i in gene():
print('main: i =', i)
Here factoring out do_it
into a subgenerator has introduced a
subtle bug: if the wrapped block raises StopIteration
, under the
current behavior this exception will be swallowed by the context
manager; and, worse, the finalization is silently skipped! Similarly
problematic behavior occurs when an asyncio
coroutine raises
StopIteration
, causing it to terminate silently, or when next
is used to take the first result from an iterator that unexpectedly
turns out to be empty, for example:
# using the same context manager as above
import pathlib
with transaction():
print('commit file {}'.format(
# I can never remember what the README extension is
next(pathlib.Path('/some/dir').glob('README*'))))
In both cases, the refactoring abstraction of yield from
breaks
in the presence of bugs in client code.
Additionally, the proposal reduces the difference between list comprehensions and generator expressions, preventing surprises such as the one that started this discussion [2]. Henceforth, the following statements will produce the same result if either produces a result at all:
a = list(F(x) for x in xs if P(x))
a = [F(x) for x in xs if P(x)]
With the current state of affairs, it is possible to write a function
F(x)
or a predicate P(x)
that causes the first form to produce
a (truncated) result, while the second form raises an exception
(namely, StopIteration
). With the proposed change, both forms
will raise an exception at this point (albeit RuntimeError
in the
first case and StopIteration
in the second).
Finally, the proposal also clears up the confusion about how to
terminate a generator: the proper way is return
, not
raise StopIteration
.
As an added bonus, the above changes bring generator functions much
more in line with regular functions. If you wish to take a piece of
code presented as a generator and turn it into something else, you
can usually do this fairly simply, by replacing every yield
with
a call to print()
or list.append()
; however, if there are any
bare next()
calls in the code, you have to be aware of them. If
the code was originally written without relying on StopIteration
terminating the function, the transformation would be that much
easier.
Background information
When a generator frame is (re)started as a result of a __next__()
(or send()
or throw()
) call, one of three outcomes can occur:
- A yield point is reached, and the yielded value is returned.
- The frame is returned from;
StopIteration
is raised. - An exception is raised, which bubbles out.
In the latter two cases the frame is abandoned (and the generator
object’s gi_frame
attribute is set to None).
Proposal
If a StopIteration
is about to bubble out of a generator frame, it
is replaced with RuntimeError
, which causes the next()
call
(which invoked the generator) to fail, passing that exception out.
From then on it’s just like any old exception. [3]
This affects the third outcome listed above, without altering any
other effects. Furthermore, it only affects this outcome when the
exception raised is StopIteration
(or a subclass thereof).
Note that the proposed replacement happens at the point where the
exception is about to bubble out of the frame, i.e. after any
except
or finally
blocks that could affect it have been
exited. The StopIteration
raised by returning from the frame is
not affected (the point being that StopIteration
means that the
generator terminated “normally”, i.e. it did not raise an exception).
A subtle issue is what will happen if the caller, having caught the
RuntimeError
, calls the generator object’s __next__()
method
again. The answer is that from this point on it will raise
StopIteration
– the behavior is the same as when any other
exception was raised by the generator.
Another logical consequence of the proposal: if someone uses
g.throw(StopIteration)
to throw a StopIteration
exception into
a generator, if the generator doesn’t catch it (which it could do
using a try/except
around the yield
), it will be transformed
into RuntimeError
.
During the transition phase, the new feature must be enabled per-module using:
from __future__ import generator_stop
Any generator function constructed under the influence of this
directive will have the REPLACE_STOPITERATION
flag set on its code
object, and generators with the flag set will behave according to this
proposal. Once the feature becomes standard, the flag may be dropped;
code should not inspect generators for it.
A proof-of-concept patch has been created to facilitate testing. [4]
Consequences for existing code
This change will affect existing code that depends on
StopIteration
bubbling up. The pure Python reference
implementation of groupby
[5] currently has comments “Exit on
StopIteration
” where it is expected that the exception will
propagate and then be handled. This will be unusual, but not unknown,
and such constructs will fail. Other examples abound, e.g. [6], [7].
(Nick Coghlan comments: “””If you wanted to factor out a helper function that terminated the generator you’d have to do “return yield from helper()” rather than just “helper()”.”””)
There are also examples of generator expressions floating around that
rely on a StopIteration
raised by the expression, the target or the
predicate (rather than by the __next__()
call implied in the for
loop proper).
Writing backwards and forwards compatible code
With the exception of hacks that raise StopIteration
to exit a
generator expression, it is easy to write code that works equally well
under older Python versions as under the new semantics.
This is done by enclosing those places in the generator body where a
StopIteration
is expected (e.g. bare next()
calls or in some
cases helper functions that are expected to raise StopIteration
)
in a try/except
construct that returns when StopIteration
is
raised. The try/except
construct should appear directly in the
generator function; doing this in a helper function that is not itself
a generator does not work. If raise StopIteration
occurs directly
in a generator, simply replace it with return
.
Examples of breakage
Generators which explicitly raise StopIteration
can generally be
changed to simply return instead. This will be compatible with all
existing Python versions, and will not be affected by __future__
.
Here are some illustrations from the standard library.
Lib/ipaddress.py:
if other == self:
raise StopIteration
Becomes:
if other == self:
return
In some cases, this can be combined with yield from
to simplify
the code, such as Lib/difflib.py:
if context is None:
while True:
yield next(line_pair_iterator)
Becomes:
if context is None:
yield from line_pair_iterator
return
(The return
is necessary for a strictly-equivalent translation,
though in this particular file, there is no further code, and the
return
can be omitted.) For compatibility with pre-3.3 versions
of Python, this could be written with an explicit for
loop:
if context is None:
for line in line_pair_iterator:
yield line
return
More complicated iteration patterns will need explicit try/except
constructs. For example, a hypothetical parser like this:
def parser(f):
while True:
data = next(f)
while True:
line = next(f)
if line == "- end -": break
data += line
yield data
would need to be rewritten as:
def parser(f):
while True:
try:
data = next(f)
while True:
line = next(f)
if line == "- end -": break
data += line
yield data
except StopIteration:
return
or possibly:
def parser(f):
for data in f:
while True:
line = next(f)
if line == "- end -": break
data += line
yield data
The latter form obscures the iteration by purporting to iterate over
the file with a for
loop, but then also fetches more data from
the same iterator during the loop body. It does, however, clearly
differentiate between a “normal” termination (StopIteration
instead of the initial line) and an “abnormal” termination (failing
to find the end marker in the inner loop, which will now raise
RuntimeError
).
This effect of StopIteration
has been used to cut a generator
expression short, creating a form of takewhile
:
def stop():
raise StopIteration
print(list(x for x in range(10) if x < 5 or stop()))
# prints [0, 1, 2, 3, 4]
Under the current proposal, this form of non-local flow control is not supported, and would have to be rewritten in statement form:
def gen():
for x in range(10):
if x >= 5: return
yield x
print(list(gen()))
# prints [0, 1, 2, 3, 4]
While this is a small loss of functionality, it is functionality that
often comes at the cost of readability, and just as lambda
has
restrictions compared to def
, so does a generator expression have
restrictions compared to a generator function. In many cases, the
transformation to full generator function will be trivially easy, and
may improve structural clarity.
Explanation of generators, iterators, and StopIteration
The proposal does not change the relationship between generators and
iterators: a generator object is still an iterator, and not all
iterators are generators. Generators have additional methods that
iterators don’t have, like send
and throw
. All this is
unchanged. Nothing changes for generator users – only authors of
generator functions may have to learn something new. (This includes
authors of generator expressions that depend on early termination of
the iteration by a StopIteration
raised in a condition.)
An iterator is an object with a __next__
method. Like many other
special methods, it may either return a value, or raise a specific
exception - in this case, StopIteration
- to signal that it has
no value to return. In this, it is similar to __getattr__
(can
raise AttributeError
), __getitem__
(can raise KeyError
),
and so on. A helper function for an iterator can be written to
follow the same protocol; for example:
def helper(x, y):
if x > y: return 1 / (x - y)
raise StopIteration
def __next__(self):
if self.a: return helper(self.b, self.c)
return helper(self.d, self.e)
Both forms of signalling are carried through: a returned value is returned, an exception bubbles up. The helper is written to match the protocol of the calling function.
A generator function is one which contains a yield
expression.
Each time it is (re)started, it may either yield a value, or return
(including “falling off the end”). A helper function for a generator
can also be written, but it must also follow generator protocol:
def helper(x, y):
if x > y: yield 1 / (x - y)
def gen(self):
if self.a: return (yield from helper(self.b, self.c))
return (yield from helper(self.d, self.e))
In both cases, any unexpected exception will bubble up. Due to the
nature of generators and iterators, an unexpected StopIteration
inside a generator will be converted into RuntimeError
, but
beyond that, all exceptions will propagate normally.
Transition plan
- Python 3.5: Enable new semantics under
__future__
import; silent deprecation warning ifStopIteration
bubbles out of a generator not under__future__
import. - Python 3.6: Non-silent deprecation warning.
- Python 3.7: Enable new semantics everywhere.
Alternate proposals
Raising something other than RuntimeError
Rather than the generic RuntimeError
, it might make sense to raise
a new exception type UnexpectedStopIteration
. This has the
downside of implicitly encouraging that it be caught; the correct
action is to catch the original StopIteration
, not the chained
exception.
Supplying a specific exception to raise on return
Nick Coghlan suggested a means of providing a specific
StopIteration
instance to the generator; if any other instance of
StopIteration
is raised, it is an error, but if that particular
one is raised, the generator has properly completed. This subproposal
has been withdrawn in favour of better options, but is retained for
reference.
Making return-triggered StopIterations obvious
For certain situations, a simpler and fully backward-compatible
solution may be sufficient: when a generator returns, instead of
raising StopIteration
, it raises a specific subclass of
StopIteration
(GeneratorReturn
) which can then be detected.
If it is not that subclass, it is an escaping exception rather than a
return statement.
The inspiration for this alternative proposal was Nick’s observation
[8] that if an asyncio
coroutine [9] accidentally raises
StopIteration
, it currently terminates silently, which may present
a hard-to-debug mystery to the developer. The main proposal turns
such accidents into clearly distinguishable RuntimeError
exceptions,
but if that is rejected, this alternate proposal would enable
asyncio
to distinguish between a return
statement and an
accidentally-raised StopIteration
exception.
Of the three outcomes listed above, two change:
- If a yield point is reached, the value, obviously, would still be returned.
- If the frame is returned from,
GeneratorReturn
(rather thanStopIteration
) is raised. - If an instance of
GeneratorReturn
would be raised, instead an instance ofStopIteration
would be raised. Any other exception bubbles up normally.
In the third case, the StopIteration
would have the value
of
the original GeneratorReturn
, and would reference the original
exception in its __cause__
. If uncaught, this would clearly show
the chaining of exceptions.
This alternative does not affect the discrepancy between generator
expressions and list comprehensions, but allows generator-aware code
(such as the contextlib
and asyncio
modules) to reliably
differentiate between the second and third outcomes listed above.
However, once code exists that depends on this distinction between
GeneratorReturn
and StopIteration
, a generator that invokes
another generator and relies on the latter’s StopIteration
to
bubble out would still be potentially wrong, depending on the use made
of the distinction between the two exception types.
Converting the exception inside next()
Mark Shannon suggested [10] that the problem could be solved in
next()
rather than at the boundary of generator functions. By
having next()
catch StopIteration
and raise instead
ValueError
, all unexpected StopIteration
bubbling would be
prevented; however, the backward-incompatibility concerns are far
more serious than for the current proposal, as every next()
call
now needs to be rewritten to guard against ValueError
instead of
StopIteration
- not to mention that there is no way to write one
block of code which reliably works on multiple versions of Python.
(Using a dedicated exception type, perhaps subclassing ValueError
,
would help this; however, all code would still need to be rewritten.)
Note that calling next(it, default)
catches StopIteration
and
substitutes the given default value; this feature is often useful to
avoid a try/except
block.
Sub-proposal: decorator to explicitly request current behaviour
Nick Coghlan suggested [11] that the situations where the current behaviour is desired could be supported by means of a decorator:
from itertools import allow_implicit_stop
@allow_implicit_stop
def my_generator():
...
yield next(it)
...
Which would be semantically equivalent to:
def my_generator():
try:
...
yield next(it)
...
except StopIteration
return
but be faster, as it could be implemented by simply permitting the
StopIteration
to bubble up directly.
Single-source Python 2/3 code would also benefit in a 3.7+ world, since libraries like six and python-future could just define their own version of “allow_implicit_stop” that referred to the new builtin in 3.5+, and was implemented as an identity function in other versions.
However, due to the implementation complexities required, the ongoing compatibility issues created, the subtlety of the decorator’s effect, and the fact that it would encourage the “quick-fix” solution of just slapping the decorator onto all generators instead of properly fixing the code in question, this sub-proposal has been rejected. [12]
Criticism
Unofficial and apocryphal statistics suggest that this is seldom, if ever, a problem. [13] Code does exist which relies on the current behaviour (e.g. [3], [6], [7]), and there is the concern that this would be unnecessary code churn to achieve little or no gain.
Steven D’Aprano started an informal survey on comp.lang.python [14]; at the time of writing only two responses have been received: one was in favor of changing list comprehensions to match generator expressions (!), the other was in favor of this PEP’s main proposal.
The existing model has been compared to the perfectly-acceptable
issues inherent to every other case where an exception has special
meaning. For instance, an unexpected KeyError
inside a
__getitem__
method will be interpreted as failure, rather than
permitted to bubble up. However, there is a difference. Special
methods use return
to indicate normality, and raise
to signal
abnormality; generators yield
to indicate data, and return
to
signal the abnormal state. This makes explicitly raising
StopIteration
entirely redundant, and potentially surprising. If
other special methods had dedicated keywords to distinguish between
their return paths, they too could turn unexpected exceptions into
RuntimeError
; the fact that they cannot should not preclude
generators from doing so.
Why not fix all __next__() methods?
When implementing a regular __next__()
method, the only way to
indicate the end of the iteration is to raise StopIteration
. So
catching StopIteration
here and converting it to RuntimeError
would defeat the purpose. This is a reminder of the special status of
generator functions: in a generator function, raising
StopIteration
is redundant since the iteration can be terminated
by a simple return
.
References
- [2]
- Initial mailing list comment (https://mail.python.org/pipermail/python-ideas/2014-November/029906.html)
- [3] (1, 2)
- Proposal by GvR (https://mail.python.org/pipermail/python-ideas/2014-November/029953.html)
- [4]
- Tracker issue with Proof-of-Concept patch (http://bugs.python.org/issue22906)
- [5]
- Pure Python implementation of groupby (https://docs.python.org/3/library/itertools.html#itertools.groupby)
- [6] (1, 2)
- Split a sequence or generator using a predicate (http://code.activestate.com/recipes/578416-split-a-sequence-or-generator-using-a-predicate/)
- [7] (1, 2)
- wrap unbounded generator to restrict its output (http://code.activestate.com/recipes/66427-wrap-unbounded-generator-to-restrict-its-output/)
- [8]
- Post from Nick Coghlan mentioning asyncio (https://mail.python.org/pipermail/python-ideas/2014-November/029961.html)
- [9]
- Coroutines in asyncio (https://docs.python.org/3/library/asyncio-task.html#coroutines)
- [10]
- Post from Mark Shannon with alternate proposal (https://mail.python.org/pipermail/python-dev/2014-November/137129.html)
- [11]
- Idea from Nick Coghlan (https://mail.python.org/pipermail/python-dev/2014-November/137201.html)
- [12]
- Rejection of above idea by GvR (https://mail.python.org/pipermail/python-dev/2014-November/137243.html)
- [13]
- Response by Steven D’Aprano (https://mail.python.org/pipermail/python-ideas/2014-November/029994.html)
- [14]
- Thread on comp.lang.python started by Steven D’Aprano (https://mail.python.org/pipermail/python-list/2014-November/680757.html)
Copyright
This document has been placed in the public domain.
Source: https://github.com/python-discord/peps/blob/main/pep-0479.txt
Last modified: 2022-03-09 16:04:44 GMT