PEP 469 – Migration of dict iteration code to Python 3
- PEP
- 469
- Title
- Migration of dict iteration code to Python 3
- Author
- Nick Coghlan <ncoghlan at gmail.com>
- Status
- Withdrawn
- Type
- Standards Track
- Created
- 18-Apr-2014
- Python-Version
- 3.5
- Post-History
- 18-Apr-2014, 21-Apr-2014
Abstract
For Python 3, PEP 3106 changed the design of the dict
builtin and the
mapping API in general to replace the separate list based and iterator based
APIs in Python 2 with a merged, memory efficient set and multiset view
based API. This new style of dict iteration was also added to the Python 2.7
dict
type as a new set of iteration methods.
This means that there are now 3 different kinds of dict iteration that may need to be migrated to Python 3 when an application makes the transition:
- Lists as mutable snapshots:
d.items()
->list(d.items())
- Iterator objects:
d.iteritems()
->iter(d.items())
- Set based dynamic views:
d.viewitems()
->d.items()
There is currently no widely agreed best practice on how to reliably convert all Python 2 dict iteration code to the common subset of Python 2 and 3, especially when test coverage of the ported code is limited. This PEP reviews the various ways the Python 2 iteration APIs may be accessed, and looks at the available options for migrating that code to Python 3 by way of the common subset of Python 2.6+ and Python 3.0+.
The PEP also considers the question of whether or not there are any additions that may be worth making to Python 3.5 that may ease the transition process for application code that doesn’t need to worry about supporting earlier versions when eventually making the leap to Python 3.
PEP Withdrawal
In writing the second draft of this PEP, I came to the conclusion that the readability of hybrid Python 2/3 mapping code can actually be best enhanced by better helper functions rather than by making changes to Python 3.5+. The main value I now see in this PEP is as a clear record of the recommended approaches to migrating mapping iteration code from Python 2 to Python 3, as well as suggesting ways to keep things readable and maintainable when writing hybrid code that supports both versions.
Notably, I recommend that hybrid code avoid calling mapping iteration methods directly, and instead rely on builtin functions where possible, and some additional helper functions for cases that would be a simple combination of a builtin and a mapping method in pure Python 3 code, but need to be handled slightly differently to get the exact same semantics in Python 2.
Static code checkers like pylint could potentially be extended with an optional warning regarding direct use of the mapping iteration methods in a hybrid code base.
Mapping iteration models
Python 2.7 provides three different sets of methods to extract the keys,
values and items from a dict
instance, accounting for 9 out of the
18 public methods of the dict
type.
In Python 3, this has been rationalised to just 3 out of 11 public methods
(as the has_key
method has also been removed).
Lists as mutable snapshots
This is the oldest of the three styles of dict iteration, and hence the
one implemented by the d.keys()
, d.values()
and d.items()
methods in Python 2.
These methods all return lists that are snapshots of the state of the mapping at the time the method was called. This has a few consequences:
- the original object can be mutated freely without affecting iteration over the snapshot
- the snapshot can be modified independently of the original object
- the snapshot consumes memory proportional to the size of the original mapping
The semantic equivalent of these operations in Python 3 are
list(d.keys())
, list(d.values())
and list(d.iteritems())
.
Iterator objects
In Python 2.2, dict
objects gained support for the then-new iterator
protocol, allowing direct iteration over the keys stored in the dictionary,
thus avoiding the need to build a list just to iterate over the dictionary
contents one entry at a time. iter(d)
provides direct access to the
iterator object for the keys.
Python 2 also provides a d.iterkeys()
method that is essentially
synonymous with iter(d)
, along with d.itervalues()
and
d.iteritems()
methods.
These iterators provide live views of the underlying object, and hence may fail if the set of keys in the underlying object is changed during iteration:
>>> d = dict(a=1)
>>> for k in d:
... del d[k]
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
As iterators, iteration over these objects is also a one-time operation: once the iterator is exhausted, you have to go back to the original mapping in order to iterate again.
In Python 3, direct iteration over mappings works the same way as it does
in Python 2. There are no method based equivalents - the semantic equivalents
of d.itervalues()
and d.iteritems()
in Python 3 are
iter(d.values())
and iter(d.items())
.
The six
and future.utils
compatibility modules also both provide
iterkeys()
, itervalues()
and iteritems()
helper functions that
provide efficient iterator semantics in both Python 2 and 3.
Set based dynamic views
The model that is provided in Python 3 as a method based API is that of set
based dynamic views (technically multisets in the case of the values()
view).
In Python 3, the objects returned by d.keys()
, d.values()
and
d. items()
provide a live view of the current state of
the underlying object, rather than taking a full snapshot of the current
state as they did in Python 2. This change is safe in many circumstances,
but does mean that, as with the direct iteration API, it is necessary to
avoid adding or removing keys during iteration, in order to avoid
encountering the following error:
>>> d = dict(a=1)
>>> for k, v in d.items():
... del d[k]
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
Unlike the iteration API, these objects are iterables, rather than iterators: you can iterate over them multiple times, and each time they will iterate over the entire underlying mapping.
These semantics are also available in Python 2.7 as the d.viewkeys()
,
d.viewvalues()
and d.viewitems()
methods.
The future.utils
compatibility module also provides
viewkeys()
, viewvalues()
and viewitems()
helper functions
when running on Python 2.7 or Python 3.x.
Migrating directly to Python 3
The 2to3
migration tool handles direct migrations to Python 3 in
accordance with the semantic equivalents described above:
d.keys()
->list(d.keys())
d.values()
->list(d.values())
d.items()
->list(d.items())
d.iterkeys()
->iter(d.keys())
d.itervalues()
->iter(d.values())
d.iteritems()
->iter(d.items())
d.viewkeys()
->d.keys()
d.viewvalues()
->d.values()
d.viewitems()
->d.items()
Rather than 9 distinct mapping methods for iteration, there are now only the
3 view methods, which combine in straightforward ways with the two relevant
builtin functions to cover all of the behaviours that are available as
dict
methods in Python 2.7.
Note that in many cases d.keys()
can be replaced by just d
, but the
2to3
migration tool doesn’t attempt that replacement.
The 2to3
migration tool also does not provide any automatic assistance
for migrating references to these objects as bound or unbound methods - it
only automates conversions where the API is called immediately.
Migrating to the common subset of Python 2 and 3
When migrating to the common subset of Python 2 and 3, the above transformations are not generally appropriate, as they all either result in the creation of a redundant list in Python 2, have unexpectedly different semantics in at least some cases, or both.
Since most code running in the common subset of Python 2 and 3 supports at least as far back as Python 2.6, the currently recommended approach to conversion of mapping iteration operation depends on two helper functions for efficient iteration over mapping values and mapping item tuples:
d.keys()
->list(d)
d.values()
->list(itervalues(d))
d.items()
->list(iteritems(d))
d.iterkeys()
->iter(d)
d.itervalues()
->itervalues(d)
d.iteritems()
->iteritems(d)
Both six
and future.utils
provide appropriate definitions of
itervalues()
and iteritems()
(along with essentially redundant
definitions of iterkeys()
). Creating your own definitions of these
functions in a custom compatibility module is also relatively
straightforward:
try:
dict.iteritems
except AttributeError:
# Python 3
def itervalues(d):
return iter(d.values())
def iteritems(d):
return iter(d.items())
else:
# Python 2
def itervalues(d):
return d.itervalues()
def iteritems(d):
return d.iteritems()
The greatest loss of readability currently arises when converting code that
actually needs the list based snapshots that were the default in Python
2. This readability loss could likely be mitigated by also providing
listvalues
and listitems
helper functions, allowing the affected
conversions to be simplified to:
d.values()
->listvalues(d)
d.items()
->listitems(d)
The corresponding compatibility function definitions are as straightforward as their iterator counterparts:
try:
dict.iteritems
except AttributeError:
# Python 3
def listvalues(d):
return list(d.values())
def listitems(d):
return list(d.items())
else:
# Python 2
def listvalues(d):
return d.values()
def listitems(d):
return d.items()
With that expanded set of compatibility functions, Python 2 code would then be converted to “idiomatic” hybrid 2/3 code as:
d.keys()
->list(d)
d.values()
->listvalues(d)
d.items()
->listitems(d)
d.iterkeys()
->iter(d)
d.itervalues()
->itervalues(d)
d.iteritems()
->iteritems(d)
This compares well for readability with the idiomatic pure Python 3 code that uses the mapping methods and builtins directly:
d.keys()
->list(d)
d.values()
->list(d.values())
d.items()
->list(d.items())
d.iterkeys()
->iter(d)
d.itervalues()
->iter(d.values())
d.iteritems()
->iter(d.items())
It’s also notable that when using this approach, hybrid code would never invoke the mapping methods directly: it would always invoke either a builtin or helper function instead, in order to ensure the exact same semantics on both Python 2 and 3.
Migrating from Python 3 to the common subset with Python 2.7
While the majority of migrations are currently from Python 2 either directly to Python 3 or to the common subset of Python 2 and Python 3, there are also some migrations of newer projects that start in Python 3 and then later add Python 2 support, either due to user demand, or to gain access to Python 2 libraries that are not yet available in Python 3 (and porting them to Python 3 or creating a Python 3 compatible replacement is not a trivial exercise).
In these cases, Python 2.7 compatibility is often sufficient, and the 2.7+
only view based helper functions provided by future.utils
allow the bare
accesses to the Python 3 mapping view methods to be replaced with code that
is compatible with both Python 2.7 and Python 3 (note, this is the only
migration chart in the PEP that has Python 3 code on the left of the
conversion):
d.keys()
->viewkeys(d)
d.values()
->viewvalues(d)
d.items()
->viewitems(d)
list(d.keys())
->list(d)
list(d.values())
->listvalues(d)
list(d.items())
->listitems(d)
iter(d.keys())
->iter(d)
iter(d.values())
->itervalues(d)
iter(d.items())
->iteritems(d)
As with migrations from Python 2 to the common subset, note that the hybrid code ends up never invoking the mapping methods directly - it only calls builtins and helper methods, with the latter addressing the semantic differences between Python 2 and Python 3.
Possible changes to Python 3.5+
The main proposal put forward to potentially aid migration of existing Python 2 code to Python 3 is the restoration of some or all of the alternate iteration APIs to the Python 3 mapping API. In particular, the initial draft of this PEP proposed making the following conversions possible when migrating to the common subset of Python 2 and Python 3.5+:
d.keys()
->list(d)
d.values()
->list(d.itervalues())
d.items()
->list(d.iteritems())
d.iterkeys()
->d.iterkeys()
d.itervalues()
->d.itervalues()
d.iteritems()
->d.iteritems()
Possible mitigations of the additional language complexity in Python 3
created by restoring these methods included immediately deprecating them,
as well as potentially hiding them from the dir()
function (or perhaps
even defining a way to make pydoc
aware of function deprecations).
However, in the case where the list output is actually desired, the end result of that proposal is actually less readable than an appropriately defined helper function, and the function and method forms of the iterator versions are pretty much equivalent from a readability perspective.
So unless I’ve missed something critical, readily available listvalues()
and listitems()
helper functions look like they will improve the
readability of hybrid code more than anything we could add back to the
Python 3.5+ mapping API, and won’t have any long-term impact on the
complexity of Python 3 itself.
Discussion
The fact that 5 years in to the Python 3 migration we still have users considering the dict API changes a significant barrier to migration suggests that there are problems with previously recommended approaches. This PEP attempts to explore those issues and tries to isolate those cases where previous advice (such as it was) could prove problematic.
My assessment (largely based on feedback from Twisted devs) is that
problems are most likely to arise when attempting to use d.keys()
,
d.values()
, and d.items()
in hybrid code. While superficially it
seems as though there should be cases where it is safe to ignore the
semantic differences, in practice, the change from “mutable snapshot” to
“dynamic view” is significant enough that it is likely better
to just force the use of either list or iterator semantics for hybrid code,
and leave the use of the view semantics to pure Python 3 code.
This approach also creates rules that are simple enough and safe enough that
it should be possible to automate them in code modernisation scripts that
target the common subset of Python 2 and Python 3, just as 2to3
converts
them automatically when targeting pure Python 3 code.
Acknowledgements
Thanks to the folks at the Twisted sprint table at PyCon for a very vigorous discussion of this idea (and several other topics), and especially to Hynek Schlawack for acting as a moderator when things got a little too heated :)
Thanks also to JP Calderone and Itamar Turner-Trauring for their email feedback, as well to the participants in the python-dev review of the initial version of the PEP.
Copyright
This document has been placed in the public domain.
Source: https://github.com/python-discord/peps/blob/main/pep-0469.txt
Last modified: 2022-03-09 16:04:44 GMT