PEP 501 – General purpose string interpolation
- PEP
- 501
- Title
- General purpose string interpolation
- Author
- Nick Coghlan <ncoghlan at gmail.com>
- Status
- Deferred
- Type
- Standards Track
- Requires
- 498
- Created
- 08-Aug-2015
- Python-Version
- 3.6
- Post-History
- 08-Aug-2015, 23-Aug-2015, 30-Aug-2015
Abstract
PEP 498 proposes new syntactic support for string interpolation that is transparent to the compiler, allow name references from the interpolation operation full access to containing namespaces (as with any other expression), rather than being limited to explicit name references. These are referred to in the PEP as “f-strings” (a mnemonic for “formatted strings”).
However, it only offers this capability for string formatting, making it likely we will see code like the following:
os.system(f"echo {message_from_user}")
This kind of code is superficially elegant, but poses a significant problem
if the interpolated value message_from_user
is in fact provided by an
untrusted user: it’s an opening for a form of code injection attack, where
the supplied user data has not been properly escaped before being passed to
the os.system
call.
To address that problem (and a number of other concerns), this PEP proposes
the complementary introduction of “i-strings” (a mnemonic for “interpolation
template strings”), where f"Message with {data}"
would produce the same
result as format(i"Message with {data}")
.
Some possible examples of the proposed syntax:
mycommand = sh(i"cat {filename}")
myquery = sql(i"SELECT {column} FROM {table};")
myresponse = html(i"<html><body>{response.body}</body></html>")
logging.debug(i"Message with {detailed} {debugging} {info}")
PEP Deferral
This PEP is currently deferred pending further experience with PEP 498’s simpler approach of only supporting eager rendering without the additional complexity of also supporting deferred rendering.
Summary of differences from PEP 498
The key additions this proposal makes relative to PEP 498:
- the “i” (interpolation template) prefix indicates delayed rendering, but otherwise uses the same syntax and semantics as formatted strings
- interpolation templates are available at runtime as a new kind of object
(
types.InterpolationTemplate
) - the default rendering used by formatted strings is invoked on an
interpolation template object by calling
format(template)
rather than implicitly - while f-string
f"Message {here}"
would be semantically equivalent toformat(i"Message {here}")
, it is expected that the explicit syntax would avoid the runtime overhead of using the delayed rendering machinery
NOTE: This proposal spells out a draft API for types.InterpolationTemplate
.
The precise details of the structures and methods exposed by this type would
be informed by the reference implementation of PEP 498, so it makes sense to
gain experience with that as an internal API before locking down a public API
(if this extension proposal is accepted).
Proposal
This PEP proposes the introduction of a new string prefix that declares the string to be an interpolation template rather than an ordinary string:
template = i"Substitute {names} and {expressions()} at runtime"
This would be effectively interpreted as:
_raw_template = "Substitute {names} and {expressions()} at runtime"
_parsed_template = (
("Substitute ", "names"),
(" and ", "expressions()"),
(" at runtime", None),
)
_field_values = (names, expressions())
_format_specifiers = (f"", f"")
template = types.InterpolationTemplate(_raw_template,
_parsed_template,
_field_values,
_format_specifiers)
The __format__
method on types.InterpolationTemplate
would then
implement the following str.format
inspired semantics:
>>> import datetime
>>> name = 'Jane'
>>> age = 50
>>> anniversary = datetime.date(1991, 10, 12)
>>> format(i'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.')
'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
>>> format(i'She said her name is {repr(name)}.')
"She said her name is 'Jane'."
As with formatted strings, the interpolation template prefix can be combined with single-quoted, double-quoted and triple quoted strings, including raw strings. It does not support combination with bytes literals.
Similarly, this PEP does not propose to remove or deprecate any of the existing string formatting mechanisms, as those will remain valuable when formatting strings that are not present directly in the source code of the application.
Rationale
PEP 498 makes interpolating values into strings with full access to Python’s lexical namespace semantics simpler, but it does so at the cost of creating a situation where interpolating values into sensitive targets like SQL queries, shell commands and HTML templates will enjoy a much cleaner syntax when handled without regard for code injection attacks than when they are handled correctly.
This PEP proposes to provide the option of delaying the actual rendering
of an interpolation template to its __format__
method, allowing the use of
other template renderers by passing the template around as a first class object.
While very different in the technical details, the
types.InterpolationTemplate
interface proposed in this PEP is
conceptually quite similar to the FormattableString
type underlying the
native interpolation support introduced in C# 6.0.
Specification
This PEP proposes the introduction of i
as a new string prefix that
results in the creation of an instance of a new type,
types.InterpolationTemplate
.
Interpolation template literals are Unicode strings (bytes literals are not permitted), and string literal concatenation operates as normal, with the entire combined literal forming the interpolation template.
The template string is parsed into literals, expressions and format specifiers as described for f-strings in PEP 498. Conversion specifiers are handled by the compiler, and appear as part of the field text in interpolation templates.
However, rather than being rendered directly into a formatted strings, these components are instead organised into an instance of a new type with the following semantics:
class InterpolationTemplate:
__slots__ = ("raw_template", "parsed_template",
"field_values", "format_specifiers")
def __new__(cls, raw_template, parsed_template,
field_values, format_specifiers):
self = super().__new__(cls)
self.raw_template = raw_template
self.parsed_template = parsed_template
self.field_values = field_values
self.format_specifiers = format_specifiers
return self
def __repr__(self):
return (f"<{type(self).__qualname__} {repr(self._raw_template)} "
f"at {id(self):#x}>")
def __format__(self, format_specifier):
# When formatted, render to a string, and use string formatting
return format(self.render(), format_specifier)
def render(self, *, render_template=''.join,
render_field=format):
# See definition of the template rendering semantics below
The result of an interpolation template expression is an instance of this
type, rather than an already rendered string - rendering only takes
place when the instance’s render
method is called (either directly, or
indirectly via __format__
).
The compiler will pass the following details to the interpolation template for later use:
- a string containing the raw template as written in the source code
- a parsed template tuple that allows the renderer to render the template without needing to reparse the raw string template for substitution fields
- a tuple containing the evaluated field values, in field substitution order
- a tuple containing the field format specifiers, in field substitution order
This structure is designed to take full advantage of compile time constant folding by ensuring the parsed template is always constant, even when the field values and format specifiers include variable substitution expressions.
The raw template is just the interpolation template as a string. By default, it is used to provide a human readable representation for the interpolation template.
The parsed template consists of a tuple of 2-tuples, with each 2-tuple containing the following fields:
leading_text
: a leading string literal. This will be the empty string if the current field is at the start of the string, or immediately follows the preceding field.field_expr
: the text of the expression element in the substitution field. This will be None for a final trailing text segment.
The tuple of evaluated field values holds the results of evaluating the substitution expressions in the scope where the interpolation template appears.
The tuple of field specifiers holds the results of evaluating the field specifiers as f-strings in the scope where the interpolation template appears.
The InterpolationTemplate.render
implementation then defines the rendering
process in terms of the following renderers:
- an overall
render_template
operation that defines how the sequence of literal template sections and rendered fields are composed into a fully rendered result. The default template renderer is string concatenation using''.join
. - a per field
render_field
operation that receives the field value and format specifier for substitution fields within the template. The default field renderer is theformat
builtin.
Given an appropriate parsed template representation and internal methods of iterating over it, the semantics of template rendering would then be equivalent to the following:
def render(self, *, render_template=''.join,
render_field=format):
iter_fields = enumerate(self.parsed_template)
values = self.field_values
specifiers = self.format_specifiers
template_parts = []
for field_pos, (leading_text, field_expr) in iter_fields:
template_parts.append(leading_text)
if field_expr is not None:
value = values[field_pos]
specifier = specifiers[field_pos]
rendered_field = render_field(value, specifier)
template_parts.append(rendered_field)
return render_template(template_parts)
Conversion specifiers
NOTE:
Appropriate handling of conversion specifiers is currently an open question. Exposing them more directly to custom renderers would increase the complexity of theInterpolationTemplate
definition without providing an increase in expressiveness (since they’re redundant with calling the builtins directly). At the same time, they are made available as arbitrary strings when writing customstring.Formatter
implementations, so it may be desirable to offer similar levels of flexibility of interpretation in interpolation templates.
The !a
, !r
and !s
conversion specifiers supported by str.format
and hence PEP 498 are handled in interpolation templates as follows:
- they’re included unmodified in the raw template to ensure no information is lost
- they’re replaced in the parsed template with the corresponding builtin
calls, in order to ensure that
field_expr
always contains a valid Python expression - the corresponding field value placed in the field values tuple is converted appropriately before being passed to the interpolation template
This means that, for most purposes, the difference between the use of conversion specifiers and calling the corresponding builtins in the original interpolation template will be transparent to custom renderers. The difference will only be apparent if reparsing the raw template, or attempting to reconstruct the original template from the parsed template.
Writing custom renderers
Writing a custom renderer doesn’t requiring any special syntax. Instead,
custom renderers are ordinary callables that process an interpolation
template directly either by calling the render()
method with alternate render_template
or render_field
implementations, or by accessing the
template’s data attributes directly.
For example, the following function would render a template using objects’
repr
implementations rather than their native formatting support:
def reprformat(template):
def render_field(value, specifier):
return format(repr(value), specifier)
return template.render(render_field=render_field)
When writing custom renderers, note that the return type of the overall
rendering operation is determined by the return type of the passed in render_template
callable. While this is expected to be a string in most
cases, producing non-string objects is permitted. For example, a custom
template renderer could involve an sqlalchemy.sql.text
call that produces
an SQL Alchemy query object.
Non-strings may also be returned from render_field
, as long as it is paired
with a render_template
implementation that expects that behaviour.
Expression evaluation
As with f-strings, the subexpressions that are extracted from the interpolation
template are evaluated in the context where the interpolation template
appears. This means the expression has full access to local, nonlocal and global variables. Any valid Python expression can be used inside {}
, including
function and method calls.
Because the substitution expressions are evaluated where the string appears in the source code, there are no additional security concerns related to the contents of the expression itself, as you could have also just written the same expression and used runtime field parsing:
>>> bar=10
>>> def foo(data):
... return data + 20
...
>>> str(i'input={bar}, output={foo(bar)}')
'input=10, output=30'
Is essentially equivalent to:
>>> 'input={}, output={}'.format(bar, foo(bar))
'input=10, output=30'
Handling code injection attacks
The PEP 498 formatted string syntax makes it potentially attractive to write code like the following:
runquery(f"SELECT {column} FROM {table};")
runcommand(f"cat {filename}")
return_response(f"<html><body>{response.body}</body></html>")
These all represent potential vectors for code injection attacks, if any of the variables being interpolated happen to come from an untrusted source. The specific proposal in this PEP is designed to make it straightforward to write use case specific renderers that take care of quoting interpolated values appropriately for the relevant security context:
runquery(sql(i"SELECT {column} FROM {table};"))
runcommand(sh(i"cat {filename}"))
return_response(html(i"<html><body>{response.body}</body></html>"))
This PEP does not cover adding such renderers to the standard library immediately, but rather proposes to ensure that they can be readily provided by third party libraries, and potentially incorporated into the standard library at a later date.
For example, a renderer that aimed to offer a POSIX shell style experience for
accessing external programs, without the significant risks posed by running
os.system
or enabling the system shell when using the subprocess
module
APIs, might provide an interface for running external programs similar to that
offered by the
Julia programming language,
only with the backtick based \`cat $filename\`
syntax replaced by
i"cat {filename}"
style interpolation templates.
Format specifiers
Aside from separating them out from the substitution expression during parsing, format specifiers are otherwise treated as opaque strings by the interpolation template parser - assigning semantics to those (or, alternatively, prohibiting their use) is handled at runtime by the field renderer.
Error handling
Either compile time or run time errors can occur when processing interpolation expressions. Compile time errors are limited to those errors that can be detected when parsing a template string into its component tuples. These errors all raise SyntaxError.
Unmatched braces:
>>> i'x={x'
File "<stdin>", line 1
SyntaxError: missing '}' in interpolation expression
Invalid expressions:
>>> i'x={!x}'
File "<fstring>", line 1
!x
^
SyntaxError: invalid syntax
Run time errors occur when evaluating the expressions inside a template string before creating the interpolation template object. See PEP 498 for some examples.
Different renderers may also impose additional runtime constraints on acceptable interpolated expressions and other formatting details, which will be reported as runtime exceptions.
Possible integration with the logging module
One of the challenges with the logging module has been that we have previously been unable to devise a reasonable migration strategy away from the use of printf-style formatting. The runtime parsing and interpolation overhead for logging messages also poses a problem for extensive logging of runtime events for monitoring purposes.
While beyond the scope of this initial PEP, interpolation template support could potentially be added to the logging module’s event reporting APIs, permitting relevant details to be captured using forms like:
logging.debug(i"Event: {event}; Details: {data}")
logging.critical(i"Error: {error}; Details: {data}")
Rather than the current mod-formatting style:
logging.debug("Event: %s; Details: %s", event, data)
logging.critical("Error: %s; Details: %s", event, data)
As the interpolation template is passed in as an ordinary argument, other keyword arguments would also remain available:
logging.critical(i"Error: {error}; Details: {data}", exc_info=True)
As part of any such integration, a recommended approach would need to be
defined for “lazy evaluation” of interpolated fields, as the logging
module’s existing delayed interpolation support provides access to
various attributes of the event LogRecord
instance.
For example, since interpolation expressions are arbitrary Python expressions, string literals could be used to indicate cases where evaluation itself is being deferred, not just rendering:
logging.debug(i"Logger: {'record.name'}; Event: {event}; Details: {data}")
This could be further extended with idioms like using inline tuples to indicate deferred function calls to be made only if the log message is actually going to be rendered at current logging levels:
logging.debug(i"Event: {event}; Details: {expensive_call, raw_data}")
This kind of approach would be possible as having access to the actual text of the field expression would allow the logging renderer to distinguish between inline tuples that appear in the field expression itself, and tuples that happen to be passed in as data values in a normal field.
Discussion
Refer to PEP 498 for additional discussion, as several of the points there also apply to this PEP.
Deferring support for binary interpolation
Supporting binary interpolation with this syntax would be relatively straightforward (the elements in the parsed fields tuple would just be byte strings rather than text strings, and the default renderer would be markedly less useful), but poses a significant likelihood of producing confusing type errors when a text renderer was presented with binary input.
Since the proposed syntax is useful without binary interpolation support, and such support can be readily added later, further consideration of binary interpolation is considered out of scope for the current PEP.
Interoperability with str-only interfaces
For interoperability with interfaces that only accept strings, interpolation
templates can still be prerendered with format
, rather than delegating the
rendering to the called function.
This reflects the key difference from PEP 498, which always eagerly applies the default rendering, without any way to delegate the choice of renderer to another section of the code.
Preserving the raw template string
Earlier versions of this PEP failed to make the raw template string available on the interpolation template. Retaining it makes it possible to provide a more attractive template representation, as well as providing the ability to precisely reconstruct the original string, including both the expression text and the details of any eagerly rendered substitution fields in format specifiers.
Creating a rich object rather than a global name lookup
Earlier versions of this PEP used an __interpolate__
builtin, rather than
a creating a new kind of object for later consumption by interpolation
functions. Creating a rich descriptive object with a useful default renderer
made it much easier to support customisation of the semantics of interpolation.
Building atop PEP 498, rather than competing with it
Earlier versions of this PEP attempted to serve as a complete substitute for PEP 498, rather than building a more flexible delayed rendering capability on top of PEP 498’s eager rendering.
Assuming the presence of f-strings as a supporting capability simplified a number of aspects of the proposal in this PEP (such as how to handle substitution fields in format specifiers)
Deferring consideration of possible use in i18n use cases
The initial motivating use case for this PEP was providing a cleaner syntax
for i18n translation, as that requires access to the original unmodified
template. As such, it focused on compatibility with the substitution syntax used
in Python’s string.Template
formatting and Mozilla’s l20n project.
However, subsequent discussion revealed there are significant additional considerations to be taken into account in the i18n use case, which don’t impact the simpler cases of handling interpolation into security sensitive contexts (like HTML, system shells, and database queries), or producing application debugging messages in the preferred language of the development team (rather than the native language of end users).
Due to the original design of the str.format
substitution syntax in PEP 3101
being inspired by C#’s string formatting syntax, the specific field
substitution syntax used in PEP 498 is consistent not only with Python’s own str.format
syntax, but also with string formatting in C#, including the
native “$-string” interpolation syntax introduced in C# 6.0 (released in July
2015). The related IFormattable
interface in C# forms the basis of a
number of elements of C#’s internationalization and localization
support.
This means that while this particular substitution syntax may not
currently be widely used for translation of Python applications (losing out
to traditional %-formatting and the designed-specifically-for-i18n
string.Template
formatting), it is a popular translation format in the
wider software development ecosystem (since it is already the preferred
format for translating C# applications).
Acknowledgements
- Eric V. Smith for creating PEP 498 and demonstrating the feasibility of arbitrary expression substitution in string interpolation
- Barry Warsaw, Armin Ronacher, and Mike Miller for their contributions to exploring the feasibility of using this model of delayed rendering in i18n use cases (even though the ultimate conclusion was that it was a poor fit, at least for current approaches to i18n in Python)
References
- [1]
- %-formatting (https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting)
- [2]
- str.format (https://docs.python.org/3/library/string.html#formatstrings)
- [3]
- string.Template documentation (https://docs.python.org/3/library/string.html#template-strings)
- [4]
- PEP 215: String Interpolation
- [5]
- PEP 292: Simpler String Substitutions
- [6]
- PEP 3101: Advanced String Formatting
- [7]
- PEP 498: Literal string formatting
- [8]
- FormattableString and C# native string interpolation (https://msdn.microsoft.com/en-us/library/dn961160.aspx)
- [9]
- IFormattable interface in C# (see remarks for globalization notes) (https://msdn.microsoft.com/en-us/library/system.iformattable.aspx)
- [10]
- Running external commands in Julia (http://julia.readthedocs.org/en/latest/manual/running-external-programs/)
Copyright
This document has been placed in the public domain.
Source: https://github.com/python-discord/peps/blob/main/pep-0501.txt
Last modified: 2022-01-21 11:03:51 GMT