Python Enhancement Proposals

PEP 511 – API for code transformers

PEP
511
Title
API for code transformers
Author
Victor Stinner <vstinner at python.org>
Status
Rejected
Type
Standards Track
Created
04-Jan-2016
Python-Version
3.6

Contents

Rejection Notice

This PEP was rejected by its author.

This PEP was seen as blessing new Python-like programming languages which are close but incompatible with the regular Python language. It was decided to not promote syntaxes incompatible with Python.

This PEP was also seen as a nice tool to experiment new Python features, but it is already possible to experiment them without the PEP, only with importlib hooks. If a feature becomes useful, it should be directly part of Python, instead of depending on an third party Python module.

Finally, this PEP was driven was the FAT Python optimization project which was abandoned in 2016, since it was not possible to show any significant speedup, but also because of the lack of time to implement the most advanced and complex optimizations.

Abstract

Propose an API to register bytecode and AST transformers. Add also -o OPTIM_TAG command line option to change .pyc filenames, -o noopt disables the peephole optimizer. Raise an ImportError exception on import if the .pyc file is missing and the code transformers required to transform the code are missing. code transformers are not needed code transformed ahead of time (loaded from .pyc files).

Rationale

Python does not provide a standard way to transform the code. Projects transforming the code use various hooks. The MacroPy project uses an import hook: it adds its own module finder in sys.meta_path to hook its AST transformer. Another option is to monkey-patch the builtin compile() function. There are even more options to hook a code transformer.

Python 3.4 added a compile_source() method to importlib.abc.SourceLoader. But code transformation is wider than just importing modules, see described use cases below.

Writing an optimizer or a preprocessor is out of the scope of this PEP.

Usage 1: AST optimizer

Transforming an Abstract Syntax Tree (AST) is a convenient way to implement an optimizer. It’s easier to work on the AST than working on the bytecode, AST contains more information and is more high level.

Since the optimization can done ahead of time, complex but slow optimizations can be implemented.

Example of optimizations which can be implemented with an AST optimizer:

Using guards (see PEP 510), it is possible to implement a much wider choice of optimizations. Examples:

The following issues can be implemented with an AST optimizer:

  • Issue #1346238: A constant folding optimization pass for the AST
  • Issue #2181: optimize out local variables at end of function
  • Issue #2499: Fold unary + and not on constants
  • Issue #4264: Patch: optimize code to use LIST_APPEND instead of calling list.append
  • Issue #7682: Optimisation of if with constant expression
  • Issue #10399: AST Optimization: inlining of function calls
  • Issue #11549: Build-out an AST optimizer, moving some functionality out of the peephole optimizer
  • Issue #17068: peephole optimization for constant strings
  • Issue #17430: missed peephole optimization

Usage 2: Preprocessor

A preprocessor can be easily implemented with an AST transformer. A preprocessor has various and different usages.

Some examples:

MacroPy has a long list of examples and use cases.

This PEP does not add any new code transformer. Using a code transformer will require an external module and to register it manually.

See also PyXfuscator: Python obfuscator, deobfuscator, and user-assisted decompiler.

Usage 3: Disable all optimization

Ned Batchelder asked to add an option to disable the peephole optimizer because it makes code coverage more difficult to implement. See the discussion on the python-ideas mailing list: Disable all peephole optimizations.

This PEP adds a new -o noopt command line option to disable the peephole optimizer. In Python, it’s as easy as:

sys.set_code_transformers([])

It will fix the Issue #2506: Add mechanism to disable optimizations.

Usage 4: Write new bytecode optimizers in Python

Python 3.6 optimizes the code using a peephole optimizer. By definition, a peephole optimizer has a narrow view of the code and so can only implement basic optimizations. The optimizer rewrites the bytecode. It is difficult to enhance it, because it written in C.

With this PEP, it becomes possible to implement a new bytecode optimizer in pure Python and experiment new optimizations.

Some optimizations are easier to implement on the AST like constant folding, but optimizations on the bytecode are still useful. For example, when the AST is compiled to bytecode, useless jumps can be emitted because the compiler is naive and does not try to optimize anything.

Use Cases

This section give examples of use cases explaining when and how code transformers will be used.

Interactive interpreter

It will be possible to use code transformers with the interactive interpreter which is popular in Python and commonly used to demonstrate Python.

The code is transformed at runtime and so the interpreter can be slower when expensive code transformers are used.

Build a transformed package

It will be possible to build a package of the transformed code.

A transformer can have a configuration. The configuration is not stored in the package.

All .pyc files of the package must be transformed with the same code transformers and the same transformers configuration.

It is possible to build different .pyc files using different optimizer tags. Example: fat for the default configuration and fat_inline for a different configuration with function inlining enabled.

A package can contain .pyc files with different optimizer tags.

Install a package containing transformed .pyc files

It will be possible to install a package which contains transformed .pyc files.

All .pyc files with any optimizer tag contained in the package are installed, not only for the current optimizer tag.

Build .pyc files when installing a package

If a package does not contain any .pyc files of the current optimizer tag (or some .pyc files are missing), the .pyc are created during the installation.

Code transformers of the optimizer tag are required. Otherwise, the installation fails with an error.

Execute transformed code

It will be possible to execute transformed code.

Raise an ImportError exception on import if the .pyc file of the current optimizer tag is missing and the code transformers required to transform the code are missing.

The interesting point here is that code transformers are not needed to execute the transformed code if all required .pyc files are already available.

Code transformer API

A code transformer is a class with ast_transformer() and/or code_transformer() methods (API described below) and a name attribute.

For efficiency, do not define a code_transformer() or ast_transformer() method if it does nothing.

The name attribute (str) must be a short string used to identify an optimizer. It is used to build a .pyc filename. The name must not contain dots ('.'), dashes ('-') or directory separators: dots are used to separated fields in a .pyc filename and dashes areused to join code transformer names to build the optimizer tag.

Note

It would be nice to pass the fully qualified name of a module in the context when an AST transformer is used to transform a module on import, but it looks like the information is not available in PyParser_ASTFromStringObject().

code_transformer() method

Prototype:

def code_transformer(self, code, context):
    ...
    new_code = ...
    ...
    return new_code

Parameters:

  • code: code object
  • context: an object with an optimize attribute (int), the optimization level (0, 1 or 2). The value of the optimize attribute comes from the optimize parameter of the compile() function, it is equal to sys.flags.optimize by default.

Each implementation of Python can add extra attributes to context. For example, on CPython, context will also have the following attribute:

  • interactive (bool): true if in interactive mode

XXX add more flags?

XXX replace flags int with a sub-namespace, or with specific attributes?

The method must return a code object.

The code transformer is run after the compilation to bytecode

ast_transformer() method

Prototype:

def ast_transformer(self, tree, context):
    ...
    return tree

Parameters:

  • tree: an AST tree
  • context: an object with a filename attribute (str)

It must return an AST tree. It can modify the AST tree in place, or create a new AST tree.

The AST transformer is called after the creation of the AST by the parser and before the compilation to bytecode. New attributes may be added to context in the future.

Changes

In short, add:

  • -o OPTIM_TAG command line option
  • sys.implementation.optim_tag
  • sys.get_code_transformers()
  • sys.set_code_transformers(transformers)
  • ast.PyCF_TRANSFORMED_AST

API to get/set code transformers

Add new functions to register code transformers:

  • sys.set_code_transformers(transformers): set the list of code transformers and update sys.implementation.optim_tag
  • sys.get_code_transformers(): get the list of code transformers.

The order of code transformers matter. Running transformer A and then transformer B can give a different output than running transformer B an then transformer A.

Example to prepend a new code transformer:

transformers = sys.get_code_transformers()
transformers.insert(0, new_cool_transformer)
sys.set_code_transformers(transformers)

All AST transformers are run sequentially (ex: the second transformer gets the input of the first transformer), and then all bytecode transformers are run sequentially.

Optimizer tag

Changes:

  • Add sys.implementation.optim_tag (str): optimization tag. The default optimization tag is 'opt'.
  • Add a new -o OPTIM_TAG command line option to set sys.implementation.optim_tag.

Changes on importlib:

  • importlib uses sys.implementation.optim_tag to build the .pyc filename to importing modules, instead of always using opt. Remove also the special case for the optimizer level 0 with the default optimizer tag 'opt' to simplify the code.
  • When loading a module, if the .pyc file is missing but the .py is available, the .py is only used if code optimizers have the same optimizer tag than the current tag, otherwise an ImportError exception is raised.

Pseudo-code of a use_py() function to decide if a .py file can be compiled to import a module:

def transformers_tag():
    transformers = sys.get_code_transformers()
    if not transformers:
        return 'noopt'
    return '-'.join(transformer.name
                    for transformer in transformers)

def use_py():
    return (transformers_tag() == sys.implementation.optim_tag)

The order of sys.get_code_transformers() matter. For example, the fat transformer followed by the pythran transformer gives the optimizer tag fat-pythran.

The behaviour of the importlib module is unchanged with the default optimizer tag ('opt').

Peephole optimizer

By default, sys.implementation.optim_tag is opt and sys.get_code_transformers() returns a list of one code transformer: the peephole optimizer (optimize the bytecode).

Use -o noopt to disable the peephole optimizer. In this case, the optimizer tag is noopt and no code transformer is registered.

Using the -o opt option has not effect.

AST enhancements

Enhancements to simplify the implementation of AST transformers:

  • Add a new compiler flag PyCF_TRANSFORMED_AST to get the transformed AST. PyCF_ONLY_AST returns the AST before the transformers.

Examples

.pyc filenames

Example of .pyc filenames of the os module.

With the default optimizer tag 'opt':

.pyc filename Optimization level
os.cpython-36.opt-0.pyc 0
os.cpython-36.opt-1.pyc 1
os.cpython-36.opt-2.pyc 2

With the 'fat' optimizer tag:

.pyc filename Optimization level
os.cpython-36.fat-0.pyc 0
os.cpython-36.fat-1.pyc 1
os.cpython-36.fat-2.pyc 2

Bytecode transformer

Scary bytecode transformer replacing all strings with "Ni! Ni! Ni!":

import sys
import types

class BytecodeTransformer:
    name = "knights_who_say_ni"

    def code_transformer(self, code, context):
        consts = ['Ni! Ni! Ni!' if isinstance(const, str) else const
                  for const in code.co_consts]
        return types.CodeType(code.co_argcount,
                              code.co_kwonlyargcount,
                              code.co_nlocals,
                              code.co_stacksize,
                              code.co_flags,
                              code.co_code,
                              tuple(consts),
                              code.co_names,
                              code.co_varnames,
                              code.co_filename,
                              code.co_name,
                              code.co_firstlineno,
                              code.co_lnotab,
                              code.co_freevars,
                              code.co_cellvars)

# replace existing code transformers with the new bytecode transformer
sys.set_code_transformers([BytecodeTransformer()])

# execute code which will be transformed by code_transformer()
exec("print('Hello World!')")

Output:

Ni! Ni! Ni!

AST transformer

Similarly to the bytecode transformer example, the AST transformer also replaces all strings with "Ni! Ni! Ni!":

import ast
import sys

class KnightsWhoSayNi(ast.NodeTransformer):
    def visit_Str(self, node):
        node.s = 'Ni! Ni! Ni!'
        return node

class ASTTransformer:
    name = "knights_who_say_ni"

    def __init__(self):
        self.transformer = KnightsWhoSayNi()

    def ast_transformer(self, tree, context):
        self.transformer.visit(tree)
        return tree

# replace existing code transformers with the new AST transformer
sys.set_code_transformers([ASTTransformer()])

# execute code which will be transformed by ast_transformer()
exec("print('Hello World!')")

Output:

Ni! Ni! Ni!

Other Python implementations

The PEP 511 should be implemented by all Python implementation, but the bytecode and the AST are not standardized.

By the way, even between minor version of CPython, there are changes on the AST API. There are differences, but only minor differences. It is quite easy to write an AST transformer which works on Python 2.7 and Python 3.5 for example.

Discussion

Prior Art

AST optimizers

The Issue #17515 “Add sys.setasthook() to allow to use a custom AST” optimizer was a first attempt of API for code transformers, but specific to AST.

In 2015, Victor Stinner wrote the fatoptimizer project, an AST optimizer specializing functions using guards.

In 2014, Kevin Conway created the PyCC optimizer.

In 2012, Victor Stinner wrote the astoptimizer project, an AST optimizer implementing various optimizations. Most interesting optimizations break the Python semantics since no guard is used to disable optimization if something changes.

In 2011, Eugene Toder proposed to rewrite some peephole optimizations in a new AST optimizer: issue #11549, Build-out an AST optimizer, moving some functionality out of the peephole optimizer. The patch adds ast.Lit (it was proposed to rename it to ast.Literal).

Python Preprocessors

  • MacroPy: MacroPy is an implementation of Syntactic Macros in the Python Programming Language. MacroPy provides a mechanism for user-defined functions (macros) to perform transformations on the abstract syntax tree (AST) of a Python program at import time.
  • pypreprocessor: C-style preprocessor directives in Python, like #define and #ifdef

Bytecode transformers

  • codetransformer: Bytecode transformers for CPython inspired by the ast module’s NodeTransformer.
  • byteplay: Byteplay lets you convert Python code objects into equivalent objects which are easy to play with, and lets you convert those objects back into living Python code objects. It’s useful for applying crazy transformations on Python functions, and is also useful in learning Python byte code intricacies. See byteplay documentation.

See also:


Source: https://github.com/python-discord/peps/blob/main/pep-0511.txt

Last modified: 2022-01-21 11:03:51 GMT