PEP 657 – Include Fine Grained Error Locations in Tracebacks

PEP: 657
Title: Include Fine Grained Error Locations in Tracebacks
Author: Pablo Galindo <pablogsal at python.org>, Batuhan Taskaya <batuhan at python.org>, Ammar Askar <ammar at ammaraskar.com>
Discussions-To: https://discuss.python.org/t/pep-657-include-fine-grained-error-locations-in-tracebacks/8629
Status: Final
Type: Standards Track
Created: 08-May-2021
Python-Version: 3.11
Post-History

Contents

Abstract
Motivation
Rationale
Specification
Backwards Compatibility
Reference Implementation
Rejected Ideas
Acknowledgments
Copyright

Abstract

This PEP proposes adding a mapping from each bytecode instruction to the end line number and the start and end column offsets of the source code that generated it. This data will be used to improve tracebacks displayed by the CPython interpreter in order to improve the debugging experience. The PEP also proposes adding APIs that allow other tools (such as coverage analysis tools, profilers, tracers, debuggers) to consume this information from code objects.

Motivation

The primary motivation for this PEP is to improve the feedback presented about the location of errors to aid with debugging.

Python currently keeps a mapping of bytecode to line numbers from compilation. The interpreter uses this mapping to point to the source line associated with an error. While this line-level granularity for instructions is useful, a single line of Python code can compile into dozens of bytecode operations making it hard to track which part of the line caused the error.

Consider the following line of Python code:

x['a']['b']['c']['d'] = 1

If any of the values in the dictionaries are None, the error shown is:

Traceback (most recent call last):
  File "test.py", line 2, in <module>
    x['a']['b']['c']['d'] = 1
TypeError: 'NoneType' object is not subscriptable

From the traceback, it is impossible to determine which one of the dictionaries had the None element that caused the error. Users often have to attach a debugger or split up their expression to track down the problem.

However, if the interpreter had a mapping of bytecode to column offsets as well as line numbers, it could helpfully display:

Traceback (most recent call last):
  File "test.py", line 2, in <module>
    x['a']['b']['c']['d'] = 1
    ~~~~~~~~~~~^^^^^
TypeError: 'NoneType' object is not subscriptable

indicating to the user that the object x['a']['b'] must have been None. This highlighting will occur for every frame in the traceback. For instance, if a similar error is part of a complex function call chain, the traceback would display the code associated with the current instruction in every frame:

Traceback (most recent call last):
  File "test.py", line 14, in <module>
    lel3(x)
    ^^^^^^^
  File "test.py", line 12, in lel3
    return lel2(x) / 23
           ^^^^^^^
  File "test.py", line 9, in lel2
    return 25 + lel(x) + lel(x)
                ^^^^^^
  File "test.py", line 6, in lel
    return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
                         ~~~~~~~~~~~~~~~~^^^^^
TypeError: 'NoneType' object is not subscriptable

This problem presents itself in the following situations.

When passing down multiple objects to function calls while accessing the same attribute in them. For instance, this error:

Traceback (most recent call last):
  File "test.py", line 19, in <module>
    foo(a.name, b.name, c.name)
AttributeError: 'NoneType' object has no attribute 'name'

With the improvements in this PEP this would show:

Traceback (most recent call last):
  File "test.py", line 19, in <module>
    foo(a.name, b.name, c.name)
                ^^^^^^
AttributeError: 'NoneType' object has no attribute 'name'

When dealing with lines containing complex mathematical expressions, especially with libraries such as NumPy, where arithmetic operations can fail depending on the arguments. For example:

Traceback (most recent call last):
  File "test.py", line 1, in <module>
    x = (a + b) @ (c + d)
ValueError: operands could not be broadcast together with shapes (1,2) (2,3)

There is no clear indication as to which operation failed: was it the addition on the left, the addition on the right, or the matrix multiplication in the middle? With this PEP the new error message would look like:

Traceback (most recent call last):
  File "test.py", line 1, in <module>
    x = (a + b) @ (c + d)
                   ~~^~~
ValueError: operands could not be broadcast together with shapes (1,2) (2,3)

This gives a much clearer error message that is easier to debug.

Debugging aside, this extra information would also be useful for code coverage tools, enabling them to measure expression-level coverage instead of just line-level coverage. For instance, given the following line:

x = foo() if bar() else baz()

coverage, profiling or state analysis tools will highlight the full line in both branches, making it impossible to differentiate which branch was taken. This is a known problem in pycoverage.
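As a minimal sketch, assuming CPython 3.11 or later (where code objects expose a co_positions() method), the three calls on that line can be told apart because each instruction carries its own column range. The helper name branch_segments is hypothetical, and the slicing assumes ASCII source, since CPython's column offsets count UTF-8 bytes.

```python
import sys


def branch_segments(source: str) -> list[str]:
    """Return the distinct single-line source snippets that the compiler
    attached to the instructions of the first line of ``source``."""
    if sys.version_info < (3, 11):
        raise RuntimeError("co_positions() requires CPython 3.11+")
    code = compile(source, "<coverage-demo>", "exec")
    line = source.splitlines()[0]
    segments: list[str] = []
    for start_line, end_line, start_col, end_col in code.co_positions():
        # Skip entries without column data or outside the first line.
        if None in (start_line, end_line, start_col, end_col):
            continue
        if start_line != 1 or end_line != 1 or start_col == end_col:
            continue
        # Offsets are UTF-8 byte offsets; slicing a str is fine for ASCII.
        snippet = line[start_col:end_col]
        if snippet not in segments:
            segments.append(snippet)
    return segments
```

Calling branch_segments("x = foo() if bar() else baz()\n") should return, among other snippets, "foo()", "bar()" and "baz()" as separate entries, which is the granularity an expression-level coverage tool needs.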

Similar efforts to this PEP have taken place in other languages such as Java in the form of JEP 358. NullPointerExceptions in Java were similarly nebulous when it came to lines with complicated expressions. A NullPointerException would provide very little aid in finding the root cause of an error. The implementation for JEP 358 is fairly complex, requiring walking back through the bytecode by using a control flow graph analyzer and decompilation techniques to recover the source code that led to the null pointer. Although the complexity of this solution is high and the decompiler requires maintenance every time Java bytecode changes, this improvement was deemed worth it for the extra information provided for just one exception type.

Rationale

In order to identify the range of source code being executed when exceptions are raised, this proposal requires adding new data for every bytecode instruction. This will have an impact on the size of pyc files on disk and the size of code objects in memory. The authors of this proposal have chosen the data types in a way that tries to minimize this impact. The proposed overhead is storing two uint8_t values (one for the start offset and one for the end offset) and the end line information for every bytecode instruction (encoded in the same fashion as the start line is currently stored).

As an illustrative example to gauge the impact of this change, we have calculated that including the start and end offsets will increase the size of the standard library’s pyc files by 22% (6MB), from 28.4MB to 34.7MB. The overhead in memory usage will be the same (assuming the full standard library is loaded into the same program). We believe that this is a very acceptable number since the order of magnitude of the overhead is very small, especially considering the storage size and memory capabilities of modern computers. Additionally, in general the memory size of a Python program is not dominated by code objects. To check this assumption we have executed the test suite of several popular PyPI projects (including NumPy, pytest, Django and Cython) as well as several applications (Black, pylint, mypy executed over either mypy or the standard library) and we found that code objects normally represent 3-6% of the average memory size of the program.
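The quoted pyc figures are consistent with each other:

```latex
\Delta = 34.7\,\mathrm{MB} - 28.4\,\mathrm{MB} = 6.3\,\mathrm{MB},
\qquad
\frac{\Delta}{28.4\,\mathrm{MB}} = \frac{6.3}{28.4} \approx 0.222 \approx 22\%
```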

We understand that the extra cost of this information may not be acceptable for some users, so we propose an opt-out mechanism which will cause generated code objects to not have the extra information while also allowing pyc files to not include the extra information.

Specification

In order to have enough information to correctly resolve the location within a given line where an error was raised, a map linking bytecode instructions to column offsets (start and end offset) and end line numbers is needed. This is similar in fashion to how line numbers are currently linked to bytecode instructions.

The following changes will be performed as part of the implementation of this PEP:

The offset information will be exposed to Python via a new method of the code object class called co_positions() that will return a sequence of four-element tuples containing the full location of every instruction (start line, end line, start column offset and end column offset), or None if the code object was created without the offset information.
One new C-API function:
```
int PyCode_Addr2Location(
    PyCodeObject *co, int addrq,
    int *start_line, int *start_column,
    int *end_line, int *end_column)
```
will be added so that the end line, the start column offset and the end column offset can be obtained given the index of a bytecode instruction. This function will set the values to 0 if the information is not available.
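From Python, the same per-instruction data is also reachable through the dis module, where each instruction carries a Instruction.positions tuple. The sketch below assumes CPython 3.11 or later; binary_op_segments is a hypothetical helper that recovers the source text behind every BINARY_OP instruction of a one-line snippet, such as the (a + b) @ (c + d) example from the Motivation section.

```python
import dis
import sys


def binary_op_segments(source: str) -> list[str]:
    """Return the source text covered by each BINARY_OP instruction that
    the compiler emits for a one-line ``source`` snippet."""
    if sys.version_info < (3, 11):
        raise RuntimeError("Instruction.positions requires CPython 3.11+")
    line = source.splitlines()[0]
    segments = []
    for instr in dis.get_instructions(compile(source, "<demo>", "exec")):
        # dis.Positions(lineno, end_lineno, col_offset, end_col_offset)
        pos = instr.positions
        if instr.opname != "BINARY_OP" or pos is None or None in pos:
            continue
        if pos.lineno == pos.end_lineno == 1:
            segments.append(line[pos.col_offset:pos.end_col_offset])
    return segments
```

For "x = (a + b) @ (c + d)\n" the result should start with "a + b" and "c + d", followed by the range of the matrix multiplication itself, which is exactly the information needed to point at the failing operand group.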

The internal storage, compression and encoding of the information is left as an implementation detail and can be changed at any point as long as the public API remains unchanged.

Offset semantics

These offsets are propagated by the compiler from the ones currently stored in all AST nodes. The public APIs that deal with these attributes (co_positions() and PyCode_Addr2Location) return 0-indexed offsets (just like the AST nodes), but the underlying implementation is free to represent the actual data in whatever form it finds most efficient. When the information is not available, co_positions() reports None and PyCode_Addr2Location reports -1. The availability of the information depends on whether the offsets fall within the supported range, as well as on the runtime flags of the interpreter configuration.
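The agreement between AST offsets and the public API can be checked directly. This sketch assumes CPython 3.11 or later for the co_positions() part; ast_span and code_positions are hypothetical helper names. Note that line numbers stay 1-based while columns are 0-based in both views.

```python
import ast
import sys

SOURCE = "foo(a.name, b.name, c.name)\n"


def ast_span(source: str, arg_index: int) -> tuple:
    """(lineno, end_lineno, col_offset, end_col_offset) of one call argument."""
    call = ast.parse(source).body[0].value  # Expr -> Call
    node = call.args[arg_index]
    return (node.lineno, node.end_lineno, node.col_offset, node.end_col_offset)


def code_positions(source: str) -> set:
    """All distinct positions reported by co_positions() (CPython 3.11+)."""
    if sys.version_info < (3, 11):
        raise RuntimeError("co_positions() requires CPython 3.11+")
    return set(compile(source, "<offsets-demo>", "exec").co_positions())
```

Here ast_span(SOURCE, 1) is (1, 1, 12, 18), the range of b.name, and the same tuple should appear in code_positions(SOURCE) because the attribute load is tagged with that AST node's range.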

The AST nodes use int types to store these values. The current implementation, however, uses uint8_t types as an implementation detail to minimize storage impact. This decision allows offsets from 0 to 255, while larger offsets will be treated as missing (PyCode_Addr2Location returns -1 and co_positions() returns None).
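The range rule is easy to model. The following is a hypothetical illustration of the semantics stated above (offsets 0 to 255 are kept, larger offsets are reported as missing), not CPython's storage code; all names in it are invented for the example.

```python
from typing import Optional

UINT8_MAX = 255  # largest value representable by a uint8_t


def stored_offset(col: int) -> Optional[int]:
    """Model of the rule above: keep a column offset only if it fits in a
    uint8_t; anything larger is recorded as missing (None)."""
    return col if 0 <= col <= UINT8_MAX else None


def python_view(col: int) -> Optional[int]:
    """What the Python-level API would report: the offset, or None."""
    return stored_offset(col)


def c_view(col: int) -> int:
    """What the C-level API would report: the offset, or -1 if missing."""
    stored = stored_offset(col)
    return -1 if stored is None else stored
```

A column of 300, which can occur on a long line, yields None from python_view(300) and -1 from c_view(300), while a column of 255 is still reported as-is by both.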

As specified previously, the underlying storage of the offsets should be considered an implementation detail, as the public APIs to obtain these values will return either C int types or Python int objects. This allows better compression or encoding to be implemented in the future if larger ranges need to be supported. This PEP proposes to start with this simpler version and defer improvements to future work.

Displaying tracebacks

When displaying tracebacks, the default exception hook will be modified to query this information from the code objects and use it to display a sequence of carets for every displayed line in the traceback if the information is available. For instance:

  File "test.py", line 6, in lel
    return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
                         ~~~~~~~~~~~~~~~~^^^^^
TypeError: 'NoneType' object is not subscriptable
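Once the start and end columns are known, producing the marker line is simple string arithmetic. In this hypothetical sketch (render_carets is not a CPython function), the emphasised sub-range, such as the failing subscript inside a longer chain, is assumed to have been chosen by the caller, since how that choice is made is not specified here.

```python
def render_carets(start_col: int, end_col: int,
                  anchor_start: int, anchor_end: int) -> str:
    """Build the marker line printed under a traceback source line.

    [start_col, end_col) is the instruction's full range and
    [anchor_start, anchor_end) is the part to emphasise with '^'; the rest
    of the range is drawn with '~'. All columns are 0-indexed.
    """
    if not start_col <= anchor_start <= anchor_end <= end_col:
        raise ValueError("the anchor must lie inside the full range")
    return (" " * start_col
            + "~" * (anchor_start - start_col)
            + "^" * (anchor_end - anchor_start)
            + "~" * (end_col - anchor_end))


line = "    return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)"
start = line.index("x[")                     # start of the failing expression
end = start + len("x['z']['x']['y']['z']")   # end of the failing subscript
anchor = end - len("['z']")                  # emphasise the last subscript
print(line)
print(render_carets(start, end, anchor, end))
```

When the anchor covers the whole range, the output consists only of carets, as in the lel3(x) frame of the Motivation example.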

When displaying tracebacks, instruction offsets will be taken from the traceback objects. This makes highlighting exceptions that are re-raised work naturally without the need to store the new information in the stack. For example, for this code:

def foo(x):
    1 + 1/0 + 2

def bar(x):
    try:
        1 + foo(x) + foo(x)
    except Exception as e:
        raise ValueError("oh no!") from e

bar(bar(bar(2)))

The printed traceback would look like this:

Traceback (most recent call last):
  File "test.py", line 6, in bar
    1 + foo(x) + foo(x)
        ^^^^^^
  File "test.py", line 2, in foo
    1 + 1/0 + 2
        ~^~
ZeroDivisionError: division by zero

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    bar(bar(bar(2)))
            ^^^^^^
  File "test.py", line 8, in bar
    raise ValueError("oh no!") from e
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: oh no!

While this code:

def foo(x):
    1 + 1/0 + 2

def bar(x):
    try:
        1 + foo(x) + foo(x)
    except Exception:
        raise

bar(bar(bar(2)))

Will be displayed as:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    bar(bar(bar(2)))
            ^^^^^^
  File "test.py", line 6, in bar
    1 + foo(x) + foo(x)
        ^^^^^^
  File "test.py", line 2, in foo
    1 + 1/0 + 2
        ~^~
ZeroDivisionError: division by zero
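Because each traceback entry records the index of the instruction that was executing (tb_lasti), the positions can be recovered after the fact. The following sketch assumes CPython 3.11 or later and is similar to the lookup the standard traceback module performs; divide, wrapper, tb_positions and capture are hypothetical names chosen for the example.

```python
import sys
from types import TracebackType
from typing import List, Optional, Tuple

Position = Tuple[Optional[int], Optional[int], Optional[int], Optional[int]]


def divide(a, b):
    return 1 + a / b + 2


def wrapper(a, b):
    try:
        return divide(a, b)
    except ZeroDivisionError:
        raise  # bare re-raise: the existing traceback entries are kept


def tb_positions(tb: Optional[TracebackType]) -> List[Position]:
    """Position of the instruction recorded in each traceback entry.

    co_positions() yields one entry per 2-byte code unit, while tb_lasti
    is a byte offset, hence the // 2.
    """
    if sys.version_info < (3, 11):
        raise RuntimeError("co_positions() requires CPython 3.11+")
    result: List[Position] = []
    while tb is not None:
        if tb.tb_lasti < 0:
            result.append((None, None, None, None))
        else:
            positions = list(tb.tb_frame.f_code.co_positions())
            result.append(positions[tb.tb_lasti // 2])
        tb = tb.tb_next
    return result


def capture() -> List[Position]:
    try:
        wrapper(1, 0)
    except ZeroDivisionError as exc:
        return tb_positions(exc.__traceback__)
    return []
```

The last entry returned by capture() should cover a / b inside divide, and the one before it the divide(a, b) call inside wrapper, even though the exception was re-raised with a bare raise.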

To maintain the current behavior, only a single line will be displayed in tracebacks. For instructions that span multiple lines (the start offset and the end offset belong to different lines), the end line number must be inspected to know whether the end offset applies to the same line as the start offset.
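A small sketch of that check follows, assuming CPython 3.11 or later for the co_positions() part. Both helper names are hypothetical, and extending the highlight to the end of the first line is an assumed display policy rather than something this PEP prescribes.

```python
import sys
from typing import List, Tuple


def first_line_range(line: str, start_line: int, end_line: int,
                     start_col: int, end_col: int) -> Tuple[int, int]:
    """Columns [start, end) to highlight on the first line of an instruction.

    end_col only refers to the first line when end_line == start_line.
    Otherwise this sketch (an assumed policy) extends the highlight to the
    last non-whitespace character of that line.
    """
    if end_line == start_line:
        return start_col, end_col
    return start_col, len(line.rstrip())


def multiline_positions(source: str) -> List[tuple]:
    """Positions (CPython 3.11+) whose start and end lines differ."""
    if sys.version_info < (3, 11):
        raise RuntimeError("co_positions() requires CPython 3.11+")
    code = compile(source, "<multiline-demo>", "exec")
    return [p for p in code.co_positions()
            if None not in p and p[0] != p[1]]
```

For the call in "result = foo(\n    a,\n)\n", the call instruction should span lines 1 to 3, so its end column (1, on line 3) must not be applied to line 1; first_line_range clips the highlight to "foo(" instead.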

Opt-out mechanism

To offer an opt-out mechanism for users who care about the storage and memory overhead, and to allow third party tools and other programs that currently parse tracebacks to catch up, the following methods will be provided to deactivate this feature:

A new environment variable: PYTHONNODEBUGRANGES.
A new -X command line option: python -Xno_debug_ranges.

If either of these methods is used, the Python compiler will not populate code objects with the new information (None will be used instead) and any unmarshalled code objects that contain the extra information will have it stripped away and replaced with None. Additionally, the traceback machinery will not show the extended location information even if the information was present. This method allows users to:

Create smaller pyc files by using one of the two methods when said files are created.
Avoid loading the extra information from pyc files if those were created with the extra information in the first place.
Deactivate the extra information when displaying tracebacks (the caret characters indicating the location of the error).

Doing this has a very small performance cost, as the interpreter state needs to be fetched when code objects are created in order to look up the configuration. Creating code objects is not a performance-sensitive operation, so this should not be a concern.
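The effect of the opt-out can be observed from Python. The sketch below assumes CPython 3.11 or later; it starts a child interpreter with and without each opt-out and checks whether the code objects it creates keep their column information. The helper columns_stripped and the PROBE snippet are hypothetical.

```python
import os
import subprocess
import sys

# Runs in a child interpreter: prints True if no instruction has columns.
PROBE = (
    "code = compile('x = a + b', '<probe>', 'exec')\n"
    "print(all(p[2] is None and p[3] is None"
    " for p in code.co_positions()))\n"
)


def columns_stripped(extra_args=(), extra_env=None) -> bool:
    """Start a child interpreter (CPython 3.11+) and report whether the
    code objects it creates lack column information."""
    env = dict(os.environ)
    env.pop("PYTHONNODEBUGRANGES", None)  # start from the default configuration
    if extra_env:
        env.update(extra_env)
    completed = subprocess.run(
        [sys.executable, *extra_args, "-c", PROBE],
        capture_output=True, text=True, env=env, check=True,
    )
    return completed.stdout.strip() == "True"
```

Here columns_stripped() should be False, while columns_stripped(["-X", "no_debug_ranges"]) and columns_stripped(extra_env={"PYTHONNODEBUGRANGES": "1"}) should both be True.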

Backwards Compatibility

The change is fully backwards compatible.

Reference Implementation

A reference implementation can be found in the implementation fork.

Rejected Ideas

Use a single caret instead of a range

It has been proposed to use a single caret instead of highlighting the full range when reporting errors as a way to simplify the feature. We have decided to not go this route for the following reasons:

Deriving the location of the caret is not straightforward using the current layout of the AST. This is because the AST nodes only record the start and end line numbers as well as the start and end column offsets. As the AST nodes do not preserve the original tokens (by design), deriving the exact location of some tokens is not possible without extra re-parsing. For instance, binary operators currently have nodes for the operands, but the type of the operator is stored in an enumeration, so its location cannot be derived from the node (this is just one example of how this problem manifests, and not the only one).
Deriving the ranges from AST nodes greatly simplifies the implementation and considerably reduces the maintenance cost and the possibility of errors. This is because ranges can always be obtained generically for any AST node, while any other custom information would need to be extracted differently from different types of nodes. Given how error-prone getting the locations was when this used to be a manual process during AST generation, we believe that a generic solution is a very important property to pursue.
Storing the information to highlight a single caret would be very limiting for tools such as coverage tools and profilers, as well as for tools like IPython and IDEs that want to make use of this new feature. As this message from the author of “friendly-traceback” mentions, the reason is that without the full range (including end lines) these tools will find it very difficult to correctly highlight the relevant source code. For instance, for this code:
```
something = foo(a,b,c) if bar(a,b,c) else other(b,c,d)
```
tools (such as coverage reporters) want to be able to highlight the totality of the call that is covered by the executed bytecode (let’s say foo(a,b,c)) and not just a single character. Even if it is technically possible to re-parse and re-tokenize the source code to reconstruct the information, it is not possible to do this reliably, and it would result in a much worse user experience.
Many users have reported that a single caret is much harder to read than a full range, and this motivated using ranges to highlight syntax errors, which was very well received. Additionally, it has been noted that users with vision problems can identify ranges much more easily than a single caret character, which we believe is a great advantage of using ranges.
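Both points can be checked with the standard ast module. In this sketch (the helper names are hypothetical), every expression node carries a full range, so ast.get_source_segment recovers the exact text of either branch of the conditional expression, while the operator of a binary expression has no position of its own.

```python
import ast
from typing import Optional, Tuple

SOURCE = "something = foo(a,b,c) if bar(a,b,c) else other(b,c,d)\n"


def branch_texts(source: str) -> Tuple[Optional[str], Optional[str]]:
    """Source text of the two branches of a top-level conditional expression."""
    if_exp = ast.parse(source).body[0].value  # Assign -> IfExp
    return (ast.get_source_segment(source, if_exp.body),
            ast.get_source_segment(source, if_exp.orelse))


def operator_has_location(expr_source: str) -> bool:
    """Whether the operator of a binary expression carries its own position."""
    binop = ast.parse(expr_source, mode="eval").body
    # The BinOp node has lineno/col_offset, but its 'op' (e.g. ast.Add) does not.
    return hasattr(binop.op, "lineno")
```

branch_texts(SOURCE) returns ("foo(a,b,c)", "other(b,c,d)"), whereas operator_has_location("a + b") is False, so a caret under the + sign could not be placed from the AST alone.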

Have a configure flag to opt out

Having a configure flag to opt out of the overhead, even when executing Python in non-optimized mode, may sound desirable, but it may cause problems when reading pyc files that were created by an interpreter that was not compiled with the flag activated. This can lead to crashes that would be very difficult for regular users to debug, and would make different pyc files incompatible with each other. As these pyc files could be shipped as part of libraries or applications without the original source, it is also not always possible to force their recompilation. For these reasons we have decided to use the runtime opt-out mechanisms described in the Specification (PYTHONNODEBUGRANGES and -Xno_debug_ranges) instead.

Lazy loading of column information

One potential solution to reduce the memory usage of this feature is to not load the column information from the pyc file when code is imported. Only if an uncaught exception bubbles up or if a call to the C-API functions is made will the column information be loaded from the pyc file. This is similar to how we only read source lines to display them in the traceback when an exception bubbles up. While this would indeed lower memory usage, it also results in a far more complex implementation requiring changes to the importing machinery to selectively ignore a part of the code object. We consider this an interesting avenue to explore, but ultimately we think it is out of the scope of this particular PEP. It also means that column information would not be available if the user is not using pyc files, or for code objects created dynamically at runtime.

Implement compression

Although it would be possible to implement some form of compression over the pyc files and the new data in code objects, we believe that this is out of the scope of this proposal due to its larger impact (in the case of pyc files) and the fact that we expect column offsets to not compress well due to the lack of patterns in them (in case of the new data in code objects).

Acknowledgments

Thanks to Carl Friedrich Bolz-Tereick for showing an initial prototype of this idea for the PyPy interpreter and for the helpful discussion.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.

Source: https://github.com/python-discord/peps/blob/main/pep-0657.rst

Last modified: 2021-07-17 02:25:16 GMT