PEP 586 – Literal Types

PEP: 586
Title: Literal Types
Author: Michael Lee <michael.lee.0x2a at gmail.com>, Ivan Levkivskyi <levkivskyi at gmail.com>, Jukka Lehtosalo <jukka.lehtosalo at iki.fi>
BDFL-Delegate: Guido van Rossum <guido at python.org>
Discussions-To: typing-sig@python.org
Status: Accepted
Type: Standards Track
Created: 14-Mar-2019
Python-Version: 3.8
Post-History: 14-Mar-2019
Resolution: Typing-SIG

Abstract

This PEP proposes adding Literal types to the PEP 484 ecosystem. Literal types indicate that some expression has literally a specific value. For example, the following function will accept only expressions that have literally the value “4”:

from typing import Literal

def accepts_only_four(x: Literal[4]) -> None:
    pass

accepts_only_four(4)   # OK
accepts_only_four(19)  # Rejected

Motivation and Rationale

Python has many APIs that return different types depending on the value of some argument provided. For example:

  • open(filename, mode) returns either IO[bytes] or IO[Text] depending on whether the second argument is something like r or rb.
  • subprocess.check_output(...) returns either bytes or text depending on whether the universal_newlines keyword argument is set to True or not.

This pattern is also fairly common in many popular 3rd party libraries. For example, here are just two examples from pandas and numpy respectively:

  • pandas.concat(...) will return either Series or DataFrame depending on whether the axis argument is set to 0 or 1.
  • numpy.unique will return either a single array or a tuple containing anywhere from two to four arrays depending on three boolean flag values.

The typing issue tracker contains some additional examples and discussion.

There is currently no way of expressing the type signatures of these functions: PEP 484 does not include any mechanism for writing signatures where the return type varies depending on the value passed in. Note that this problem persists even if we redesign these APIs to instead accept enums: MyEnum.FOO and MyEnum.BAR are both considered to be of type MyEnum.

Currently, type checkers work around this limitation by adding ad hoc extensions for important builtins and standard library functions. For example, mypy comes bundled with a plugin that attempts to infer more precise types for open(...). While this approach works for standard library functions, it’s unsustainable in general: it’s not reasonable to expect 3rd party library authors to maintain plugins for N different type checkers.

We propose adding Literal types to address these gaps.

Core Semantics

This section outlines the baseline behavior of literal types.

Core behavior

Literal types indicate that a variable has a specific and concrete value. For example, if we define some variable foo to have type Literal[3], we are declaring that foo must be exactly equal to 3 and no other value.

Given some value v that is a member of type T, the type Literal[v] shall be treated as a subtype of T. For example, Literal[3] is a subtype of int.

All methods from the parent type will be directly inherited by the literal type. So, if we have some variable foo of type Literal[3] it’s safe to do things like foo + 5 since foo inherits int’s __add__ method. The resulting type of foo + 5 is int.

This “inheriting” behavior is identical to how we handle NewTypes.
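
As a minimal sketch of the rules above (the function name add_five is hypothetical and not part of this PEP), a value annotated as Literal[3] can be used wherever an int is expected, and arithmetic on it produces a plain int:

```python
from typing import Literal


def add_five(foo: Literal[3]) -> int:
    # 'foo' inherits int.__add__, so the result is typed as int,
    # not as Literal[8].
    return foo + 5


result = add_five(3)
```

Note that the Literal annotation is only meaningful to a type checker; at runtime foo is an ordinary int.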

Equivalence of two Literals

Two types Literal[v1] and Literal[v2] are equivalent when both of the following conditions are true:

  1. type(v1) == type(v2)
  2. v1 == v2

For example, Literal[20] and Literal[0x14] are equivalent. However, Literal[0] and Literal[False] are not equivalent, even though 0 == False evaluates to True at runtime: 0 has type int and False has type bool.
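
The two conditions can be modelled directly on the parameter values. This is only an illustrative sketch: literals_equivalent is a hypothetical helper, not something exposed by the typing module or by any type checker.

```python
def literals_equivalent(v1: object, v2: object) -> bool:
    """Return True if Literal[v1] and Literal[v2] would be equivalent."""
    # Condition 1: both values have exactly the same type.
    # Condition 2: the values compare equal.
    return type(v1) == type(v2) and v1 == v2


same = literals_equivalent(20, 0x14)       # both are the int 20
different = literals_equivalent(0, False)  # int versus bool
```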

Shortening unions of literals

Literals are parameterized with one or more values. When a Literal is parameterized with more than one value, it’s treated as exactly equivalent to the union of those types. That is, Literal[v1, v2, v3] is equivalent to Union[Literal[v1], Literal[v2], Literal[v3]].

This shortcut helps make writing signatures for functions that accept many different literals more ergonomic — for example, functions like open(...):

from typing import IO, Any, Literal, Text, Union, overload

# Note: this is a simplification of the true type signature.
_PathType = Union[str, bytes, int]

@overload
def open(path: _PathType,
         mode: Literal["r", "w", "a", "x", "r+", "w+", "a+", "x+"],
         ) -> IO[Text]: ...
@overload
def open(path: _PathType,
         mode: Literal["rb", "wb", "ab", "xb", "r+b", "w+b", "a+b", "x+b"],
         ) -> IO[bytes]: ...

# Fallback overload for when the user isn't using literal types
@overload
def open(path: _PathType, mode: str) -> IO[Any]: ...

The provided values do not all have to be members of the same type. For example, Literal[42, "foo", True] is a legal type.

However, Literal must be parameterized with at least one value. Types like Literal[] or Literal are illegal.
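
To make the shorthand concrete, the sketch below writes the same type in both forms and inspects the parameters at runtime with typing.get_args. The alias and function names (ShortMode, LongMode, takes_short, takes_long) are hypothetical; a type checker is expected to treat the two aliases interchangeably.

```python
from typing import Literal, Union, get_args

# The short form and the union it stands for.
ShortMode = Literal["r", "w", "a"]
LongMode = Union[Literal["r"], Literal["w"], Literal["a"]]


def takes_short(mode: ShortMode) -> str:
    return mode


def takes_long(mode: LongMode) -> str:
    # Passing the result on is fine: both aliases describe the same values.
    return takes_short(mode)


short_params = get_args(ShortMode)  # the three string values
long_members = get_args(LongMode)   # three single-value Literal types
```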

Type inference

This section describes a few rules regarding type inference and literals, along with some examples.

Backwards compatibility

When type checkers add support for Literal, it’s important they do so in a way that maximizes backwards-compatibility. On a best-effort basis, type checkers should ensure that code that used to type check continues to do so after support for Literal is added.

This is particularly important when performing type inference. For example, given the statement x = "blue", should the inferred type of x be str or Literal["blue"]?

One naive strategy would be to always assume expressions are intended to be Literal types. So, x would always have an inferred type of Literal["blue"] in the example above. This naive strategy is almost certainly too disruptive – it would cause programs like the following to start failing when they previously did not:

# If a type checker infers 'var' has type Literal[3]
# and my_list has type List[Literal[3]]...
var = 3
my_list = [var]

# ...this call would be a type-error.
my_list.append(4)

Another example of when this strategy would fail is when setting fields in objects:

class MyObject:
    def __init__(self) -> None:
        # If a type checker infers MyObject.field has type Literal[3]...
        self.field = 3

m = MyObject()

# ...this assignment would no longer type check
m.field = 4

An alternative strategy that does maintain compatibility in every case would be to always assume expressions are not Literal types unless they are explicitly annotated otherwise. A type checker using this strategy would always infer that x is of type str in the first example above.

This is not the only viable strategy: type checkers should feel free to experiment with more sophisticated inference techniques. This PEP does not mandate any particular strategy; it only emphasizes the importance of backwards compatibility.
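
Under the conservative strategy, a user who wants a Literal type opts in with an explicit annotation. The following sketch contrasts the two cases; the inferred types in the comments are what such a checker would be expected to report, not a guarantee about any particular tool.

```python
from typing import List, Literal

# Unannotated: a conservative checker is expected to infer 'int' and
# 'List[int]', so the append below keeps type checking.
var = 3
my_list = [var]
my_list.append(4)

# Annotated: the author explicitly asks for the narrower type.
pinned: Literal[3] = 3
only_threes: List[Literal[3]] = [pinned]
```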

Using non-Literals in Literal contexts

Literal types follow the existing rules regarding subtyping with no additional special-casing. For example, programs like the following are type safe:

def expects_str(x: str) -> None: ...
var: Literal["foo"] = "foo"

# Legal: Literal["foo"] is a subtype of str
expects_str(var)

This also means non-Literal expressions in general should not automatically be cast to Literal. For example:

def expects_literal(x: Literal["foo"]) -> None: ...

def runner(my_str: str) -> None:
    # ILLEGAL: str is not a subtype of Literal["foo"]
    expects_literal(my_str)

Note: If the user wants their API to support accepting both literals and the original type – perhaps for legacy purposes – they should implement a fallback overload. See Interactions with overloads.
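
A caller that only has a plain str can still reach such an API without any special support from the type checker, by checking the value and then passing the literal expression itself. This is a minimal sketch; the return values are made up for illustration.

```python
from typing import Literal


def expects_literal(x: Literal["foo"]) -> str:
    return x.upper()


def runner(my_str: str) -> str:
    if my_str == "foo":
        # We pass the literal expression "foo" rather than 'my_str',
        # so this call is valid even if the checker does not narrow
        # 'my_str' after the comparison.
        return expects_literal("foo")
    return "unsupported"
```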

Interactions with other types and features

This section discusses how Literal types interact with other existing types.

Intelligent indexing of structured data

Literals can be used to “intelligently index” into structured types like tuples, NamedTuple, and classes. (Note: this is not an exhaustive list).

For example, type checkers should infer the correct value type when indexing into a tuple using an int key that corresponds to a valid index:

a: Literal[0] = 0
b: Literal[5] = 5

some_tuple: Tuple[int, str, List[bool]] = (3, "abc", [True, False])
reveal_type(some_tuple[a])   # Revealed type is 'int'
some_tuple[b]                # Error: 5 is not a valid index into the tuple

We expect similar behavior when using functions like getattr:

class Test:
    def __init__(self, param: int) -> None:
        self.myfield = param

    def mymethod(self, val: int) -> str: ...

a: Literal["myfield"]  = "myfield"
b: Literal["mymethod"] = "mymethod"
c: Literal["blah"]     = "blah"

t = Test()
reveal_type(getattr(t, a))  # Revealed type is 'int'
reveal_type(getattr(t, b))  # Revealed type is 'Callable[[int], str]'
getattr(t, c)               # Error: No attribute named 'blah' in Test

Note: See Interactions with Final for a proposal on how we can express the variable declarations above in a more compact manner.

Interactions with overloads

Literal types and overloads do not need to interact in a special way: the existing rules work fine.

However, one important use case type checkers must take care to support is the ability to use a fallback when the user is not using literal types. For example, consider open:

_PathType = Union[str, bytes, int]

@overload
def open(path: _PathType,
         mode: Literal["r", "w", "a", "x", "r+", "w+", "a+", "x+"],
         ) -> IO[Text]: ...
@overload
def open(path: _PathType,
         mode: Literal["rb", "wb", "ab", "xb", "r+b", "w+b", "a+b", "x+b"],
         ) -> IO[bytes]: ...

# Fallback overload for when the user isn't using literal types
@overload
def open(path: _PathType, mode: str) -> IO[Any]: ...

If we were to change the signature of open to use just the first two overloads, we would break any code that does not pass in a literal string expression. For example, code like this would be broken:

mode: str = pick_file_mode(...)
with open(path, mode) as f:
    # f should continue to be of type IO[Any] here
    ...

A little more broadly: we propose adding a policy to typeshed that mandates that whenever we add literal types to some existing API, we also always include a fallback overload to maintain backwards-compatibility.
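
The same pattern applies to user-defined functions. The sketch below uses a hypothetical function, render, whose return type depends on a boolean flag; the third overload is the fallback for callers that pass an ordinary bool.

```python
from typing import Literal, Union, overload


@overload
def render(value: int, as_bytes: Literal[True]) -> bytes: ...
@overload
def render(value: int, as_bytes: Literal[False]) -> str: ...
# Fallback overload for when the caller isn't using literal types
@overload
def render(value: int, as_bytes: bool) -> Union[bytes, str]: ...
def render(value: int, as_bytes: bool) -> Union[bytes, str]:
    text = str(value)
    return text.encode("ascii") if as_bytes else text


precise = render(42, True)   # a checker can select the 'bytes' overload
flag: bool = len("ab") == 2  # declared as a plain bool
fallback = render(42, flag)  # only the fallback overload matches
```

Without the third overload, the last call would be rejected even though it worked before literal types were added to the signature.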

Interactions with generics

Types like Literal[3] are meant to be just plain old subtypes of int. This means you can use types like Literal[3] anywhere you could use normal types, such as with generics.

This means that it is legal to parameterize generic functions or classes using Literal types:

A = TypeVar('A', bound=int)
B = TypeVar('B', bound=int)
C = TypeVar('C', bound=int)

# A simplified definition for Matrix[row, column]
class Matrix(Generic[A, B]):
    def __add__(self, other: Matrix[A, B]) -> Matrix[A, B]: ...
    def __matmul__(self, other: Matrix[B, C]) -> Matrix[A, C]: ...
    def transpose(self) -> Matrix[B, A]: ...

foo: Matrix[Literal[2], Literal[3]] = Matrix(...)
bar: Matrix[Literal[3], Literal[7]] = Matrix(...)

baz = foo @ bar
reveal_type(baz)  # Revealed type is 'Matrix[Literal[2], Literal[7]]'

Similarly, it is legal to construct TypeVars with value restrictions or bounds involving Literal types:

T = TypeVar('T', Literal["a"], Literal["b"], Literal["c"])
S = TypeVar('S', bound=Literal["foo"])

…although it is unclear when it would ever be useful to construct a TypeVar with a Literal upper bound. For example, the S TypeVar in the above example is essentially pointless: we can get equivalent behavior by using S = Literal["foo"] instead.

Note: Literal types and generics deliberately interact in only very basic and limited ways. In particular, libraries that want to type check code containing a heavy amount of numeric or numpy-style manipulation will almost certainly find Literal types as proposed in this PEP to be insufficient for their needs.

We considered several different proposals for fixing this, but ultimately decided to defer the problem of integer generics to a later date. See Rejected or out-of-scope ideas for more details.

Interactions with enums and exhaustiveness checks

Type checkers should be capable of performing exhaustiveness checks when working with Literal types that have a closed number of variants, such as enums. For example, the type checker should be capable of inferring that s must be of type str in the final else branch, since all three values of the Status enum have already been exhausted:

class Status(Enum):
    SUCCESS = 0
    INVALID_DATA = 1
    FATAL_ERROR = 2

def parse_status(s: Union[str, Status]) -> None:
    if s is Status.SUCCESS:
        print("Success!")
    elif s is Status.INVALID_DATA:
        print("The given data is invalid because...")
    elif s is Status.FATAL_ERROR:
        print("Unexpected fatal error...")
    else:
        # 's' must be of type 'str' since all other options are exhausted
        print("Got custom status: " + s)

The interaction described above is not new: it’s already codified within PEP 484. However, many type checkers (such as mypy) do not yet implement this due to the expected complexity of the implementation work.

Some of this complexity will be alleviated once Literal types are introduced: rather than entirely special-casing enums, we can instead treat them as being approximately equivalent to the union of their values and take advantage of any logic the type checker may already implement for unions, exhaustiveness, type narrowing, reachability, and so forth.

So here, the Status enum could be treated as being approximately equivalent to Literal[Status.SUCCESS, Status.INVALID_DATA, Status.FATAL_ERROR] and the type of s narrowed accordingly.
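
One way user code can take advantage of such narrowing is sketched below. The assert_never helper is an assumption of this example (a common idiom, not something this PEP defines): it accepts only NoReturn, so the call type checks only if the checker has narrowed s down to no remaining values.

```python
from enum import Enum
from typing import NoReturn


class Status(Enum):
    SUCCESS = 0
    INVALID_DATA = 1
    FATAL_ERROR = 2


def assert_never(value: NoReturn) -> NoReturn:
    # Hypothetical helper: only reached at runtime if a case was missed.
    raise AssertionError(f"Unhandled value: {value!r}")


def describe(s: Status) -> str:
    if s is Status.SUCCESS:
        return "success"
    elif s is Status.INVALID_DATA:
        return "invalid data"
    elif s is Status.FATAL_ERROR:
        return "fatal error"
    else:
        # With enum narrowing as described above, no values of 's'
        # remain here, so passing it to assert_never is accepted.
        assert_never(s)
```

If a new member were later added to Status without a matching branch, a checker that supports this narrowing would be expected to report an error at the assert_never call.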

Interactions with narrowing

Type checkers may optionally perform additional analysis for both enum and non-enum Literal types beyond what is described in the section above.

For example, it may be useful to perform narrowing based on things like containment or equality checks:

def parse_status(status: str) -> None:
    if status in ("MALFORMED", "ABORTED"):
        # Type checker could narrow 'status' to type
        # Literal["MALFORMED", "ABORTED"] here.
        return expects_bad_status(status)

    # Similarly, type checker could narrow 'status' to Literal["PENDING"]
    if status == "PENDING":
        expects_pending_status(status)

It may also be useful to perform narrowing taking into account expressions involving Literal bools. For example, we can combine Literal[True], Literal[False], and overloads to construct “custom type guards”:

@overload
def is_int_like(x: Union[int, List[int]]) -> Literal[True]: ...
@overload
def is_int_like(x: object) -> bool: ...
def is_int_like(x): ...

vector: List[int] = [1, 2, 3]
if is_int_like(vector):
    vector.append(3)
else:
    vector.append("bad")   # This branch is inferred to be unreachable

scalar: Union[int, str]
if is_int_like(scalar):
    scalar += 3      # Type checks: type of 'scalar' is narrowed to 'int'
else:
    scalar += "foo"  # Type checks: type of 'scalar' is narrowed to 'str'
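
Because this narrowing is optional, code that must pass under every type checker may need to spell out the containment check itself. The sketch below is one possible approach, assuming a hypothetical BadStatus alias and made-up return values: the membership test does the real validation at runtime, and cast only records the result for the checker.

```python
from typing import Literal, cast, get_args

BadStatus = Literal["MALFORMED", "ABORTED"]


def expects_bad_status(status: BadStatus) -> str:
    return "bad status: " + status


def parse_status(status: str) -> str:
    if status in get_args(BadStatus):
        # cast() has no runtime effect; the check above is what
        # guarantees 'status' is one of the permitted values.
        return expects_bad_status(cast(BadStatus, status))
    return "ok"
```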

Interactions with Final

PEP 591 proposes adding a “Final” qualifier to the typing ecosystem. This qualifier can be used to declare that some variable or attribute cannot be reassigned:

foo: Final = 3
foo = 4           # Error: 'foo' is declared to be Final

Note that in the example above, we know that foo will always be equal to exactly 3. A type checker can use this information to deduce that foo is valid to use in any context that expects a Literal[3]:

def expects_three(x: Literal[3]) -> None: ...

expects_three(foo)  # Type checks, since 'foo' is Final and equal to 3

The Final qualifier serves as a shorthand for declaring that a variable is effectively Literal.

If both this PEP and PEP 591 are accepted, type checkers are expected to support this shortcut. Specifically, given a variable or attribute assignment of the form var: Final = value where value is a valid parameter for Literal[...], type checkers should understand that var may be used in any context that expects a Literal[value].

Type checkers are not obligated to understand any other uses of Final. For example, whether or not the following program type checks is left unspecified:

# Note: The assignment does not exactly match the form 'var: Final = value'.
bar1: Final[int] = 3
expects_three(bar1)  # May or may not be accepted by type checkers

# Note: "Literal[1 + 2]" is not a legal type.
bar2: Final = 1 + 2
expects_three(bar2)  # May or may not be accepted by type checkers
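
Applied to the earlier getattr example, the var: Final = value form lets the keys be declared without repeating each string inside Literal[...]. This is a sketch only: the constant names are hypothetical, mymethod is given a trivial body so the code runs, and whether a checker infers precise types for the getattr calls is up to that checker.

```python
from typing import Final


class Test:
    def __init__(self, param: int) -> None:
        self.myfield = param

    def mymethod(self, val: int) -> str:
        return str(val)


# Each constant may be used wherever the matching Literal is expected,
# for example Literal["myfield"] for FIELD.
FIELD: Final = "myfield"
METHOD: Final = "mymethod"

t = Test(7)
field_value = getattr(t, FIELD)
method_result = getattr(t, METHOD)(5)
```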

Rejected or out-of-scope ideas

This section outlines some potential features that are explicitly out-of-scope.

True dependent types/integer generics

This proposal is essentially describing adding a very simplified dependent type system to the PEP 484 ecosystem. One obvious extension would be to implement a full-fledged dependent type system that lets users predicate types based on their values in arbitrary ways. That would let us write signatures like the below:

# A vector has length 'n', containing elements of type 'T'
class Vector(Generic[N, T]): ...

# The type checker will statically verify our function genuinely does
# construct a vector that is equal in length to "len(vec1) + len(vec2)"
# and will throw an error if it does not.
def concat(vec1: Vector[A, T], vec2: Vector[B, T]) -> Vector[A + B, T]:
    # ...snip...

At the very least, it would be useful to add some form of integer generics.

Although such a type system would certainly be useful, it’s out of scope for this PEP: it would require a far more substantial amount of implementation work, discussion, and research to complete compared to the current proposal.

It’s entirely possible we’ll circle back and revisit this topic in the future: we very likely will need some form of dependent typing along with other extensions like variadic generics to support popular libraries like numpy.

This PEP should be seen as a stepping stone towards this goal, rather than an attempt at providing a comprehensive solution.

Adding more concise syntax

One objection to this PEP is that having to explicitly write Literal[...] feels verbose. For example, instead of writing:

def foobar(arg1: Literal[1], arg2: Literal[True]) -> None:
    pass

…it would be nice to instead write:

def foobar(arg1: 1, arg2: True) -> None:
    pass

Unfortunately, these abbreviations simply will not work with the existing implementation of typing at runtime. For example, the following snippet crashes when run using Python 3.7:

from typing import Tuple

# Supposed to accept tuple containing the literals 1 and 2
def foo(x: Tuple[1, 2]) -> None:
    pass

Running this yields the following exception:

TypeError: Tuple[t0, t1, ...]: each t must be a type. Got 1.

We don’t want users to have to memorize exactly when it’s ok to elide Literal, so we require Literal to always be present.

A little more broadly, we feel overhauling the syntax of types in Python is not within the scope of this PEP: it would be best to have that discussion in a separate PEP, instead of attaching it to this one. So, this PEP deliberately does not try to innovate on Python’s type syntax.

Backporting the Literal type

Once this PEP is accepted, the Literal type will need to be backported for Python versions that come bundled with older versions of the typing module. We plan to do this by adding Literal to the typing_extensions 3rd party module, which contains a variety of other backported types.
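
Code that has to run on several Python versions could then select the import based on the interpreter version. The following is a minimal sketch that assumes the typing_extensions package is installed on interpreters older than 3.8.

```python
import sys

if sys.version_info >= (3, 8):
    from typing import Literal
else:
    # Assumption: typing_extensions is installed on older versions.
    from typing_extensions import Literal


def accepts_only_four(x: Literal[4]) -> int:
    return x
```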

Implementation

The mypy type checker currently has implemented a large subset of the behavior described in this spec, with the exception of enum Literals and some of the more complex narrowing interactions described above.

Acknowledgements

Thanks to Mark Mendoza, Ran Benita, Rebecca Chen, and the other members of typing-sig for their comments on this PEP.

Additional thanks to the various participants in the mypy and typing issue trackers, who helped provide a lot of the motivation and reasoning behind this PEP.


Source: https://github.com/python-discord/peps/blob/main/pep-0586.rst

Last modified: 2022-02-27 22:46:36 GMT