PEP 639 – Improving License Clarity with Better Package Metadata

PEP: 639
Title: Improving License Clarity with Better Package Metadata
Author: Philippe Ombredanne <pombredanne at nexb.com>, C.A.M. Gerlach <CAM.Gerlach at Gerlach.CAM>
Sponsor: Paul Moore <p.f.moore at gmail.com>
PEP-Delegate: Brett Cannon <brett at python.org>
Discussions-To: https://discuss.python.org/t/12622
Status: Draft
Type: Standards Track
Created: 15-Aug-2019
Post-History: 15-Aug-2019, 17-Dec-2021


Abstract

This PEP defines a specification for how licenses are documented in the core metadata, with license expression strings using SPDX identifiers in a new License-Expression field. This will make license declarations simpler and less ambiguous for package authors to create, end users to read and understand, and tools to programmatically process.

The PEP also:

The changes in this PEP will update the core metadata to version 2.3, modify the PEP 621 project metadata specification, and make minor additions to the source distribution (sdist), built distribution (wheel) and installed project standards.

Goals

This PEP’s scope is limited to covering new mechanisms for documenting the license of a distribution package, specifically defining:

  • A means of specifying an SPDX license expression.
  • A method of including license texts in distributions and installed projects.

The changes to the core metadata specification that this PEP requires have been designed to minimize impact and maximize backward compatibility. This specification builds off of existing ways to document licenses that are already in use in popular tools (e.g. adding support to core metadata for the License-File field already used in the Wheel and Setuptools projects) and by some package authors (e.g. storing an SPDX license expression in the existing License field).

In addition to these proposed changes, this PEP contains guidance for tools handling and converting these metadata, a tutorial for package authors covering various common use cases, detailed examples of them in use, and a comprehensive survey of license documentation in Python and other languages.

It is the intent of the PEP authors to work closely with tool maintainers to implement the recommendations for validation and warnings specified here.

Non-Goals

This PEP is neutral regarding the choice of license by any particular package author. This PEP makes no recommendation for specific licenses, and does not require the use of a particular license documentation convention.

Rather, the SPDX license expression syntax proposed in this PEP provides a simpler and more expressive mechanism to accurately document any kind of license that applies to a Python package, whether it is open source, free/libre, proprietary, or a combination of such.

This PEP also does not impose any additional restrictions when uploading to PyPI, unless projects choose to make use of the new fields.

Instead, it is intended to document best practices already in use, extend them to use a new formally-specified and supported mechanism, and provide guidance for packaging tools on how to handle the transition and inform users accordingly.

This PEP is also not about license documentation in files inside projects, though this topic is surveyed in the appendix, nor does it intend to cover cases where the source and binary distribution packages don’t have the same licenses.

Motivation

All software is licensed, and providing accurate license information to Python package users is an important matter. Today, there are multiple fields where licenses are documented in core metadata, and there are limitations to what can be expressed in each of them. This often leads to confusion and a lack of clarity, both for package authors and end users.

Many package authors have expressed difficulty and frustrations due to the limited capabilities to express licensing in project metadata, and this creates further trouble for Linux and BSD distribution re-packagers. This has triggered a number of license-related discussions and issues, including on outdated and ambiguous PyPI classifiers, license interoperability with other ecosystems, too many confusing license metadata options, limited support for license files in the Wheel project, and the lack of clear, precise and standardized license metadata.

On average, Python packages tend to have more ambiguous and missing license information than other common ecosystems (such as npm, Maven or Gem). This is supported by the statistics page of the ClearlyDefined project, an Open Source Initiative incubated effort to help improve licensing clarity of other FOSS projects, covering all packages from PyPI, Maven, npm and Rubygems.

Rationale

A survey of existing license metadata definitions in use in the Python ecosystem today is provided in an appendix of this PEP, and license documentation in a variety of other packaging systems, Linux distros, language ecosystems and applications is surveyed in another appendix.

There are a few takeaways from the survey:

  • Most package formats use a single License field.
  • Many modern package systems use some form of license expression syntax to optionally combine more than one license identifier together. SPDX and SPDX-like syntaxes are the most popular in use.
  • SPDX license identifiers are becoming the de facto way to reference common licenses everywhere, whether or not a full license expression syntax is used.
  • Several package formats support documenting both a license expression and the paths of the corresponding files that contain the license text. Most Free and Open Source Software licenses require package authors to include their full text in a distribution.

These considerations have guided the design and recommendations of this PEP.

The current license classifiers cover some common cases, and could theoretically be extended to include the full range of current SPDX identifiers while deprecating the many ambiguous classifiers (including some extremely popular and particularly problematic ones, such as License :: OSI Approved :: BSD License). However, this both requires a substantial amount of effort to duplicate the SPDX license list and keep it in sync, and is effectively a hard break in backward compatibility, forcing a huge proportion of package authors to update to new classifiers (in most cases, with many possible choices that require closely examining the project’s license) immediately when PyPI deprecates the old ones.

Furthermore, this only covers simple packages entirely under a single license; it doesn’t address the substantial fraction of common projects that vendor dependencies (e.g. Setuptools), offer a choice of licenses (e.g. Packaging) or were relicensed, adapt code from other projects or contain fonts, images, examples, binaries or other assets under other licenses. It also requires that both authors and tools understand and implement the PyPI-specific bespoke classifier system, rather than using short, easy-to-add and standardized SPDX identifiers in a simple text field, as increasingly widely adopted by most other packaging systems to reduce the overall burden on the ecosystem. Finally, this does not provide as clear an indicator that a package has adopted the new system, and should be treated accordingly.

The use of a new License-Expression field will provide an intuitive, structured and unambiguous way to express the license of a package using a well-defined syntax and well-known license identifiers. Similarly, a formally-specified License-File field offers a standardized way to ensure that the full text of the license(s) is included with the package when distributed, as legally required, and allows other tools consuming the core metadata to unambiguously locate a distribution’s license files.

Over time, encouraging the use of these fields and deprecating the ambiguous, duplicative and confusing legacy alternatives will help Python software publishers improve the clarity, accuracy and portability of their licensing practices, to the benefit of package authors, consumers and redistributors alike.

Terminology

This PEP seeks to clearly define the terms it uses, given that some have multiple established meanings (e.g. import vs. distribution package, wheel format vs. Wheel project); are related and often used interchangeably, but have critical distinctions in meaning (e.g. PEP 621 key vs. core metadata field); are existing concepts that don’t have formal terms/definitions (e.g. project/source metadata vs. distribution/built metadata, build vs. publishing tools); or are new concepts introduced here (e.g. license expression/identifier).

This PEP also uses terms defined in the PyPA PyPUG Glossary (specifically built/binary distribution, distribution package, project and source distribution), and by the SPDX Project (license identifier, license expression).

Terms are listed here in their full versions; related words (Rel:) are in parentheses, including short forms (Short:), sub-terms (Sub:) and common synonyms for the purposes of this PEP (Syn:).

Core Metadata (Syn: Package Metadata, Sub: Distribution Metadata)
The PyPA specification and the set of metadata fields it defines that describe key static attributes of distribution packages and installed projects.

The distribution metadata refers to, more specifically, the concrete form core metadata takes when included inside a distribution archive (PKG-INFO in a sdist and METADATA in a wheel) or installed project (METADATA).

Core Metadata Field (Short: Metadata Field/Field)
A single key-value pair, or sequence of such with the same key, as defined by the core metadata specification. Notably, not a PEP 621 project metadata format key.
Distribution Package (Sub: Package, Distribution Archive)
(See PyPUG) In this PEP, package is used to refer to the abstract concept of a distributable form of a Python project, while distribution more specifically references the physical distribution archive.
License Classifier
A PyPI Trove classifier (as originally defined in PEP 301) which begins with License ::, currently used to indicate a project’s license status by including it as a Classifier in the core metadata.
License Expression (Syn: SPDX Expression)
A string with valid SPDX license expression syntax including any SPDX license identifiers as defined here, which describes a project’s license(s) and how they relate to one another. Examples: GPL-3.0-or-later, MIT AND (Apache-2.0 OR BSD-2-Clause)
License Identifier (Syn: License ID/SPDX Identifier)
A valid SPDX short-form license identifier, as described in the Add License-Expression field section of this PEP; briefly, this includes all valid SPDX identifiers and the LicenseRef-Public-Domain and LicenseRef-Proprietary strings. Examples: MIT, GPL-3.0-only
Project (Sub: Project Source Tree, Installed Project)
(See PyPUG) Here, a project source tree refers to the on-disk format of a project used for development, while an installed project is the form a project takes once installed from a distribution, as specified by PyPA.
Project Source Metadata (Sub: PEP 621 Metadata, Key, Subkey)
Core metadata defined by the package author in the project source tree, as top-level keys in the [project] table of a PEP 621 pyproject.toml, in the [metadata] table of setup.cfg, or the equivalent for other build tools.

The PEP 621 metadata refers specifically to the former, as defined by the PyPA Declaring Project Metadata specification. A PEP 621 metadata key, or an unqualified key refers specifically to a top-level [project] key (notably, not a core metadata field), while a subkey refers to a second-level key in a table-valued PEP 621 key.

Root License Directory (Short: License Directory)
The directory under which license files are stored in a project/distribution and the root directory that their paths, as recorded under the License-File core metadata fields, are relative to. Defined here to be the project root directory for source trees and source distributions, and a subdirectory named license_files of the directory containing the core metadata (i.e., the .dist-info/license_files directory) for built distributions and installed projects.
Tool (Sub: Packaging Tool, Build Tool, Install Tool, Publishing Tool)
A program, script or service executed by the user or automatically that seeks to conform to the specification defined in this PEP.

A packaging tool refers to a tool used to build, publish, install, or otherwise directly interact with Python packages.

A build tool is a packaging tool used to generate a source or built distribution from a project source tree or sdist, when directly invoked as such (as opposed to by end-user-facing install tools). Examples: Wheel project, PEP 517 backends via build or other package-developer-facing frontends, calling setup.py directly.

An install tool is a packaging tool used to install a source or built distribution in a target environment. Examples include the PyPA pip and installer projects.

A publishing tool is a packaging tool used to upload distribution archives to a package index, such as Twine for PyPI.

Wheel (Short: wheel, Rel: wheel format, Wheel project)
Here, wheel, the standard built distribution format introduced in PEP 427 and specified by PyPA, will be referred to in lowercase, while the Wheel project, its reference implementation, will be referred to as such with Wheel in Title Case.

Specification

The changes necessary to implement the improved license handling outlined in this PEP include those in both distribution package metadata, as defined in the core metadata specification, and author-provided project source metadata, as originally defined in PEP 621.

Further, minor additions to the source distribution (sdist), built distribution (wheel) and installed project specifications will help document and clarify the already allowed, now formally standardized behavior in these respects. Finally, guidance is established for tools handling and converting legacy license metadata to license expressions, to ensure the results are consistent, correct and unambiguous.

Note that the guidance on errors and warnings is for tools’ default behavior; they MAY operate more strictly if users explicitly configure them to do so, such as by a CLI flag or a configuration option.

Core metadata

The PyPA Core Metadata specification defines the names and semantics of each of the supported fields in the distribution metadata of Python distribution packages and installed projects.

This PEP adds the License-Expression field, adds the License-File field, deprecates the License field, and deprecates the license classifiers in the Classifier field.

The error and warning guidance in this section applies to build and publishing tools; end-user-facing install tools MAY be more lenient than mentioned here when encountering malformed metadata that does not conform to this specification.

As it adds new fields, this PEP updates the core metadata to version 2.3.

Add License-Expression field

The License-Expression optional field is specified to contain a text string that is a valid SPDX license expression, as defined herein.

Publishing tools SHOULD issue an informational warning if this field is missing, and MAY raise an error. Build tools MAY issue a similar warning, but MUST NOT raise an error.

A license expression is a string using the SPDX license expression syntax as documented in the SPDX specification, either Version 2.2 or a later compatible version.

When used in the License-Expression field and as a specialization of the SPDX license expression definition, a license expression can use the following license identifiers:

  • Any SPDX-listed license short-form identifiers that are published in the SPDX License List, version 3.15 or any later compatible version. Note that the SPDX working group never removes any license identifiers; instead, they may choose to mark an identifier as “deprecated”.
  • The LicenseRef-Public-Domain and LicenseRef-Proprietary strings, to identify licenses that are not included in the SPDX license list.

When processing the License-Expression field to determine if it contains a valid license expression, build and publishing tools:

  • SHOULD halt execution and raise an error if:
    • The field does not contain a valid license expression
    • One or more license identifiers are not valid (as defined above)
  • SHOULD report an informational warning, and publishing tools MAY raise an error, if one or more license identifiers have been marked as deprecated in the SPDX License List.
  • MUST store a case-normalized version of the License-Expression field using the reference case for each SPDX license identifier and uppercase for the AND, OR and WITH keywords.
  • SHOULD report an informational warning, and MAY raise an error if the normalization process results in changes to the License-Expression field contents.
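
As an illustration of the case-normalization rule above, the following is a minimal sketch rather than a reference implementation. It assumes a hypothetical, deliberately tiny table of identifiers (KNOWN_IDS); a real tool would load the full SPDX License List. It does not validate expression syntax, and it does not handle the SPDX "+" suffix or license exceptions following WITH. The function name normalize_expression is likewise an assumption made for this example.

```python
import re

# Hypothetical, tiny subset of the SPDX License List, for illustration only.
KNOWN_IDS = [
    "MIT",
    "Apache-2.0",
    "BSD-2-Clause",
    "GPL-3.0-or-later",
    "LicenseRef-Public-Domain",
    "LicenseRef-Proprietary",
]
_REFERENCE_CASE = {identifier.lower(): identifier for identifier in KNOWN_IDS}
_KEYWORDS = {"and", "or", "with"}

# A token is either a single parenthesis or a run of other non-space characters.
_TOKEN = re.compile(r"[()]|[^\s()]+")


def normalize_expression(expression: str) -> str:
    """Uppercase the keywords and restore the reference case of each identifier.

    Raises ValueError on an identifier missing from KNOWN_IDS.
    Expression syntax (operator placement, balanced parentheses) is NOT checked.
    """
    out = []
    for token in _TOKEN.findall(expression):
        lowered = token.lower()
        if token in ("(", ")"):
            out.append(token)
        elif lowered in _KEYWORDS:
            out.append(lowered.upper())
        elif lowered in _REFERENCE_CASE:
            out.append(_REFERENCE_CASE[lowered])
        else:
            raise ValueError(f"unknown license identifier: {token!r}")
    # Join with single spaces, then remove the spaces just inside parentheses.
    return " ".join(out).replace("( ", "(").replace(" )", ")")
```

A caller could compare the result with the original string and issue the informational warning described above when the two differ.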

For all newly-uploaded distributions that include a License-Expression field, the Python Package Index (PyPI) MUST validate that it contains a valid, case-normalized license expression with valid identifiers (as defined here) and MUST reject uploads that do not. PyPI MAY reject an upload for using a deprecated license identifier, so long as it was deprecated as of the above-mentioned SPDX License List version.

Add License-File field

Each instance of the License-File optional field is specified to contain the string representation of the path in the project source tree, relative to the project root directory, of a license-related file. It is a multi-use field that may appear zero or more times, each instance listing the path to one such file. Files specified under this field could include license text, author/attribution information, or other legal notices that need to be distributed with the package.

As specified by this PEP, its value is also that file’s path relative to the root license directory in both installed projects and the standardized distribution package types. In other legacy, non-standard or new distribution package formats and mechanisms of accessing and storing core metadata, the value MAY correspond to the license file path relative to a format-defined root license directory. Alternatively, it MAY be treated as a unique abstract key to access the license file contents by another means, as specified by the format.

If a License-File is listed in a source or built distribution’s core metadata, that file MUST be included in the distribution at the specified path relative to the root license directory, and MUST be installed with the distribution at that same relative path.

The specified relative path MUST be consistent between project source trees, source distributions (sdists), built distributions (wheels) and installed projects. Therefore, inside the root license directory, packaging tools MUST reproduce the directory structure under which the source license files are located relative to the project root.

Path delimiters MUST be the forward slash character (/), and parent directory indicators (..) MUST NOT be used. License file content MUST be UTF-8 encoded text.
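
The path and encoding constraints above could be checked along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, and treating any backslash as a misused delimiter is a choice made for this example, not something this PEP spells out.

```python
from pathlib import PurePosixPath


def check_license_file_path(path: str) -> str:
    """Return the path unchanged if it satisfies the rules above, else raise."""
    # Assumption: a backslash is treated as a misused Windows delimiter,
    # although it is technically a legal file name character on POSIX.
    if "\\" in path:
        raise ValueError(f"path delimiters must be '/': {path!r}")
    parsed = PurePosixPath(path)
    if parsed.is_absolute():
        raise ValueError(f"path must be relative: {path!r}")
    if ".." in parsed.parts:
        raise ValueError(f"parent directory indicators are not allowed: {path!r}")
    return path


def read_license_text(raw: bytes) -> str:
    """Decode license file content, raising ValueError if it is not UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("license file content must be UTF-8 encoded text") from exc
```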

Build tools MAY and publishing tools SHOULD produce an informative warning if a built distribution’s metadata contains no License-File entries, and publishing tools MAY but build tools MUST NOT raise an error.

For all newly-uploaded distribution packages that include one or more License-File fields and declare a Metadata-Version of 2.3 or higher, PyPI SHOULD validate that the specified files are present in all uploaded distributions, and MUST reject uploads that do not validate.

Deprecate License field

The legacy unstructured-text License field is deprecated and replaced by the new License-Expression field. Build and publishing tools MUST raise an error if both these fields are present and their values are not identical, including capitalization and excluding leading and trailing whitespace.

If only the License field is present, such tools SHOULD issue a warning informing users it is deprecated and recommending License-Expression instead.

For all newly-uploaded distributions that include a License-Expression field, the Python Package Index (PyPI) MUST reject any that also specify a License field whose text is not identical to that of License-Expression, as defined in this section.

Along with license classifiers, the License field may be removed from a new version of the specification in a future PEP.

Deprecate license classifiers

Using license classifiers in the Classifier field (described in PEP 301) is deprecated and replaced by the more precise License-Expression field.

If the License-Expression field is present, build tools SHOULD and publishing tools MUST raise an error if one or more license classifiers is included in a Classifier field, and MUST NOT add such classifiers themselves.

Otherwise, if this field contains a license classifier, build tools MAY and publishing tools SHOULD issue a warning informing users such classifiers are deprecated, and recommending License-Expression instead. For compatibility with existing publishing and installation processes, the presence of license classifiers SHOULD NOT raise an error unless License-Expression is also provided.

For all newly-uploaded distributions that include a License-Expression field, the Python Package Index (PyPI) MUST reject any that also specify any license classifiers.

New license classifiers MUST NOT be added to PyPI; users needing them SHOULD use the License-Expression field instead. Along with the License field, license classifiers may be removed from a new version of the specification in a future PEP.
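
The interaction between License-Expression and the two deprecated mechanisms can be sketched as a consistency check over a core metadata file. The example below parses the metadata with the standard library email.parser module. The function name check_license_fields and the message wording are assumptions for illustration, and the strictness shown (errors for both conflicts) corresponds to what this PEP requires of publishing tools.

```python
from email.parser import Parser


def check_license_fields(metadata_text: str):
    """Return (errors, warnings) for the license-related core metadata fields."""
    message = Parser().parsestr(metadata_text)
    expression = message.get("License-Expression")
    legacy = message.get("License")
    license_classifiers = [
        value
        for value in message.get_all("Classifier", [])
        if value.startswith("License ::")
    ]

    errors, warnings = [], []
    if expression is not None:
        # Comparison is case-sensitive, ignoring leading/trailing whitespace.
        if legacy is not None and legacy.strip() != expression.strip():
            errors.append("License and License-Expression differ")
        if license_classifiers:
            errors.append("license classifiers used alongside License-Expression")
    else:
        if legacy is not None:
            warnings.append("License is deprecated; use License-Expression")
        if license_classifiers:
            warnings.append("license classifiers are deprecated; use License-Expression")
    return errors, warnings
```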

Project source metadata

As originally introduced in PEP 621, the PyPA Declaring Project Metadata specification defines how to declare a project’s source metadata in a [project] table in the pyproject.toml file for build tools to consume and output distribution core metadata.

This PEP adds the license-expression key, adds the license-files key and deprecates the license key.

Add license-expression key

A new license-expression key is added to the project table, which has a string value that is a valid SPDX license expression, as defined previously. Its value maps to the License-Expression field in the core metadata.

Build tools SHOULD validate the expression as described above, outputting an error or warning as specified. When generating the core metadata, tools MUST perform case normalization.

If and only if the license-expression key is listed as dynamic (and is not specified), tools MAY infer a value for the License-Expression field if they can do so unambiguously, but MUST follow the provisions in the Converting legacy metadata section.

If the license-expression key is present and valid (and the license key is not specified), for purposes of backward compatibility, tools MAY back-fill the License core metadata field with the case-normalized value of the license-expression key.

Add license-files key

A new license-files key is added to the project table for specifying paths in the project source tree relative to pyproject.toml to file(s) containing licenses and other legal notices to be distributed with the package. It corresponds to the License-File fields in the core metadata.

Its value is a table, which if present MUST contain one of two optional, mutually exclusive subkeys, paths and globs; if both are specified, tools MUST raise an error. Both are arrays of strings; the paths subkey contains verbatim file paths, and the globs subkey valid glob patterns, which MUST be parsable by the glob module in the Python standard library.

Note: To avoid ambiguity, confusion and (per PEP 20, the Zen of Python) “more than one (obvious) way to do it”, allowing a flat array of strings as the value for the license-files key has been left out for now.

Path delimiters MUST be the forward slash character (/), and parent directory indicators (..) MUST NOT be used. Tools MUST assume that license file content is valid UTF-8 encoded text, and SHOULD validate this and raise an error if it is not.

If the paths subkey is a non-empty array, build tools:

  • MUST treat each value as a verbatim, literal file path, and MUST NOT treat them as glob patterns.
  • MUST include each listed file in all distribution archives.
  • MUST NOT match any additional license files beyond those explicitly statically specified by the user under the paths subkey.
  • MUST list each file path under a License-File field in the core metadata.
  • MUST raise an error if one or more paths do not correspond to a valid file in the project source that can be copied into the distribution archive.

If the globs subkey is a non-empty array, build tools:

  • MUST treat each value as a glob pattern, and MUST raise an error if the pattern contains invalid glob syntax.
  • MUST include all files matched by at least one listed pattern in all distribution archives.
  • MAY exclude files matched by glob patterns that can be unambiguously determined to be backup, temporary, hidden, OS-generated or VCS-ignored.
  • MUST list each matched file path under a License-File field in the core metadata.
  • SHOULD issue a warning and MAY raise an error if no files are matched.
  • MAY issue a warning if any individual user-specified pattern does not match at least one file.

If the license-files key is present, and the paths or globs subkey is set to a value of an empty array, then tools MUST NOT include any license files and MUST NOT raise an error.

If the license-files key is not present and not explicitly marked as dynamic, tools MUST assume a default value of the following:

license-files.globs = ["LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*"]

In this case, tools MAY issue a warning if no license files are matched, but MUST NOT raise an error.
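
To make the rules for the paths and globs subkeys and the default value more concrete, here is a rough sketch of how a build tool might resolve the list of license files. The function name resolve_license_files and its signature are hypothetical. The license_files argument stands for the already-parsed value of the license-files key, with None meaning the key is absent and not marked as dynamic. Warnings, glob syntax validation and the optional exclusion of backup or VCS-ignored files are left out.

```python
import glob
import os

DEFAULT_GLOBS = ["LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*"]


def resolve_license_files(project_root, license_files=None):
    """Return sorted, '/'-delimited license file paths relative to project_root."""
    if license_files is None:
        # Key absent and not dynamic: fall back to the default patterns.
        license_files = {"globs": DEFAULT_GLOBS}
    if "paths" in license_files and "globs" in license_files:
        raise ValueError("'paths' and 'globs' are mutually exclusive")

    if "paths" in license_files:
        # Verbatim paths: never treated as patterns, and each one must exist.
        for path in license_files["paths"]:
            if not os.path.isfile(os.path.join(project_root, path)):
                raise FileNotFoundError(f"license file not found: {path}")
        return sorted(license_files["paths"])

    matched = set()
    escaped_root = glob.escape(project_root)
    for pattern in license_files.get("globs", []):
        for hit in glob.glob(os.path.join(escaped_root, pattern)):
            if os.path.isfile(hit):  # skip directories that happen to match
                relative = os.path.relpath(hit, project_root)
                matched.add(relative.replace(os.sep, "/"))
    return sorted(matched)
```

Each returned path would then be written to its own License-File field and the corresponding file copied into the distribution archive.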

If the license-files key is marked as dynamic (and not present), to preserve consistent behavior with current tools and help ensure the packages they create are legally distributable, build tools SHOULD default to including at least the license files matching the above patterns, unless the user has explicitly specified their own.

Deprecate license key

The license key in the project table is now deprecated. It MUST NOT be used or listed as dynamic if either of the new license-expression or license-files keys are defined, and build tools MUST raise an error if either is the case.

Otherwise, if the text subkey is present in the license table, tools SHOULD issue a warning informing users it is deprecated and recommending the license-expression key instead.

Likewise, if the file subkey is present in the license table, tools SHOULD issue a warning informing users it is deprecated and recommending the license-files key instead. However, if the file is present in the source, build tools SHOULD still use it to fill the License-File field in the core metadata, and if so, MUST include the specified file in any distribution archives for the project. If the file does not exist at the specified path, tools SHOULD issue a warning, and MUST NOT fill it in a License-File field.

For backwards compatibility, to preserve consistent behavior with current tools and ensure that users do not unknowingly create packages that are not legally distributable, tools MUST assume the above default value for the license-files key and also include, in addition to the license file specified under this file subkey, any license files that match the specified list of patterns.

The license key may be removed from a new version of the specification in a future PEP.

License files in project formats

A few minor additions will be made to the relevant existing specifications to document, standardize and clarify what is already currently supported, allowed and implemented behavior, as well as explicitly mention the root license directory the license files are located in and relative to for each format, per the specification above.

Project source trees
As described above, the Declaring Project Metadata specification will be updated to reflect that license file paths MUST be relative to the project root directory; i.e. the directory containing the pyproject.toml (or equivalently, other legacy project configuration, e.g. setup.py, setup.cfg, etc).
Source distributions (sdists)
The sdist specification will be updated to reflect that if the Metadata-Version is 2.3 or greater, the sdist MUST contain any license files specified by License-File in the PKG-INFO at their respective paths relative to the top-level directory of the sdist (containing the pyproject.toml and the PKG-INFO core metadata).
Built distributions (wheels)
The wheel specification will be updated to reflect that if the Metadata-Version is 2.3 or greater and one or more License-File fields is specified, the .dist-info directory MUST contain a license_files subdirectory which MUST contain the files listed in the License-File fields in the METADATA file at their respective paths relative to the license_files directory.
Installed projects
The Recording Installed Projects specification will be updated to reflect that if the Metadata-Version is 2.3 or greater and one or more License-File fields is specified, the .dist-info directory MUST contain a license_files subdirectory which MUST contain the files listed in the License-File fields in the METADATA file at their respective paths relative to the license_files directory, and that any files in this directory MUST be copied from wheels by install tools.
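
For example, the location of a license file inside a wheel or installed project follows directly from its License-File value. The helper below is a small sketch: the function name is hypothetical, and it assumes the name and version passed in are already in the normalized form used for the .dist-info directory name.

```python
from pathlib import PurePosixPath


def license_path_in_dist_info(name: str, version: str, license_file: str) -> str:
    """Map a License-File value to its path inside a wheel or installed project."""
    # Assumption: name and version are already normalized for the
    # "{name}-{version}.dist-info" directory name.
    dist_info = f"{name}-{version}.dist-info"
    # The source-tree directory structure is reproduced under license_files/.
    return str(PurePosixPath(dist_info, "license_files", license_file))
```

Under these assumptions, a License-File value of licenses/MIT.txt in version 1.0 of a project named example_pkg would be stored at example_pkg-1.0.dist-info/license_files/licenses/MIT.txt.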

Converting legacy metadata

If the contents of the license.text PEP 621 source metadata key (or equivalent for tool-specific config formats) is a valid license expression containing solely known, non-deprecated license identifiers, and, if PEP 621 metadata are defined, the license-expression key is listed as dynamic, build tools MAY use it to fill the License-Expression field.

Similarly, if the classifiers PEP 621 source metadata key (or equivalent for tool-specific config formats) contains exactly one license classifier that unambiguously maps to exactly one valid, non-deprecated SPDX license identifier, tools MAY fill the License-Expression field with the latter.

If both a license.text or equivalent value and a single license classifier are present, the contents of the former, including capitalization (but excluding leading and trailing whitespace), MUST exactly match the SPDX license identifier mapped to the license classifier to be considered unambiguous for the purposes of automatically filling the License-Expression field.

If tools have filled the License-Expression field as described here, they MUST output a prominent, user-visible warning informing package authors of that fact, including the License-Expression string they have output, and recommending that the project source metadata be updated accordingly with the indicated license expression.

In any other case, tools MUST NOT use the contents of the license.text key (or equivalent) or license classifiers to fill the License-Expression field without informing the user and requiring unambiguous, affirmative user action to select and confirm the desired License-Expression value before proceeding.
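
The decision procedure in this section can be summarized in code. The following is a deliberately conservative sketch, not a normative algorithm. The one-entry classifier table is illustrative only, the is_valid_expression callable is a hypothetical stand-in for a real validator that accepts only valid expressions with known, non-deprecated identifiers, and the function returns None whenever the situation is not clearly covered by the rules above (for instance, when more than one license classifier is present).

```python
# Illustrative only: a real tool would carry a complete, reviewed mapping.
UNAMBIGUOUS_CLASSIFIERS = {
    "License :: OSI Approved :: MIT License": "MIT",
}


def infer_license_expression(license_text, classifiers, is_valid_expression):
    """Return an inferred License-Expression value, or None if user action is needed."""
    license_classifiers = [c for c in classifiers if c.startswith("License ::")]
    if len(license_classifiers) > 1:
        return None  # conservative choice: several classifiers are not handled

    mapped = None
    if license_classifiers:
        mapped = UNAMBIGUOUS_CLASSIFIERS.get(license_classifiers[0])
        if mapped is None:
            return None  # ambiguous or unknown classifier

    text = license_text.strip() if license_text else None
    if text is not None:
        if not is_valid_expression(text):
            return None
        # With both sources present, they must agree exactly (case-sensitive).
        if mapped is not None and text != mapped:
            return None
        return text
    return mapped
```

Whenever a non-None value is returned and used, the tool would still need to emit the prominent warning required above, including the inferred expression.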

Mapping license classifiers to SPDX identifiers

Most single license classifiers (namely, all those not mentioned below) map to a single valid SPDX license identifier, allowing tools to insert them into the License-Expression field following the specification above.

Many legacy license classifiers intend to specify a particular license, but do not specify the particular version or variant, leading to a critical ambiguity as to their terms, compatibility and acceptability. Tools MUST NOT attempt to automatically infer a License-Expression when one of these classifiers is used, and SHOULD instead prompt the user to affirmatively select and confirm their intended license choice.

These classifiers are the following:

  • License :: OSI Approved :: Academic Free License (AFL)
  • License :: OSI Approved :: Apache Software License
  • License :: OSI Approved :: Apple Public Source License
  • License :: OSI Approved :: Artistic License
  • License :: OSI Approved :: BSD License
  • License :: OSI Approved :: GNU Affero General Public License v3
  • License :: OSI Approved :: GNU Free Documentation License (FDL)
  • License :: OSI Approved :: GNU General Public License (GPL)
  • License :: OSI Approved :: GNU General Public License v2 (GPLv2)
  • License :: OSI Approved :: GNU General Public License v3 (GPLv3)
  • License :: OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)
  • License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)
  • License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)
  • License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)

A comprehensive mapping of these classifiers to their possible specific identifiers was assembled by Dustin Ingram, which tools MAY use as a reference for the identifier options to offer when prompting users to explicitly select the license identifier they intended for their project.

Note: Several additional classifiers, namely the “or later” variants of the AGPLv3, GPLv2, GPLv3 and LGPLv3, are also listed in the aforementioned mapping, but as they were merely proposed for textual harmonization and still unambiguously map to their respective licenses, they were not included here; LGPLv2 is, however, as it could ambiguously refer to either the distinct v2.0 or v2.1 variants of that license.

In addition, for the various special cases, the following mappings are considered canonical and normative for the purposes of this specification:

  • Classifier License :: Public Domain MAY be mapped to the generic License-Expression: LicenseRef-Public-Domain. If tools do so, they SHOULD issue an informational warning encouraging the use of more explicit and legally portable license identifiers, such as those for the CC0 1.0 license (CC0-1.0), the Unlicense (Unlicense), or the MIT license (MIT), since the meaning associated with the term “public domain” is thoroughly dependent on the specific legal jurisdiction involved, some of which lack the concept entirely. Alternatively, tools MAY choose to treat these classifiers as ambiguous and require user confirmation to fill License-Expression in these cases.
  • The generic and sometimes ambiguous classifiers License :: Free For Educational Use, License :: Free For Home Use, License :: Free for non-commercial use, License :: Freely Distributable, License :: Free To Use But Restricted, License :: Freeware, and License :: Other/Proprietary License MAY be mapped to the generic License-Expression: LicenseRef-Proprietary, but tools MUST issue a prominent, informative warning if they do so. Alternatively, tools MAY choose to treat these classifiers as ambiguous and require user confirmation to fill License-Expression in these cases.
  • The generic and ambiguous classifiers License :: OSI Approved and License :: DFSG approved do not map to any license expression, and thus tools MUST treat them as ambiguous and require user intervention to fill License-Expression.
  • The classifiers License :: GUST Font License 1.0 and License :: GUST Font License 2006-09-30 have no mapping to SPDX license identifiers and no PyPI package uses them, as of the writing of this PEP. Therefore, tools MUST treat them as ambiguous when attempting to fill License-Expression.

When multiple license classifiers are used, their relationship is ambiguous, and it is typically not possible to determine if all the licenses apply or if there is a choice that is possible among the licenses. In this case, tools MUST NOT automatically infer a license expression, and SHOULD suggest that the package author construct one which expresses their intent.
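
As a rough sketch of how a tool might apply these mapping rules, the function below sorts a project's license classifiers into the cases described in this section. The names `AMBIGUOUS`, `GENERIC` and `map_classifiers` are hypothetical, the two sets are abbreviated subsets of the lists above, and the caller is assumed to supply its own table of unambiguous classifier mappings.

```python
# Abbreviated subset of the ambiguous classifiers listed in this section.
AMBIGUOUS = {
    "License :: OSI Approved :: Apache Software License",
    "License :: OSI Approved :: BSD License",
    "License :: OSI Approved :: GNU General Public License (GPL)",
    "License :: OSI Approved",
    "License :: DFSG approved",
}

# Special cases that MAY be mapped to a generic LicenseRef, with a warning.
GENERIC = {
    "License :: Public Domain": "LicenseRef-Public-Domain",
    "License :: Freeware": "LicenseRef-Proprietary",
    "License :: Other/Proprietary License": "LicenseRef-Proprietary",
}


def map_classifiers(license_classifiers, unambiguous_map):
    """Return (expression, message); expression is None if the user must decide."""
    if not license_classifiers:
        return None, None
    if len(license_classifiers) > 1:
        return None, "Multiple license classifiers are ambiguous; please write a license expression."
    classifier = license_classifiers[0]
    if classifier in AMBIGUOUS:
        return None, f"{classifier!r} is ambiguous; please select the intended license."
    if classifier in GENERIC:
        expression = GENERIC[classifier]
        return expression, f"Mapped {classifier!r} to generic {expression!r}; consider a more specific identifier."
    expression = unambiguous_map.get(classifier)
    if expression is None:
        return None, f"No known mapping for {classifier!r}; please select a license."
    return expression, f"Filled License-Expression with {expression!r}; please update the source metadata."
```

Returning a message alongside every result keeps the warning requirement visible at the call site; how that message is shown to the author is left to the tool.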

Backwards Compatibility

Adding a new, dedicated License-Expression core metadata field and license-expression PEP 621 source metadata key unambiguously signals support for the specification in this PEP. This avoids the risk of new tooling misinterpreting a license expression as a free-form license description or vice versa, and raises an error if and only if the user affirmatively upgrades to the latest metadata version and adds the new field/key.

The legacy License core metadata field and license PEP 621 source metadata key will be deprecated along with the license classifiers, retaining backwards compatibility while gently preparing users for their future removal. Such a removal would follow a suitable transition period, and be left to a future PEP and a new version of the core metadata specification.

Formally specifying the new License-File core metadata field and the inclusion of the listed files in the distribution merely codifies and refines the existing practices in popular packaging tools, including the Wheel and Setuptools projects, and is designed to be largely backwards-compatible with their existing use of that field. Likewise, the new license-files PEP 621 source metadata key standardizes statically specifying the files to include, as well as the default behavior, and allows other tools to make use of them, while only having an effect once users and tools expressly adopt it.

Because license files must not be flattened into .dist-info and should instead be placed in a dedicated license_files subdir, wheels produced following this change will have differently-located licenses relative to those produced via the previous unspecified, installer-specific behavior. However, as until this PEP there was no way of discovering these files or accessing them programmatically, and this will be further discriminated by a new metadata version, there is no foreseen mechanism for this to pose a practical issue.

Furthermore, this resolves existing compatibility issues with the current ad hoc behavior, namely license files being silently clobbered if they have the same names as others at different paths, unknowingly rendering the wheel undistributable, and conflicting with the names of other metadata files in the same directory. Formally specifying otherwise would in fact block full forward compatibility with additional standard or installer-specified files and directories added to .dist-info, as they too could conflict with the names of existing licenses.
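
To illustrate why preserving paths avoids the clobbering problem, the following minimal sketch computes where each license file would land under the license_files subdirectory. The function name `license_destinations` and the example directory name are assumptions made for this illustration; only the license_files subdirectory name comes from this PEP.

```python
from pathlib import PurePosixPath


def license_destinations(dist_info_dir, license_files):
    """Map each License-File path to its location under license_files/."""
    base = PurePosixPath(dist_info_dir) / "license_files"
    destinations = {}
    for relative in license_files:
        # Keep the full relative path instead of only the file name, so
        # same-named files from different directories cannot collide.
        destinations[relative] = str(base / PurePosixPath(relative))
    return destinations
```

With the inputs `LICENSE` and `vendor/lib/LICENSE`, the two files map to distinct destinations, whereas flattening both into a single directory would leave only one of them.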

While minor additions will be made to the source distribution (sdist), built distribution (wheel) and installed project specifications, all of these are merely documenting, clarifying and formally specifying behaviors explicitly allowed under their current respective specifications, and already implemented in practice, and gating them behind the explicit presence of both the new metadata versions and the new fields. In particular, sdists may contain arbitrary files following the project source tree layout, and formally mentioning that these must include the license files listed in the metadata merely documents and codifies existing Setuptools practice. Likewise, arbitrary installer-specific files are allowed in the .dist-info directory of wheels and copied to installed projects, and again this PEP just formally clarifies and standardizes what is already being done.

Finally, while this PEP does propose PyPI implement validation of the new License-Expression and License-File fields, this has no effect on existing packages, nor any effect on any new distributions uploaded unless they explicitly choose to opt in to using these new fields while not following the requirements in the specification. Therefore, this does not have a backward compatibility impact, and in fact ensures forward compatibility with any future changes by ensuring all distributions uploaded to PyPI with the new fields are valid and conform to the specification.

Security Implications

This PEP has no foreseen security implications: the License-Expression field is a plain string and the License-File fields are file paths. Neither introduces any known new security concerns.

How to Teach This

The simple cases are simple: a single license identifier is a valid license expression, and a large majority of packages use a single license.

The plan to teach users of packaging tools how to express their package’s license with a valid license expression is to have tools issue informative messages when they detect invalid license expressions, or when the deprecated License field or license classifiers are used.

An immediate, descriptive error message if an invalid License-Expression is used will help users understand they need to use SPDX identifiers in this field, and catch them if they make a mistake. For authors still using the now-deprecated, less precise and more redundant License field or license classifiers, packaging tools will warn them and inform them of the modern replacement, License-Expression. Finally, for users who may have forgotten or not be aware they need to do so, publishing tools will gently guide them toward including license-expression and license-files in their project source metadata.

Tools may also help with the conversion and suggest a license expression in many, if not most common cases:

  • The section Mapping license classifiers to SPDX identifiers provides tool authors with guidelines on how to suggest a license expression produced from legacy classifiers.
  • Tools may also be able to infer and suggest how to update an existing License value and convert that to a License-Expression. For instance, a tool may suggest converting from a License field with Apache2 (which is not a valid license expression as defined in this PEP) to a License-Expression field with Apache-2.0 (which is a valid license expression using an SPDX license identifier).
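
One possible way for a tool to produce such suggestions is simple fuzzy matching against the list of known identifiers. The sketch below uses the standard library's `difflib.get_close_matches`; the function name `suggest_identifiers`, the short `KNOWN_SPDX_IDS` list and the 0.6 similarity cutoff are assumptions chosen for illustration, and a real tool would load the full SPDX license list.

```python
import difflib

# Hypothetical short list; a real tool would load the full SPDX license list.
KNOWN_SPDX_IDS = ["Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "GPL-3.0-only", "MIT"]


def suggest_identifiers(license_value, known_ids=KNOWN_SPDX_IDS, limit=3):
    """Suggest SPDX identifiers that closely resemble a legacy License value."""
    # Compare case-insensitively, but return the canonical spelling.
    by_lowercase = {identifier.lower(): identifier for identifier in known_ids}
    matches = difflib.get_close_matches(
        license_value.strip().lower(), list(by_lowercase), n=limit, cutoff=0.6
    )
    return [by_lowercase[match] for match in matches]
```

Any such result is only a suggestion: the author still has to confirm it before it is written to License-Expression.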

Reference Implementation

Tools will need to support parsing and validating license expressions in the License-Expression field.

The license-expression library is a reference Python implementation that handles license expressions including parsing, formatting and validation, using flexible lists of license symbols (including SPDX license IDs and any extra identifiers included here). It is licensed under Apache-2.0 and is already used in several projects, including the SPDX Python Tools, the ScanCode toolkit and the Free Software Foundation Europe (FSFE) REUSE project.
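
To give a sense of what parsing and validating involves, here is a deliberately simplified, standard-library-only sketch. It is not the license-expression library's API and not a complete SPDX implementation: the identifier sets are tiny hypothetical samples, operators are only recognized in upper case, and every function name is invented for this example.

```python
import re

# Hypothetical, abbreviated identifier sets; real tools use the full SPDX lists.
LICENSE_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause", "GPL-2.0-only", "GPL-2.0-or-later"}
EXCEPTION_IDS = {"Classpath-exception-2.0"}
TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.\-]+\+?)")


def tokenize(expression):
    """Split an expression into parentheses, operators and identifiers."""
    tokens, position = [], 0
    expression = expression.strip()
    while position < len(expression):
        match = TOKEN.match(expression, position)
        if match is None:
            raise ValueError(f"Unexpected character at offset {position}")
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def is_license(token):
    """Accept known identifiers (optionally with '+') and LicenseRef-* values."""
    if token.startswith("LicenseRef-") and len(token) > len("LicenseRef-"):
        return "+" not in token
    return token.rstrip("+") in LICENSE_IDS


def parse_with(tokens, position):
    """Parse a parenthesized group or 'license [WITH exception]'."""
    if position >= len(tokens):
        raise ValueError("Unexpected end of expression")
    token = tokens[position]
    if token == "(":
        position = parse_or(tokens, position + 1)
        if position >= len(tokens) or tokens[position] != ")":
            raise ValueError("Missing closing parenthesis")
        return position + 1
    if not is_license(token):
        raise ValueError(f"Unknown license identifier: {token}")
    position += 1
    if position < len(tokens) and tokens[position] == "WITH":
        if position + 1 >= len(tokens) or tokens[position + 1] not in EXCEPTION_IDS:
            raise ValueError("Unknown or missing license exception")
        position += 2
    return position


def parse_and(tokens, position):
    """AND binds more tightly than OR."""
    position = parse_with(tokens, position)
    while position < len(tokens) and tokens[position] == "AND":
        position = parse_with(tokens, position + 1)
    return position


def parse_or(tokens, position):
    position = parse_and(tokens, position)
    while position < len(tokens) and tokens[position] == "OR":
        position = parse_and(tokens, position + 1)
    return position


def is_valid_expression(expression):
    """Return True if the expression is well-formed and uses known identifiers."""
    try:
        tokens = tokenize(expression)
        return parse_or(tokens, 0) == len(tokens)
    except ValueError:
        return False
```

For example, `is_valid_expression("MIT OR (Apache-2.0 AND BSD-3-Clause)")` accepts a compound expression, while `is_valid_expression("Apache2")` rejects a value that is not an SPDX identifier. Production tools would be better served by a maintained implementation such as the library described above.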

Rejected Ideas

Core metadata fields

Potential alternatives to the structure, content and deprecation of the core metadata fields specified in this PEP.

Re-use the License field

Following initial discussion, earlier versions of this PEP proposed re-using the existing License field, which tools would attempt to parse as a SPDX license expression with a fallback to free text. Initially, this would merely cause a warning (or even pass silently), but would eventually be treated as an error by modern tooling.

This offered the potential benefit of greater backwards-compatibility, easing the community into using SPDX license expressions while taking advantage of packages that already have them (either intentionally or coincidentally), and avoided adding yet another license-related field.

However, following substantial discussion, consensus was reached that a dedicated License-Expression field was the preferred overall approach. The presence of this field is an unambiguous signal that a package intends it to be interpreted as a valid SPDX identifier, without the need for complex and potentially erroneous heuristics, and allows tools to easily and unambiguously detect invalid content.

This avoids both false positives (License values that a package author didn’t intend as an SPDX identifier, but that happen to validate as one) and false negatives (expressions the author intended to be valid SPDX, but that are not due to a typo or mistake), which are otherwise not clearly distinguishable from true positives and negatives, an ambiguity at odds with the goals of this PEP.

Furthermore, it allows both the existing License field and the license classifiers to be more easily deprecated, with tools able to cleanly distinguish between packages intending to affirmatively conform to the updated specification in this PEP or not, and adapt their behavior (warnings, errors, etc) accordingly. Otherwise, tools would either have to allow duplicative and potentially conflicting License fields and classifiers, or warn/error on the substantial number of existing packages that have SPDX identifiers as the value for the License field, intentionally or otherwise (e.g. MIT).

Finally, it avoids changing the behavior of an existing metadata field, and avoids tools having to guess the Metadata-Version and field behavior based on its value rather than merely its presence.

While this would mean the subset of existing distributions containing License fields valid as SPDX license expressions wouldn’t automatically be recognized as such, this only requires appending a few characters to the key name in the project’s source metadata, and this PEP provides extensive guidance on how this can be done automatically by tooling.

Given all this, it was decided to proceed with defining a new, purpose-created field, License-Expression.

Re-Use the License field with a value prefix

As an alternative to the above, prefixing SPDX license expressions with, e.g. spdx: was suggested to reduce the ambiguity inherent in re-using the License field. However, this effectively amounted to creating a field within a field, and doesn’t address all the downsides of keeping the License field. Namely, it still changes the behavior of an existing metadata field, requires tools to parse its value to determine how to handle its content, and makes the specification and deprecation process more complex and less clean.

Yet, it still shares the same main potential downside as just creating a new field: projects currently using valid SPDX identifiers in the License field, intentionally or not, won’t be automatically recognized, and fixing this requires about the same amount of effort, namely changing a line in the project’s source metadata. Therefore, it was rejected in favor of a new field.

Don’t make License-Expression mutually exclusive

For backwards compatibility, the License field and/or the license classifiers could still be allowed together with the new License-Expression field, presumably with a warning. However, this could easily lead to inconsistent, or at the very least duplicative, license metadata in no fewer than three different fields, which is squarely contrary to this PEP’s goal of making the licensing story simpler and unambiguous. Therefore, and in concert with clear community consensus otherwise, this idea was soundly rejected.

Don’t deprecate existing License field and classifiers

Several community members were initially concerned that deprecating the existing License field and classifiers would result in excessive churn for existing package authors and raise the barrier to entry for new ones, particularly everyday Python developers seeking to package and publish their personal projects without necessarily caring too much about the legal technicalities or being a “license lawyer”. Indeed, every deprecation comes with some non-zero short-term cost, and should be carefully considered relative to the overall long-term net benefit. And at the minimum, this change shouldn’t make it more difficult for the average Python developer to share their work under a license of their choice, and ideally improve the situation.

Following many rounds of proposals, discussion and refinement, the general consensus was clearly in favor of deprecating the legacy means of specifying a license, in favor of “one obvious way to do it”, to improve the currently complex and fragmented story around license documentation. Not doing so would leave three different un-deprecated ways of specifying a license for a package, two of them ambiguous, less than clear/obvious how to use, inconsistently documented and out of date. This is more complex for all tools in the ecosystem to support indefinitely (rather than simply installers supporting older packages implementing previous frozen metadata versions), resulting in a non-trivial and unbounded maintenance cost.

Furthermore, it leads to a more complex and confusing landscape for users with three similar but distinct options to choose from, particularly with older documentation, answers and articles floating around suggesting different ones. Of the three, License-Expression is the simplest and clearest to use correctly; users just paste in their desired license identifier, or select it via a tool, and they’re done; no need to learn about Trove classifiers and dig through the list to figure out which one(s) apply (and be confused by many ambiguous options), or figure out on their own what should go in the license key (anything from nothing, to the license text, to a free-form description, to the same SPDX identifier they would be entering in the license-expression key anyway, assuming they can easily find documentation at all about it). In fact, this can be made even easier thanks to the new field. For example, GitHub’s popular ChooseALicense.com links to how to add SPDX license identifiers to the project source metadata of various languages that support them right in the sidebar of every license page; the SPDX support in this PEP enables adding Python to that list.

For current package maintainers who have specified a License or license classifiers, this PEP only recommends warnings and prohibits errors for all but publishing tools, which are allowed to error if their intended distribution platform(s) so requires. Once maintainers are ready to upgrade, for those already using SPDX license expressions (accidentally or not) this only requires appending a few characters to the key name in the project’s source metadata, and for those with license classifiers that map to a single unambiguous license, or another defined case (public domain, proprietary), they merely need to drop the classifier and paste in the corresponding license identifier. This PEP provides extensive guidance and examples, as will other resources, as well as explicit instructions for automated tooling to take care of this with no human changes needed. More complex cases where license metadata is currently specified may need a bit of human intervention, but in most cases tools will be able to provide a list of options following the mappings in this PEP, and these are typically the projects most likely to be constrained by the limitations of the existing license metadata, and thus most benefited by the new fields in this PEP.

Finally, for unmaintained packages, those using tools supporting older metadata versions, or those who choose not to provide license metadata, no changes are required regardless of the deprecation.

Don’t mandate validating new fields on PyPI

Previously, while this PEP did include normative guidelines for packaging publishing tools (such as Twine), it did not provide specific guidance for PyPI (or other package indices) as to whether and how they should validate the License-Expression or License-File fields, nor how they should handle using them in combination with the deprecated License field or license classifiers. This simplifies the specification and either defers implementation on PyPI to a later PEP, or gives discretion to PyPI to enforce the stated invariants, to minimize disruption to package authors.

However, this had been left unstated since before the License-Expression field was separated from the existing License field, when validation would have been much more challenging and backwards-incompatible, breaking existing packages. With that change, there was a clear consensus that the new field should be validated from the start, guaranteeing that all distributions uploaded to PyPI that declare core metadata version 2.3 or higher and have the License-Expression field will have a valid expression, such that PyPI and consumers of its packages and metadata can rely upon it to follow the specification here.

The same can be extended to the new License-File field as well, to ensure that it is valid and the legally required license files are present, and thus it is lawful for PyPI, users and downstream consumers to distribute the package. (Of course, this makes no guarantee of such as it is ultimately reliant on authors to declare them, but it improves assurance of this and allows doing so in the future if the community so decides.) To be clear, this would not require that any uploaded distribution have such metadata, only that if they choose to declare it per the new specification in this PEP, it is assured to be valid.

Source metadata license key

Alternate possibilities related to the license key in the pyproject.toml project source metadata specified in PEP 621.

Add expression and files subkeys to table

A previous working draft of this PEP added expression and files subkeys to the existing license table in the PEP 621 source metadata, to parallel the existing file and text subkeys. While this seemed perhaps the most obvious approach at first glance, it had several serious drawbacks relative to that ultimately taken here.

Most saliently, this means two very different types of metadata are being specified under the same top-level key, despite requiring very different handling. Furthermore, unlike the previous arrangement, the subkeys are not mutually exclusive and can both be specified at once, with some subkeys potentially being dynamic and others static, and mapping to different core metadata fields. This also breaks from the consensus for the core metadata fields, namely to separate the license expression into its own explicit field.

Furthermore, this leads to a conflict with marking the key as dynamic (assuming that is intended to specify PEP 621 keys, as that PEP seems to rather imprecisely imply, rather than core metadata fields), as either both subkeys or neither would have to be treated as dynamic. A user may want to specify the expression key as dynamic, if they intend their tooling to generate it automatically; conversely, they may rely on their build tool to dynamically detect license files via means outside of that strictly specified here. And indeed, current users may mark the present license key as dynamic to automatically fill it in the metadata. Grouping all these uses under the same key forces an “all or nothing” approach, and creates ambiguity as to user intent.

There are further downsides to this as well. Both users and tools would need to keep track of which fields are mutually exclusive with which of the others, greatly increasing cognitive and code complexity, and in turn the probability of errors. Conceptually, juxtaposing so many different fields under the same key is rather jarring, and leads to a much more complex mapping between PEP 621 keys and core metadata fields, not in keeping with PEP 621. This causes the PEP 621 naming and structure to diverge further from both the core metadata and native formats of the various popular packaging tools that use it. Finally, this results in the spec being significantly more complex and convoluted to understand and implement than the alternatives.

The approach this PEP now takes, adding distinct license-expression and license-files keys and simply deprecating the whole license key, avoids all the issues identified above, and results in a much clearer and cleaner design overall. It allows license-expression and license-files to be tagged dynamic independently, separates two independent types of metadata (syntactically and semantically), restores a closer to 1:1 mapping of PEP 621 keys to core metadata fields, and reduces nesting by a level for both. Other than adding two extra keys to the file, there was no significant apparent downside to this latter approach, so it was adopted for this PEP.
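
A small sketch may help show what independent dynamic marking looks like in practice. The parsed [project] table is represented here as a plain Python dict with hypothetical values, and `check_static_dynamic` is an invented helper that reports keys which are both given a static value and listed in dynamic, a combination PEP 621 prohibits.

```python
def check_static_dynamic(project):
    """Return keys that are both specified statically and listed as dynamic."""
    dynamic = set(project.get("dynamic", []))
    return sorted(key for key in dynamic if key in project)


# Parsed [project] table shown as a dict; all values are hypothetical.
# The license expression is static, while the license files are left
# for the build tool to determine dynamically.
project = {
    "name": "example",
    "license-expression": "MIT",
    "dynamic": ["version", "license-files"],
}

conflicts = check_static_dynamic(project)
```

Because each kind of metadata has its own top-level key, the expression can stay static while the files are dynamic (or the reverse) without any conflict being reported.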

Define license expression as string value

A compromise approach between adding two new top-level keys for license expressions and files would be adding a separate license-files key, but re-using the license key for the license expression, either by defining it as the (previously reserved) string value for the license key, retaining the expression subkey in the license table, or allowing both. Indeed, this would seem to have been envisioned by PEP 621 itself with this PEP in mind, in particular the first approach:

A practical string value for the license key has been purposefully left out to allow for a future PEP to specify support for SPDX expressions (the same logic applies to any sort of “type” field specifying what license the file or text represents).

However, while a working draft temporarily explored this solution, it was ultimately rejected, as it shared most of the downsides identified with adding new subkeys under the existing license table, as well as several of its own, with again minimal advantage over separating both.

Most importantly, it still means that per PEP 621, it is not possible to separately mark the [project] keys corresponding to the License and License-Expression metadata fields as dynamic. This, in turn, still renders specifying metadata following that standard incompatible with conversion of legacy metadata, as specified in this PEP’s Converting legacy metadata section, as PEP 621 strictly prohibits the license key from being both present (to define the existing value of the License field, or the path to a license file, and thus able to be converted), and specified as dynamic (which would allow tools to use the generated value for the License-Expression field).

For the same reasons, this would make it impossible to back-fill the License field from the License-Expression field as this PEP currently allows (without making an exception from strict dynamic behavior in this case), as again, marking license as dynamic would mean it cannot be specified in the project table at all.

Furthermore, this would mean existing project source metadata specifying license as dynamic would be ambiguous, as it would be impossible for tools to statically determine if they are intended to conform to previous metadata versions specifying License, or this version specifying License-Expression. Tools would have no way of determining which field, if either, might be filled in the resulting distribution’s core metadata. By contrast, the present approach makes clear what the author intended, allows tools to unambiguously determine which field(s) may be dynamically inserted, and ensures backward compatibility such that current project source metadata do not unknowingly specify both the old and the new field as dynamic, and instead must do so explicitly per PEP 621’s intent.

Additionally, while differences from existing tool formats (and core metadata field names) have precedent in PEP 621 (though they are best avoided if practical), using a key with an identical name as in all current tools (and of an existing core metadata field) to mean something different (and map to a different core metadata field), with distinct and incompatible syntax and semantics, does not, and is likely to create substantial confusion and ambiguity for readers and authors, contrary to the fundamental goals of this PEP.

Finally, this means that the top-level license key would still map to multiple core metadata fields with different purposes and interpretations (License and License-Expression). This would deny a clear separation from the old behavior by not cleanly deprecating the license key, and would increase the complexity of the specification and implementation.

In addition to the aforementioned issues, this also requires deciding between the three individual approaches (expression subkey, top-level string or allowing both), all of which have further significant downsides and none of which are clearly superior or more obvious, leading to needless bikeshedding.

If the license expression were made the string value of the license key, as reserved by PEP 621, it would be slightly shorter for users to type and more obviously the preferred approach. However, it would be far less obvious, to authors and those viewing the files, that it is a license expression at all. This lack of clarity and explicitness, with the resulting ambiguity and potential for user confusion, is exactly what this PEP seeks to avoid, and would be incurred merely to save a few characters over other approaches.

If an expression subkey were added to the license table, it would retain the clarity of a new top-level key, but add complexity for no real benefit, with an extra level of nesting, and users and tools needing to deal with the mutual exclusivity of the subkeys, as before. And allowing both (as a table subkey and the string value) would inherit the downsides of both, while adding even more spec and tool complexity and creating more than “one obvious way to do it”, further potentially confusing users.

Therefore, a separate top-level license-expression key was adopted to avoid all these issues, with relatively minimal downside aside from adding a single additional key and (versus some approaches) a few extra characters to type.

Add a type key to treat as expression

Instead of creating a new top-level license-expression key in the PEP 621 source metadata, one could add a type subkey to the existing license table to control whether text (or a string value) is interpreted as free-text or a license expression. This could make backward compatibility a little more seamless, as older tools could ignore it and always treat text as license, while newer tools would know to treat it as a license expression, if type was set appropriately. Indeed, PEP 621 seems to suggest something of this sort as a possible alternative way that SPDX license expressions could be implemented.

However, all the same downsides as in the previous item apply here, including greater complexity, a more complex mapping between the project source metadata and core metadata and inconsistency between the presentation in tool config, PEP 621 and core metadata, a much less clean deprecation, further bikeshedding over what to name it, and inability to mark one but not the other as dynamic, among others.

In addition, while theoretically potentially a little easier in the short term, in the long term it would mean users would always have to remember to specify the correct type to ensure their license expression is interpreted correctly, which adds work and potential for error; we could never safely change the default while being confident that users understand that what they are entering is unambiguously a license expression, with all the false positive and false negative issues as above.

Therefore, for these as well as the same reasons this approach was rejected for the core metadata in favor of a distinct License-Expression field, we similarly reject this here.

Must be marked dynamic to back-fill

The license key in the pyproject.toml could be required to be explicitly set to dynamic in order for the License core metadata field to be automatically back-filled from the value of the license-expression key. This would be more explicit that the filling will be done, as strictly speaking the license key is not (and cannot be) specified in pyproject.toml, and satisfies a stricter interpretation of the letter of the current PEP 621 specification that this PEP revises.

However, this isn’t seen to be necessary, because it is simply using the static, verbatim literal value of the license-expression key, as specified strictly in this PEP. Therefore, any conforming tool can trivially, deterministically and unambiguously derive this using only the static data in the pyproject.toml file itself.

Furthermore, this actually adds significant ambiguity, as it means the value could get filled arbitrarily by other tools, which would in turn compromise and conflict with the value of the new License-Expression field, which is why such is explicitly prohibited by this PEP. Therefore, not marking it as dynamic will ensure it is only handled in accordance with this PEP’s requirements.

Finally, telling users to mark it as dynamic, or not, to control filling behavior seems to be a misuse of the dynamic field as apparently intended, and prevents tools from adapting to best practices (fill, don’t fill, etc.) as they develop and evolve over time.
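As a minimal sketch of why back-filling needs no dynamic marker, the derivation can be written as a pure function of the static license-expression value. The function name, the back_fill flag and the plain-dict input are assumptions made for illustration, not part of this PEP or of any existing tool:

```python
def license_core_metadata(project, back_fill=True):
    """Return license-related core metadata lines for a parsed [project] table.

    `project` is assumed to be the already-parsed [project] table of
    pyproject.toml, given as a plain dict (hypothetical input shape).
    """
    expression = project["license-expression"]
    lines = [f"License-Expression: {expression}"]
    if back_fill:
        # Verbatim copy of the static value: nothing is computed, guessed
        # or read from anywhere other than pyproject.toml itself.
        lines.append(f"License: {expression}")
    return lines


print(license_core_metadata({"license-expression": "MIT"}))
```

Whether a tool back-fills at all remains its own choice; the flag here only models that choice.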

Source metadata license-files key

Alternatives considered for the license-files key in the PEP 621 project source metadata, primarily related to the path/glob type handling.

Add a type subkey to license-files

Instead of defining mutually exclusive paths and globs subkeys of the license-files PEP 621 project metadata key, we could achieve the same effect with a files subkey for the list and a type subkey for how to interpret it. However, the latter offers no real advantage over the former, in exchange for requiring more keystrokes, verbosity and complexity, as well as less flexibility in allowing both, or another additional subkey in the future, as well as the need to bikeshed over the subkey name. Therefore, it was summarily rejected.

Only accept verbatim paths

Globs could be disallowed completely as values to the license-files key in pyproject.toml and only verbatim literal paths allowed. This would ensure that all license files are explicitly specified, all specified license files are found and included, and the source metadata is completely static in the strictest sense of the term, without tools having to inspect the rest of the project source files to determine exactly what license files will be included and what the License-File values will be. This would also modestly simplify the spec and tool implementation.

However, practicality once again beats purity here. Globs are supported and used by many existing tools for finding license files, and explicitly specifying the full path to every license file would be unnecessarily tedious for more complex projects with vendored code and dependencies. More critically, it would make it much easier to accidentally miss a required legal file, silently rendering the package illegal to distribute.

Tools can still statically and consistently determine the files to be included, based only on those glob patterns the user explicitly specified and the filenames in the package, without installing it, executing its code or even examining its files. Furthermore, tools are still explicitly allowed to warn if specified glob patterns (including full paths) don’t match any files. And, of course, sdists, wheels and others will have the full static list of files specified in their distribution metadata.

Perhaps most importantly, this would also preclude the currently specified default value, as widely used by the current most popular tools, and thus be a major break to backward compatibility, tool consistency, and safe and sane default functionality to avoid unintentional license violations. And of course, authors are welcome and encouraged to specify their license files explicitly via the paths table subkey, once they are aware of it and if it is suitable for their project and workflow.

Only accept glob patterns

Conversely, all license-files strings could be treated as glob patterns. This would slightly simplify the spec and implementation, avoid an extra level of nesting, and more closely match the configuration format of existing tools.

However, for the cost of a few characters, the explicit subkeys ensure users are aware whether they are entering globs or verbatim paths. Furthermore, allowing license files to be specified as literal paths avoids edge cases, such as those containing glob characters (or those confusingly or even maliciously similar to them, as described in PEP 672).

Including an explicit paths value ensures that the resulting License-File metadata is correct, complete and purely static in the strictest sense of the term, with all license paths explicitly specified in the pyproject.toml file, guaranteed to be included and with an early error should any be missing. This is not practical to do, at least without serious limitations for many workflows, if we must assume the items are glob patterns rather than literal paths.

This allows tools to locate them and know the exact values of the License-File core metadata fields without having to traverse the source tree of the project and match globs, potentially allowing easier, more efficient and reliable programmatic inspection and processing.

Therefore, given the relatively small cost and the significant benefits, this approach was not adopted.

Infer whether paths or globs

It was considered whether to simply allow specifying an array of strings directly for the license-files key, rather than making it a table with explicit paths and globs. This would be somewhat simpler and avoid an extra level of nesting, and more closely match the configuration format of existing tools. However, it was ultimately rejected in favor of separate, mutually exclusive paths and globs table subkeys.

In practice, it only saves six extra characters in the pyproject.toml (license-files = [...] vs license-files.globs = [...]), but allows the user to more explicitly declare their intent, ensures they understand how the values are going to be interpreted, and serves as an unambiguous indicator for tools to parse them as globs rather than verbatim path literals.

This, in turn, allows for more appropriate, clearly specified tool behaviors for each case, many of which would be unreliable or impossible without it, to avoid common traps, provide more helpful feedback and behave more sensibly and intuitively overall. These include, with paths, guaranteeing that each and every specified file is included and immediately raising an error if one is missing, and with globs, checking glob syntax, excluding unwanted backup, temporary, or other such files (as current tools already do), and optionally warning if a glob doesn’t match any files. This also avoids edge cases (e.g. paths that contain glob characters) and reliance on heuristics to determine interpretation—the very thing this PEP seeks to avoid.
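The distinct behaviors for each subkey might be sketched roughly as follows. The function name and the dict-shaped input are assumptions for illustration; the sketch does not validate glob syntax or exclude backup and temporary files, which a real tool would also need to handle:

```python
import warnings
from pathlib import Path


def resolve_license_files(root, license_files):
    """Return project-relative POSIX paths of the license files to include.

    `license_files` is assumed to be a parsed table holding exactly one of
    the subkeys "paths" or "globs", e.g. {"globs": ["LICENSE*"]}.
    """
    root = Path(root)
    if "paths" in license_files:
        # Literal paths: every entry must exist, otherwise fail immediately.
        # Glob characters in an entry are treated as ordinary characters.
        for entry in license_files["paths"]:
            if not (root / entry).is_file():
                raise FileNotFoundError(f"license file not found: {entry}")
        return list(license_files["paths"])

    found = set()
    for pattern in license_files.get("globs", []):
        matches = [p for p in root.glob(pattern) if p.is_file()]
        if not matches:
            # Optional behavior: warn rather than fail on an unmatched glob.
            warnings.warn(f"license glob matched no files: {pattern}")
        found.update(p.relative_to(root).as_posix() for p in matches)
    return sorted(found)
```

With paths, a missing file is an immediate error; with globs, an unmatched pattern only produces a warning, which is the asymmetry the explicit subkeys make possible.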

Also allow a flat array value

Initially, after deciding to define license-files as a table of paths and globs, thought was given to making a top-level string array under the license-files key mean one or the other (probably globs, to match most current tools). This is slightly shorter and simpler, would allow gently nudging users toward a preferred one, and allow a slightly cleaner handling of the empty case (which, at present, is treated identically for either).

However, this again only saves six characters in the best case, and there isn’t an obvious choice, either from the perspective of preference (both have clear use cases and benefits) or as to which one users would naturally assume.

Flat may be better than nested, but in the face of ambiguity, users may not resist the temptation to guess. Requiring users to explicitly specify one or the other ensures they are aware of how their inputs will be handled, and is more readable for others, both human and machine alike. Allowing a flat array would also make the spec and tool implementation slightly more complicated, and it can always be added in the future, but not removed without breaking backward compatibility. And finally, for the “preferred” option, it means there is more than one obvious way to do it.

Therefore, per PEP 20, the Zen of Python, this approach is hereby rejected.

Allow both paths and globs subkeys

Allowing both paths and globs subkeys to be specified under the license-files table was considered, as it could potentially allow more flexible handling for particularly complex projects, and specify on a per-pattern rather than overall basis whether license-files entries should be treated as paths or globs.

However, the existing proposed approach already matches or exceeds the power and capabilities of those offered in tools’ config files, there isn’t clear demand for this, and few cases would likely benefit. It would add a large amount of complexity for relatively minimal gain, in terms of the specification, in tool implementations and in pyproject.toml itself.

There would be many more edge cases to deal with, such as how to handle files matched by both lists, and it conflicts in multiple places with the current specification for how tools should behave with one or the other, such as when no files match, guarantees of all files being included and of the file paths being explicitly, statically specified, and others.

Like the previous, if there is a clear need for it, it can always be allowed in the future in a backward-compatible manner (to the extent it is possible in the first place), while the same is not true of disallowing it. Therefore, it was decided to require the two subkeys to be mutually exclusive.
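A tool might enforce the table shape and the mutual exclusivity of the two subkeys with a check along these lines. This is only a sketch: the function name and error messages are hypothetical, and rejecting a table with neither subkey is an assumption of this example rather than something stated here:

```python
def validate_license_files(value):
    """Check a parsed `license-files` value and return (kind, entries).

    `kind` is "paths" or "globs"; raises TypeError or ValueError otherwise.
    """
    if not isinstance(value, dict):
        # A flat array (or string) directly under license-files is rejected.
        raise TypeError("license-files must be a table with paths or globs")
    unknown = set(value) - {"paths", "globs"}
    if unknown:
        raise ValueError(f"unknown license-files subkeys: {sorted(unknown)}")
    if "paths" in value and "globs" in value:
        raise ValueError("license-files.paths and .globs are mutually exclusive")
    if not value:
        # Assumption of this sketch: an empty table is treated as an error.
        raise ValueError("license-files needs either a paths or a globs subkey")
    kind, entries = next(iter(value.items()))
    if not isinstance(entries, list) or not all(isinstance(e, str) for e in entries):
        raise TypeError(f"license-files.{kind} must be an array of strings")
    return kind, entries
```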

Rename paths subkey to files

Initially, it was considered whether to name the paths subkey of the license-files table files instead. However, paths was ultimately chosen, as calling the table subkey files resulted in duplication between the table name (license-files) and the subkey name (files), i.e. license-files.files = ["LICENSE.txt"], made it seem like the preferred/default subkey when it was not, and lacked the same parallelism with globs in describing the format of the string entry rather than what was being pointed to.

Must be marked dynamic to use defaults

It may seem outwardly sensible, at least with a particularly restrictive interpretation of PEP 621’s description of the dynamic list, to consider requiring the license-files key to be explicitly marked as dynamic in order for the default glob patterns to be used, or alternatively for license files to be matched and included at all.

However, this is merely declaring a static, strictly-specified default value for this particular key, required to be used exactly by all conforming tools (so long as it is not marked dynamic, negating this argument entirely), and is no less static than any other set of glob patterns the user themself may specify. Furthermore, the resulting License-File core metadata values can still be determined with only a list of files in the source, without installing or executing any of the code, or even inspecting file contents.

Moreover, even if this were not so, practicality would trump purity, as this interpretation would be strictly backwards-incompatible with the existing format, and be inconsistent with the behavior of existing tools. Further, this would create a very serious and likely risk of a large number of projects unknowingly no longer including legally mandatory license files, making their distribution technically illegal, and is thus not a sane, much less sensible default.

Finally, aside from adding an additional line of default-required boilerplate to the file, not defining the default as dynamic allows authors to clearly and unambiguously indicate when their build/packaging tools are going to be handling the inclusion of license files themselves rather than strictly conforming to the PEP 621 portions of this PEP; to do otherwise would defeat the primary purpose of the dynamic list as a marker and escape hatch.

License file paths

Alternatives related to the paths and locations of license files in the source and built distributions.

Flatten license files in subdirectories

Previous drafts of this PEP were silent on the issue of handling license files in subdirectories. Currently, the Wheel and (following its example) Setuptools projects flatten all license files into the .dist-info directory without preserving the source subdirectory hierarchy.

While this is the simplest approach and matches existing ad hoc practice, this can result in name conflicts and license files clobbering others, with no obvious defined behavior for how to resolve them, and leaving the package legally un-distributable without any clear indication to users that their specified license files have not been included.

Furthermore, this leads to inconsistent relative file paths for non-root license files between the source, sdist and wheel, and prevents the paths given in the PEP 621 “static” metadata from being truly static, as they need to be flattened, and may potentially overwrite one another. Finally, the source directory structure often implies valuable information about what the licenses apply to, and where to find them in the source, which is lost when flattening them and far from trivial to reconstruct.

To resolve this, the PEP now proposes, as did contributors on both of the above issues, reproducing the source directory structure of the original license files inside the .dist-info directory. This would fully resolve the concerns above, with the only downside being a more nested .dist-info directory. There is still a risk of collision with edge-case custom filenames (e.g. RECORD, METADATA), but that is also the case with the previous approach, and in fact with fewer files flattened into the root, this would actually reduce the risk. Furthermore, the following proposal rooting the license files under a license_files subdirectory eliminates both collisions and the clutter problem entirely.
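The clobbering risk of flattening can be illustrated with a short sketch, using the license file paths from the advanced example in the appendix of this PEP. The helper name is hypothetical:

```python
from collections import defaultdict
from pathlib import PurePosixPath


def flattened_collisions(license_paths):
    """Map each flattened file name to the source paths competing for it."""
    by_name = defaultdict(list)
    for path in license_paths:
        # Flattening keeps only the final path component.
        by_name[PurePosixPath(path).name].append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


paths = [
    "LICENSE",
    "setuptools/_vendor/packaging/LICENSE",
    "setuptools/_vendor/packaging/LICENSE.APACHE",
    "setuptools/_vendor/packaging/LICENSE.BSD",
]
print(flattened_collisions(paths))
```

Here two distinct files compete for the single name LICENSE once flattened, whereas the full relative paths are already unique, so preserving the directory structure avoids the conflict without any extra resolution rules.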

Resolve name conflicts differently

Rather than preserving the source directory structure for license files inside the .dist-info directory, we could specify some other mechanism for conflict resolution, such as pre- or appending the parent directory name to the license filename, traversing up the tree until the name was unique, to avoid excessively nested directories.

However, this would not address the path consistency issues, would require much more discussion, coordination and bikeshedding, and further complicate the specification and the implementations. Therefore, it was rejected in favor of the simpler and more obvious solution of just preserving the source subdirectory layout, as many stakeholders have already advocated for.

Dump directly in .dist-info

Previously, the included license files were stored directly in the top-level .dist-info directory of built wheels and installed projects. This followed existing ad hoc practice, ensured most existing wheels currently using this feature will match new ones, and kept the specification simpler, with the license files always being stored in the same location relative to the core metadata regardless of distribution type.

However, this leads to a more cluttered .dist-info directory, littered with arbitrary license files and subdirectories, as opposed to separating licenses into their own namespace (which per the Zen of Python, PEP 20, are “one honking great idea”). While currently small, there is still a risk of collision with specific custom license filenames (e.g. RECORD, METADATA) in the .dist-info directory, which would only increase if and when additional files were specified here, and would require carefully limiting the potential filenames used to avoid likely conflicts with those of license-related files. Finally, putting licenses into their own specified subdirectory would allow humans and tools to quickly, easily and correctly list, copy and manipulate all of them at once (such as in distro packaging, legal checks, etc) without having to reference each of their paths from the core metadata.

Therefore, now is a prudent time to specify an alternate approach. The simplest and most obvious solution, as suggested by several on the Wheel and Setuptools implementation issues, is to simply root the license files relative to a license_files subdirectory of .dist-info. This is simple to implement and solves all the problems noted here, without clear significant drawbacks relative to other more complex options.

It does make the specification a bit more complex and less elegant, but implementation should remain equally simple. It does mean that wheels produced following this change will have differently-located licenses than those prior, but as this was already true for those in subdirectories, and until this PEP there was no way of discovering these files or accessing them programmatically, this doesn’t seem likely to pose significant problems in practice. This will be much harder, if not impossible, to change later, once the status quo is standardized, tools are relying on the current behavior, and there is much greater uptake of not only including license files but potentially also accessing them using the core metadata. If we’re going to change it, now would be the time (particularly since we’re already introducing an edge-case change with how license files in subdirs are handled, along with other refinements).

Therefore, the latter has been incorporated into current drafts of this PEP.

Add new licenses category to wheel

Instead of defining a root license directory (license_files) inside the core metadata directory (.dist-info) for wheels, we could instead define a new category (and, presumably, a corresponding install scheme), similar to the others currently included under .data in the wheel archive, specifically for license files, called (e.g.) licenses. This was mentioned by the wheel creator, and would allow installing licenses somewhere more platform-appropriate and flexible than just the .dist-info directory in the site path, and potentially be conceptually cleaner than including them there.

However, at present, this PEP does not implement this idea, and it is deferred to a future one. It would add significant complexity and friction to this PEP, being primarily concerned with standardizing existing practice and updating the core metadata specification. Furthermore, doing so would likely require modifying sysconfig and the install schemes specified therein, alongside Wheel, Installer and other tools, which would be a non-trivial undertaking. While potentially slightly more complex for repackagers (such as those for Linux distributions), the current proposal still ensures all license files are included, and in a single dedicated directory (which can easily be copied or relocated downstream), and thus should still greatly improve the status quo in this regard without the attendant complexity.

In addition, this approach is not fully backwards compatible (since it isn’t transparent to tools that simply extract the wheel), is a greater departure from existing practice and would lead to more inconsistent license install locations from wheels of different versions. Finally, this would mean licenses would not be installed as proximately to their associated code, there would be more variability in the license root path across platforms and between built distributions and installed projects, accessing installed licenses programmatically would be more difficult, and a suitable install location and method would need to be created, discussed and decided that would avoid name clashes.

Therefore, to keep this PEP in scope, the current approach was retained.

Name the subdirectory licenses

Both licenses and license_files have been suggested as potential names for the root license directory inside .dist-info of wheels and installed projects. The former is slightly shorter, but the latter is more clear and unambiguous regarding its contents, and is consistent with the name of the core metadata field (License-File) and the PEP 621 project source metadata key (license-files). Therefore, the latter was chosen instead.

Other ideas

Miscellaneous proposals, possibilities and discussion points that were ultimately not adopted.

Map identifiers to license files

This would require using a mapping (as two parallel lists would be too prone to alignment errors), which would add extra complexity to how licenses are documented and add an additional nesting level.

A mapping would be needed, as it cannot be guaranteed that all expressions (keys) have a single license file associated with them (e.g. GPL with an exception may be in a single file) and that any expression does not have more than one (e.g. an Apache license LICENSE and its NOTICE file are two distinct files). For most common cases, a single license expression and one or more license files would be perfectly adequate. In the rarer and more complex cases where there are many licenses involved, authors can still safely use the fields specified here, just with a slight loss of clarity by not specifying which text file(s) map to which license identifier (though this should be clear in practice given each license identifier has corresponding SPDX-registered full license text), while not forcing the more complex data model (a mapping) on the large majority of users who do not need or want it.

We could of course have a data field with multiple possible value types (it’s a string, it’s a list, it’s a mapping!) but this could be a source of confusion. This is what has been done, for instance, in npm (historically) and in Rubygems (still today), and as a result tools need to test the type of the metadata field before using it in code, while users are confused about when to use a list or a string. Therefore, this approach is rejected.

Map identifiers to source files

As discussed previously, file-level notices are out of scope for this PEP, and the existing SPDX-License-Identifier convention can already be used if this is needed without further specification here.

Don’t freeze compatibility with a specific SPDX version

This PEP could omit specifying a specific SPDX specification version, or one for the list of valid license identifiers, which would allow more flexible updates as the specification evolves without another PEP or equivalent.

However, serious concerns were expressed about a future SPDX update breaking compatibility with existing expressions and identifiers, leaving current packages with invalid metadata per the definition in this PEP. Requiring compatibility with a specific version of these specifications here and a PEP or similar process to update it avoids this contingency, and follows the practice of other packaging ecosystems.

Therefore, it was decided to specify a minimum version and require tools to be compatible with it, while still allowing updates so long as they don’t break backward compatibility. This enables tools to immediately take advantage of improvements and accept new licenses, while remaining backwards compatible with the version specified here, balancing flexibility and compatibility.

Different licenses for source and binary distributions

As an additional use case, it was asked whether it was in scope for this PEP to handle cases where the license expression for a binary distribution (wheel) is different from that for a source distribution (sdist), such as in cases of non-pure-Python packages that compile and bundle binaries under different licenses than the project itself. An example cited was PyTorch, which contains CUDA from Nvidia, which is freely distributable but not open source. NumPy and SciPy also had similar issues, as reported by the original author of this PEP and now resolved for those cases.

However, given the inherent complexity here and a lack of an obvious mechanism to do so, the fact that each wheel would need its own license information, lack of support on PyPI for exposing license info on a per-distribution archive basis, and the relatively niche use case, it was determined to be out of scope for this PEP, and left to a future PEP to resolve if sufficient need and interest exists and an appropriate mechanism can be found.

Open Issues

Should the License field be back-filled, or mutually exclusive?

At present, this PEP explicitly allows, but does not formally recommend or require, build tools to back-fill the License core metadata field with the verbatim text from the License-Expression field. This would presumably improve backwards compatibility and was suggested by some on the Discourse thread. On the other hand, allowing it does increase complexity and is less of a clean, consistent separation, preventing the License field from being completely mutually exclusive with the new License-Expression field and requiring that their values match.

As such, it would be very useful to have a more concrete and specific rationale and use cases for the back-filled data, and give fuller consideration to any potential benefits or drawbacks of this approach, in order to come to a final consensus on this matter that can be appropriately justified here.

Therefore, is the status quo expressed here acceptable, allowing tools leeway to decide this for themselves? Should this PEP formally recommend, or even require, that tools back-fill this metadata (which would presumably be reversed once a breaking revision of the metadata spec is issued)? Or should this not be explicitly allowed, discouraged or even prohibited?

Should custom license identifiers be allowed?

The current version of this PEP retains the behavior of only specifying the use of SPDX-defined license identifiers, as well as the explicitly defined custom identifiers LicenseRef-Public-Domain and LicenseRef-Proprietary to handle the two common cases where projects have a license, but it is not one that has a recognized SPDX license identifier.
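The current behavior could be sketched as a simple identifier check. The names below are hypothetical, the set of SPDX identifiers is a tiny illustrative subset (a real tool would load the full SPDX license list), and matching is kept case-sensitive for brevity:

```python
# Illustrative subset only; not the real SPDX license list.
KNOWN_SPDX_IDS = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}

# The two custom identifiers explicitly defined by this PEP.
ALLOWED_LICENSE_REFS = {"LicenseRef-Public-Domain", "LicenseRef-Proprietary"}


def check_license_identifier(identifier):
    """Raise ValueError unless `identifier` is recognized; return None if it is."""
    if identifier in KNOWN_SPDX_IDS or identifier in ALLOWED_LICENSE_REFS:
        return
    if identifier.startswith("LicenseRef-"):
        # Arbitrary LicenseRef-<CUSTOM-TEXT> identifiers are not accepted,
        # which also catches misspellings of the two defined ones.
        raise ValueError(f"custom license identifier not allowed: {identifier}")
    raise ValueError(f"unknown license identifier: {identifier}")


check_license_identifier("MIT")
check_license_identifier("LicenseRef-Proprietary")
```

Under this scheme a typo such as LicenseRef-Propietary is rejected instead of silently becoming a new bespoke identifier.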

For maximum flexibility, custom LicenseRef-<CUSTOM-TEXT> license identifiers could be allowed, which could potentially be useful for niche cases or corporate environments where LicenseRef-Proprietary is not appropriate or insufficiently specific, but relying on mainstream Python build tooling and the License-Expression metadata field is still desirable to use for this purpose.

This has the downsides, however, of not catching misspellings of the canonically defined license identifiers and thus producing license metadata that is not a valid match for what the author intended, as well as users potentially thinking they have to prepend LicenseRef to valid license identifiers, a point on which there seems to have been some confusion previously. Furthermore, this encourages the proliferation of bespoke license identifiers, which defeats the purpose of enabling clear, unambiguous and well understood license metadata for which this PEP was created.

Indeed, for niche cases that need specific, proprietary custom licenses, they could always simply specify LicenseRef-Proprietary, and then include the actual license files needed to unambiguously identify the license regardless (if not using SPDX license identifiers) under the License-File fields. Requiring standards-conforming tools to allow custom license identifiers does not seem very useful, since standard tools will not recognize bespoke ones or know how to treat them. By contrast, bespoke tools, which would be required in any case to understand and act on custom identifiers, are explicitly allowed, with good reason (thus the SHOULD keyword) to not require that license identifiers conform to those listed here. Therefore, this specification still allows such use in private corporate environments or specific ecosystems, while avoiding the disadvantages of imposing them on all mainstream packaging tools.

As an alternative, a literal LicenseRef-Custom identifier could be defined, which would more explicitly indicate that the license cannot be expressed with defined identifiers and the license text should be referenced for details, without carrying the negative and potentially inappropriate implications of LicenseRef-Proprietary. This would avoid the main mentioned downsides (misspellings, confusion, license proliferation) of the above approach of allowing an arbitrary LicenseRef, while addressing several of the potential theoretical scenarios cited for it.

On the other hand, as SPDX aims to (and generally does) encompass all FSF-recognized “Free” and OSI-approved “Open Source” licenses, and those sources are kept closely in sync and are now relatively stable, anything outside those bounds would generally be covered by LicenseRef-Proprietary, thus making LicenseRef-Custom less specific in that regard, and somewhat redundant to it. Furthermore, it may mislead authors of projects with complex/multiple licenses that they should use it over specifying a license expression.

At present, the PEP retains the existing approach over either of these, given the use cases and benefits were judged to be sufficiently marginal based on the current understanding of the packaging landscape. For both these proposals, however, if more concrete use cases emerge, this can certainly be reconsidered, either for this current PEP or a future one (before or in tandem with actually removing the legacy unstructured License metadata field). Not defining this now enables allowing it later (or still now, with custom packaging tools), without affecting backward compatibility, while the same is not so if they are allowed now and later determined to be unnecessary or too problematic in practice.

Appendix: License Expression Examples

Basic example

The Setuptools project itself, as of version 59.1.1, does not use the License field in its own project source metadata. Further, it no longer explicitly specifies license_file/license_files as it did previously, since Setuptools relies on its own automatic inclusion of license-related files matching common patterns, such as the LICENSE file it uses.

It includes the following license-related metadata in its setup.cfg:

[metadata]
classifiers =
    License :: OSI Approved :: MIT License

The simplest migration to this PEP would consist of using this instead:

[metadata]
license_expression = MIT

Or, in a PEP 621 pyproject.toml:

[project]
license-expression = "MIT"

The output core metadata for the distribution packages would then be:

License-Expression: MIT
License-File: LICENSE

The LICENSE file would be stored at /setuptools-{version}/LICENSE in the sdist and /setuptools-{version}.dist-info/license_files/LICENSE in the wheel, and unpacked from there into the site directory (e.g. site-packages) on installation; / is the root of the respective archive and {version} the version of the Setuptools release in the core metadata.

Advanced example

Suppose Setuptools were to include the licenses of the third-party projects that are vendored in the setuptools/_vendor/ and pkg_resources/_vendor directories; specifically:

packaging==21.2
pyparsing==2.2.1
ordered-set==3.1.1
more_itertools==8.8.0

The license expressions for these projects are:

packaging: Apache-2.0 OR BSD-2-Clause
pyparsing: MIT
ordered-set: MIT
more_itertools: MIT

A comprehensive license expression covering both Setuptools proper and its vendored dependencies would contain these metadata, combining all the license expressions into one. Such an expression might be:

MIT AND (Apache-2.0 OR BSD-2-Clause)
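As a rough illustration of how such a combined expression could be assembled, the sketch below de-duplicates the per-project expressions and joins them with AND. It is not an SPDX parser: the helper name is hypothetical, compound expressions are simply wrapped in parentheses, and whether AND is the appropriate operator for a given project remains a judgment for its authors:

```python
def combine_license_expressions(expressions):
    """Join license expressions with AND, dropping exact duplicates."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(expressions))
    if not unique:
        raise ValueError("at least one license expression is required")
    if len(unique) == 1:
        return unique[0]
    # Parenthesize anything containing an operator so precedence is explicit.
    parts = [f"({expr})" if " " in expr else expr for expr in unique]
    return " AND ".join(parts)


combined = combine_license_expressions([
    "MIT",                          # Setuptools itself
    "Apache-2.0 OR BSD-2-Clause",   # packaging
    "MIT",                          # pyparsing
    "MIT",                          # ordered-set
    "MIT",                          # more_itertools
])
print(combined)  # MIT AND (Apache-2.0 OR BSD-2-Clause)
```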

In addition, per the requirements of the licenses, the relevant license files must be included in the package. Suppose the LICENSE file contains the text of the MIT license and the copyrights used by Setuptools, pyparsing, more_itertools and ordered-set; and the LICENSE* files in the setuptools/_vendor/packaging/ directory contain the Apache 2.0 and 2-clause BSD license text, and the Packaging copyright statement and license choice notice.

Specifically, we assume the license files are located at the following paths in the project source tree (relative to the project root and pyproject.toml):

LICENSE
setuptools/_vendor/packaging/LICENSE
setuptools/_vendor/packaging/LICENSE.APACHE
setuptools/_vendor/packaging/LICENSE.BSD

Putting it all together, our setup.cfg would be:

[metadata]
license_expression = MIT AND (Apache-2.0 OR BSD-2-Clause)
license_files =
    LICENSE
    setuptools/_vendor/packaging/LICENSE
    setuptools/_vendor/packaging/LICENSE.APACHE
    setuptools/_vendor/packaging/LICENSE.BSD

In a PEP 621 pyproject.toml, with license files specified explicitly via the paths subkey, this would look like:

[project]
license-expression = "MIT AND (Apache-2.0 OR BSD-2-Clause)"
license-files.paths = [
    "LICENSE",
    "setuptools/_vendor/packaging/LICENSE",
    "setuptools/_vendor/packaging/LICENSE.APACHE",
    "setuptools/_vendor/packaging/LICENSE.BSD",
]

Or alternatively, matched via glob patterns, this could be:

[project]
license-expression = "MIT AND (Apache-2.0 OR BSD-2-Clause)"
license-files.globs = [
    "LICENSE*",
    "setuptools/_vendor/packaging/LICENSE*",
]

With either approach, the output core metadata in the distribution would be:

License-Expression: MIT AND (Apache-2.0 OR BSD-2-Clause)
License-File: LICENSE
License-File: setuptools/_vendor/packaging/LICENSE
License-File: setuptools/_vendor/packaging/LICENSE.APACHE
License-File: setuptools/_vendor/packaging/LICENSE.BSD
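To make the equivalence of the two approaches concrete, here is a minimal sketch of how a build tool might expand glob patterns into License-File entries. It assumes glob patterns that point at the setuptools/_vendor/packaging/ directory where the license files were listed earlier, builds a throwaway source tree with placeholder files, and uses hypothetical helper names; it is not how Setuptools or any other tool actually implements this.

```python
import tempfile
from pathlib import Path

LICENSE_PATHS = [
    "LICENSE",
    "setuptools/_vendor/packaging/LICENSE",
    "setuptools/_vendor/packaging/LICENSE.APACHE",
    "setuptools/_vendor/packaging/LICENSE.BSD",
]
LICENSE_GLOBS = ["LICENSE*", "setuptools/_vendor/packaging/LICENSE*"]


def resolve_globs(root, patterns):
    """Return the sorted, de-duplicated POSIX paths of matching files."""
    root = Path(root)
    matches = set()
    for pattern in patterns:
        for path in root.glob(pattern):
            if path.is_file():
                matches.add(path.relative_to(root).as_posix())
    return sorted(matches)


def render_license_fields(expression, files):
    """Format the license-related core metadata fields as text."""
    lines = [f"License-Expression: {expression}"]
    lines += [f"License-File: {path}" for path in files]
    return "\n".join(lines)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Create placeholder license files plus one unrelated source file.
    for rel in LICENSE_PATHS + ["setuptools/__init__.py"]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder\n")
    from_globs = resolve_globs(root, LICENSE_GLOBS)

metadata = render_license_fields("MIT AND (Apache-2.0 OR BSD-2-Clause)", from_globs)
print(metadata)
```

In this sketch the glob results are sorted so the output is deterministic; the ordering a given tool emits is not something this example tries to capture.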

In the resulting sdist, with / as the root of the archive and {version} the version of the Setuptools release specified in the core metadata, the license files would be located at the paths:

/setuptools-{version}/LICENSE
/setuptools-{version}/setuptools/_vendor/packaging/LICENSE
/setuptools-{version}/setuptools/_vendor/packaging/LICENSE.APACHE
/setuptools-{version}/setuptools/_vendor/packaging/LICENSE.BSD

In the built wheel, with / being the root of the archive and {version} as above, the license files would be stored at:

/setuptools-{version}.dist-info/license_files/LICENSE
/setuptools-{version}.dist-info/license_files/setuptools/_vendor/packaging/LICENSE
/setuptools-{version}.dist-info/license_files/setuptools/_vendor/packaging/LICENSE.APACHE
/setuptools-{version}.dist-info/license_files/setuptools/_vendor/packaging/LICENSE.BSD

Finally, in the installed project, with site-packages being the site dir and {version} as above, the license files would be installed to:

site-packages/setuptools-{version}.dist-info/license_files/LICENSE
site-packages/setuptools-{version}.dist-info/license_files/setuptools/_vendor/packaging/LICENSE
site-packages/setuptools-{version}.dist-info/license_files/setuptools/_vendor/packaging/LICENSE.APACHE
site-packages/setuptools-{version}.dist-info/license_files/setuptools/_vendor/packaging/LICENSE.BSD
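The three layouts above follow one pattern, which the following sketch captures: each License-File value is kept at its source-relative path in the sdist, and placed under a license_files directory inside .dist-info in the wheel and the installed project. The function name and the version number used here are placeholders for illustration only.

```python
import posixpath


def license_file_locations(name, version, license_files):
    """Map each License-File value to its sdist, wheel and installed path."""
    sdist_root = f"{name}-{version}"
    dist_info = f"{name}-{version}.dist-info"
    locations = {}
    for path in license_files:
        locations[path] = {
            "sdist": posixpath.join("/", sdist_root, path),
            "wheel": posixpath.join("/", dist_info, "license_files", path),
            "installed": posixpath.join(
                "site-packages", dist_info, "license_files", path
            ),
        }
    return locations


# "1.2.3" is a made-up version standing in for {version}.
locations = license_file_locations(
    "setuptools",
    "1.2.3",
    ["LICENSE", "setuptools/_vendor/packaging/LICENSE.BSD"],
)
for source_path, targets in locations.items():
    print(source_path, targets)
```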

Conversion example

Suppose we were to return to our simple Setuptools case. Per the specification, given it only has the following license classifier:

Classifier: License :: OSI Approved :: MIT License

And no value for the License field, or equivalently, if it had a value of:

License: MIT

Then the suggested value for the License-Expression field would be:

License-Expression: MIT

For the more complex case, assuming it was currently expressed as multiple license classifiers, no automatic conversion could be performed due to the inherent ambiguity, and the user would be prompted on how to handle the situation themselves.
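A conversion helper along these lines might look like the sketch below. It is only an illustration of the two cases just described: the mapping table is hypothetical and deliberately contains a single entry, and the function returns None whenever it cannot suggest a value unambiguously, leaving the decision to the user.

```python
# Hypothetical, intentionally tiny mapping; a real tool would need a
# carefully reviewed table covering many more classifiers.
CLASSIFIER_TO_SPDX = {
    "License :: OSI Approved :: MIT License": "MIT",
}


def suggest_license_expression(classifiers, license_field=None):
    """Suggest a License-Expression value, or None if the case is ambiguous."""
    license_classifiers = [c for c in classifiers if c.startswith("License ::")]
    if len(license_classifiers) != 1:
        # Zero or multiple license classifiers: do not guess.
        return None
    candidate = CLASSIFIER_TO_SPDX.get(license_classifiers[0])
    if candidate is None:
        return None
    if license_field and license_field.strip() != candidate:
        # The legacy License value disagrees with the classifier.
        return None
    return candidate


simple = suggest_license_expression(["License :: OSI Approved :: MIT License"])
complex_case = suggest_license_expression(
    [
        "License :: OSI Approved :: MIT License",
        "License :: OSI Approved :: Apache Software License",
    ]
)
print(simple, complex_case)
```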

Expression examples

Some additional examples of valid License-Expression values:

License-Expression: MIT

License-Expression: BSD-3-Clause

License-Expression: MIT AND (Apache-2.0 OR BSD-2-Clause)

License-Expression: MIT OR GPL-2.0-or-later OR (FSFUL AND BSD-2-Clause)

License-Expression: GPL-3.0-only WITH Classpath-exception-2.0 OR BSD-3-Clause

License-Expression: LicenseRef-Public-Domain OR CC0-1.0 OR Unlicense

License-Expression: LicenseRef-Proprietary
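To show how expressions like these are structured, here is a small recursive-descent parser sketch. It is a simplification and not a validator: it assumes uppercase AND, OR and WITH operators with WITH binding tightest and OR loosest, does not check identifiers against the SPDX license list, and all function names are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"\s*(\(|\)|[A-Za-z0-9.+:-]+)")
OPERATORS = {"AND", "OR", "WITH"}
RESERVED = OPERATORS | {"(", ")"}


def tokenize(expression):
    """Split an expression into parentheses, operators and identifiers."""
    expression = expression.strip()
    tokens, pos = [], 0
    while pos < len(expression):
        match = TOKEN_RE.match(expression, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(expression):
    """Parse an expression into nested (operator, left, right) tuples."""
    tree, rest = _parse_or(tokenize(expression))
    if rest:
        raise ValueError(f"unexpected token {rest[0]!r}")
    return tree


def _parse_or(tokens):
    left, tokens = _parse_and(tokens)
    while tokens and tokens[0] == "OR":
        right, tokens = _parse_and(tokens[1:])
        left = ("OR", left, right)
    return left, tokens


def _parse_and(tokens):
    left, tokens = _parse_primary(tokens)
    while tokens and tokens[0] == "AND":
        right, tokens = _parse_primary(tokens[1:])
        left = ("AND", left, right)
    return left, tokens


def _parse_primary(tokens):
    if not tokens:
        raise ValueError("unexpected end of expression")
    head, tokens = tokens[0], tokens[1:]
    if head == "(":
        inner, tokens = _parse_or(tokens)
        if not tokens or tokens[0] != ")":
            raise ValueError("missing closing parenthesis")
        return inner, tokens[1:]
    if head in RESERVED:
        raise ValueError(f"unexpected token {head!r}")
    if tokens and tokens[0] == "WITH":
        if len(tokens) < 2 or tokens[1] in RESERVED:
            raise ValueError("WITH must be followed by an exception identifier")
        return ("WITH", head, tokens[1]), tokens[2:]
    return head, tokens


print(parse("MIT AND (Apache-2.0 OR BSD-2-Clause)"))
print(parse("MIT OR GPL-2.0-or-later OR (FSFUL AND BSD-2-Clause)"))
```

Because identifiers are taken as whole tokens, the lowercase "or" inside GPL-2.0-or-later is not mistaken for an operator.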

Appendix: User Scenarios

The following covers the range of common use cases from a user perspective, providing straightforward guidance for each. Do note that the following should not be considered legal advice, and readers should consult a licensed legal practitioner in their jurisdiction if they are unsure about the specifics for their situation.

I have a private package that won’t be distributed

If your package isn’t shared publicly, i.e. outside your company, organization or household, it usually isn’t strictly necessary to include a formal license, so you wouldn’t necessarily have to do anything extra here.

However, it is still a good idea to include LicenseRef-Proprietary as a license expression in your package configuration, and/or a copyright statement and any legal notices in a LICENSE.txt file in the root of your project directory, which will be automatically included by packaging tools.

I want to distribute my project under a specific license

To use a particular license, simply paste its text into a LICENSE.txt file at the root of your repo, if you don’t have it in a file starting with LICENSE or COPYING already, and add license-expression = "LICENSE-ID" under [project] in your pyproject.toml if your packaging tool supports it, or else in its config file (e.g. for Setuptools, license_expression = LICENSE-ID under [metadata] in setup.cfg). You can find the LICENSE-ID and copyable license text on sites like ChooseALicense or SPDX.

Many popular code hosts, project templates and packaging tools can add the license file for you, and may support the expression as well in the future.

I maintain an existing package that’s already licensed

If you already have license files and metadata in your project, you should only need to make a couple of tweaks to take advantage of the new functionality.

In your project config file, enter your license expression under license-expression (PEP 621 pyproject.toml), license_expression (Setuptools setup.cfg / setup.py), or the equivalent for your packaging tool, and make sure to remove any legacy license value or License :: classifiers. Your existing license value may already be valid as one (e.g. MIT, Apache-2.0 OR BSD-2-Clause, etc); otherwise, check the SPDX license list for the identifier that matches the license used in your project.

If your license files begin with LICENSE, COPYING, NOTICE or AUTHORS, or you’ve already configured your packaging tool to add them (e.g. license_files in setup.cfg), you should already be good to go. If not, make sure to list them under license-files.paths or license-files.globs under [project] in pyproject.toml (if your tool supports it), or else in your tool’s configuration file (e.g. license_files in setup.cfg for Setuptools).

See the basic example for a simple but complete real-world demo of how this works in practice, including some additional technical details. Packaging tools may support automatically converting legacy licensing metadata; check your tool’s documentation for more information.

My package includes other code under different licenses

If your project includes code from others covered by different licenses, such as vendored dependencies or files copied from other open source software, you can construct a license expression (or have a tool help you do so) to describe the licenses involved and the relationship between them.

In short, License-1 AND License-2 means that both licenses apply to your project, or parts of it (for example, you included a file under another license), and License-1 OR License-2 means that either of the licenses can be used, at the user’s option (for example, you want to allow users a choice of multiple licenses). You can use parentheses (()) for grouping to form expressions that cover even the most complex situations.

In your project config file, enter your license expression under license-expression (PEP 621 pyproject.toml), license_expression (Setuptools setup.cfg / setup.py), or the equivalent for your packaging tool, and make sure to remove any legacy license value or License :: classifiers.

Also, make sure you add the full license text of all the licenses as files somewhere in your project repository. If all of them are in the root directory and begin with LICENSE, COPYING, NOTICE or AUTHORS, they will be included automatically. Otherwise, you’ll need to list the relative path or glob patterns to each of them under license-files.paths or license-files.globs under [project] in pyproject.toml (if your tool supports it), or else in your tool’s configuration file (e.g. license_files in setup.cfg for Setuptools).

As an example, if your project was licensed MIT but incorporated a vendored dependency (say, packaging) that was licensed under either Apache 2.0 or the 2-clause BSD, your license expression would be MIT AND (Apache-2.0 OR BSD-2-Clause). You might have a LICENSE.txt in your repo root, and a LICENSE-APACHE.txt and LICENSE-BSD.txt in the _vendor subdirectory, so to include all of them, you’d specify ["LICENSE.txt", "_vendor/LICENSE*"] as glob patterns, or ["LICENSE.txt", "_vendor/LICENSE-APACHE.txt", "_vendor/LICENSE-BSD.txt"] as literal file paths.

See a fully worked out advanced example for a comprehensive end-to-end application of this to a real-world complex project, with copious technical details, and consult a tutorial for more help and examples using SPDX identifiers and expressions.

Appendix: License Documentation in Python

There are multiple ways used or recommended to document Python project licenses today. The most common are listed below.

Core metadata

There are two overlapping core metadata fields to document a license: the license Classifier strings prefixed with License :: and the License field as free text.

The core metadata License field documentation is currently:

License
=======

.. versionadded:: 1.0

Text indicating the license covering the distribution where the license
is not a selection from the "License" Trove classifiers. See
:ref:`"Classifier" <metadata-classifier>` below.
This field may also be used to specify a
particular version of a license which is named via the ``Classifier``
field, or to indicate a variation or exception to such a license.

Examples::

    License: This software may only be obtained by sending the
            author a postcard, and then the user promises not
            to redistribute it.

    License: GPL version 3, excluding DRM provisions

Even though there are two fields, it is at times difficult to convey anything beyond simple licensing. For instance, some classifiers lack precision (GPL without a version), and when multiple license classifiers are listed, it is not clear whether all of the licenses must apply or the user may choose between them. Furthermore, the list of available license classifiers is rather limited and out-of-date.

Setuptools and Wheel

Beyond a license code or qualifier, license text files are documented and included in a built package either implicitly or explicitly, and this is another possible source of confusion:

  • In the Setuptools and Wheel projects, license files are automatically added to the distribution (at their source location in a source distribution/sdist, and in the .dist-info directory of a built wheel) if they match one of a number of common license file name patterns (LICEN[CS]E*, COPYING*, NOTICE* and AUTHORS*). Alternatively, a package author can specify a list of license file paths to include in the built wheel under the license_files key in the [metadata] section of the project’s setup.cfg, or as an argument to the setuptools.setup() function. At present, following the Wheel project’s lead, Setuptools flattens the collected license files into the metadata directory, clobbering files with the same name, and dumps license files directly into the top-level .dist-info directory, but there is a desire to resolve both these issues, contingent on this PEP being accepted.
  • Both tools also support an older, singular license_file parameter that allows specifying only one license file to add to the distribution, which has been deprecated for some time but still sees some use.
  • Following the publication of an earlier draft of this PEP, Setuptools added support for License-File in distribution metadata as described in this specification. This allows other tools consuming the resulting metadata to unambiguously locate the license file(s) for a given package.

PyPA Packaging Guide and Sample Project

Both the PyPA beginner packaging tutorial and its more comprehensive packaging guide state that it is important that every package include a license file. They point to the LICENSE.txt in the official PyPA sample project as an example, which is explicitly listed under the license_files key in its setup.cfg, following existing practice formally specified by this PEP.

Both the beginner packaging tutorial and the sample project only use classifiers to declare a package’s license, and do not include or mention the License field. The full packaging guide does mention this field, but states that authors should use the license classifiers instead, unless the project uses a non-standard license (which the guide discourages).

Python source code files

Note: Documenting licenses in source code is not in the scope of this PEP.

Besides using comments and/or SPDX-License-Identifier conventions, the license is sometimes documented in Python code files using a “dunder” module-level constant, typically named __license__.

This convention, while perhaps somewhat antiquated, is recognized by the built-in help() function and the standard pydoc module. The dunder variable will show up in the help() DATA section for a module.
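As a quick demonstration of this behavior, the sketch below builds a throwaway module object in memory (the module name and docstring are made up for illustration), sets __license__ on it, and renders its documentation as plain text with pydoc, which is the text that help() would display.

```python
import pydoc
import types

# A hypothetical in-memory module; a real project would set
# __license__ at the top level of an actual source file.
demo = types.ModuleType("demo_module", "A module used only for illustration.")
demo.__license__ = "MIT"

# Render the same documentation that help(demo) would page through,
# using the plain-text renderer so the output has no bold markup.
text = pydoc.render_doc(demo, renderer=pydoc.plaintext)
print(text)
```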

Other Python packaging tools

  • Conda package manifests have support for license and license_file fields, and automatically include license files following similar naming patterns as the Wheel and Setuptools projects.
  • Flit recommends using classifiers instead of the License field (per the current PyPA packaging guide).
  • PBR uses similar data as Setuptools, but always stored in setup.cfg.
  • Poetry specifies the use of the license field in pyproject.toml with SPDX license identifiers.

Appendix: License Documentation in Other Projects

Here is a survey of how things are done elsewhere.

Linux distribution packages

Note: in most cases, the texts of the most common licenses are included globally in a shared documentation directory (e.g. /usr/share/doc).

  • Debian documents package licenses with machine readable copyright files. It defines its own license expression syntax and list of identifiers for common licenses, both of which are closely related to those of SPDX.
  • Fedora packages specify how to include License Texts and use a License field that must be filled with appropriate short license identifier(s) from an extensive list of “Good Licenses”. Fedora also defines its own license expression syntax, similar to that of SPDX.
  • OpenSUSE packages use SPDX license expressions with SPDX license IDs and a list of additional license identifiers.
  • Gentoo ebuild uses a LICENSE variable. This field is specified in GLEP-0023 and in the Gentoo development manual. Gentoo also defines a list of allowed licenses and a license expression syntax, which is rather different from SPDX.
  • The FreeBSD package Makefile provides LICENSE and LICENSE_FILE fields with a list of custom license symbols. For non-standard licenses, FreeBSD recommends using LICENSE=UNKNOWN and adding LICENSE_NAME and LICENSE_TEXT fields, as well as sophisticated LICENSE_PERMS to qualify the license permissions and LICENSE_GROUPS to document a license grouping. The LICENSE_COMB allows documenting more than one license and how they apply together, forming a custom license expression syntax. FreeBSD also recommends the use of SPDX-License-Identifier in source code files.
  • Arch Linux PKGBUILD defines its own license identifiers. The value 'unknown' can be used if the license is not defined.
  • OpenWRT ipk packages use the PKG_LICENSE and PKG_LICENSE_FILES variables and recommend the use of SPDX License identifiers.
  • NixOS uses SPDX identifiers and some extra license IDs in its license field.
  • GNU Guix (based on the Nix package manager) has a single License field, uses its own license symbols list and specifies how to use one license or a list of them.
  • Alpine Linux packages recommend using SPDX identifiers in the license field.

Language and application packages

  • In Java, Maven POM defines a licenses XML tag with a list of licenses, each with a name, URL, comments and “distribution” type. This is not mandatory, and the content of each field is not specified.
  • The JavaScript NPM package.json uses a single license field with a SPDX license expression, or the UNLICENSED ID if none is specified. A license file can be referenced as an alternative using SEE LICENSE IN <filename> in the single license field.
  • Rubygems gemspec specifies either a single license string or a list of them. The relationship between multiple licenses in a list is not specified. They recommend using SPDX license identifiers.
  • CPAN Perl modules use a single license field, which is either a single string or a list of strings. The relationship between the licenses in a list is not specified. There is a list of custom license identifiers plus these generic identifiers: open_source, restricted, unrestricted, unknown.
  • Rust Cargo specifies the use of an SPDX license expression (v2.1) in the license field. It also supports an alternative expression syntax using slash-separated SPDX license identifiers, and there is also a license-file field. The crates.io package registry requires that either the license or license-file field is set when uploading a package.
  • PHP composer.json uses a license field with an SPDX license ID or proprietary. The license field is either a single string resembling the SPDX license expression syntax, with and and or keywords, or a list of strings if there is a (disjunctive) choice of licenses.
  • NuGet packages previously used only a simple license URL, but now specify using a SPDX license expression and/or the path to a license file within the package. The NuGet.org repository states that they only accept license expressions that are “approved by the Open Source Initiative or the Free Software Foundation.”
  • Go language modules go.mod have no provision for any metadata beyond dependencies. Licensing information is left for code authors and other community package managers to document.
  • The Dart/Flutter spec recommends using a single LICENSE file that should contain all the license texts, each separated by a line with 80 hyphens.
  • The JavaScript Bower license field is either a single string or list of strings using either SPDX license identifiers, or a path/URL to a license file.
  • The Cocoapods podspec license field is either a single string, or a mapping with type, file and text keys. This is mandatory unless there is a LICENSE/LICENCE file provided.
  • Haskell Cabal accepts an SPDX license expression since version 2.2. The version of the SPDX license list used is a function of the Cabal version. The specification also provides a mapping between legacy (pre-SPDX) and SPDX license Identifiers. Cabal also specifies a license-file(s) field that lists license files to be installed with the package.
  • Erlang/Elixir mix/hex package specifies a licenses field as a required list of license strings, and recommends using SPDX license identifiers.
  • D Language dub packages define their own list of license identifiers and license expression syntax, similar to the SPDX standard.
  • The R Package DESCRIPTION defines its own sophisticated license expression syntax and list of licenses identifiers. R has a unique way of supporting specifiers for license versions (such as LGPL (>= 2.0, < 3)) in its license expression syntax.

Other ecosystems

  • The SPDX-License-Identifier header is a simple convention to document the license inside a file.
  • The Free Software Foundation (FSF) promotes the use of SPDX license identifiers for clarity in the GPL and other versioned free software licenses.
  • The Free Software Foundation Europe (FSFE) REUSE project promotes using SPDX-License-Identifier.
  • The Linux kernel uses SPDX-License-Identifier and parts of the FSFE REUSE conventions to document its licenses.
  • U-Boot spearheaded using SPDX-License-Identifier in code and now follows the Linux approach.
  • The Apache Software Foundation projects use RDF DOAP with a single license field pointing to SPDX license identifiers.
  • The Eclipse Foundation promotes using SPDX-License-Identifier.
  • The ClearlyDefined project promotes using SPDX license identifiers and expressions to improve license clarity.
  • The Android Open Source Project uses MODULE_LICENSE_XXX empty tag files, where XXX is a license code such as BSD, APACHE, GPL, etc. It also uses a NOTICE file that contains license and notice texts.
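
For illustration, the SPDX-License-Identifier convention mentioned in several entries above can be read with a few lines of code. This is a minimal sketch with a hypothetical function name; it assumes the tag appears on a single comment line near the top of the file, and the 20-line search window is an arbitrary choice.

```python
import re

# Matches the tag inside a comment line, ignoring a trailing
# block-comment terminator such as */ or -->.
SPDX_HEADER_RE = re.compile(
    r"SPDX-License-Identifier:\s*(?P<expr>.+?)\s*(?:\*/|-->)?\s*$"
)


def find_spdx_identifier(source_text, max_lines=20):
    """Return the license expression from the first SPDX tag found, if any."""
    for line in source_text.splitlines()[:max_lines]:
        match = SPDX_HEADER_RE.search(line)
        if match:
            return match.group("expr")
    return None


python_source = "# SPDX-License-Identifier: MIT\nimport os\n"
c_source = "/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */\nint x;\n"
print(find_spdx_identifier(python_source))
print(find_spdx_identifier(c_source))
```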

Acknowledgments

  • Nick Coghlan
  • Kevin P. Fleming
  • Pradyun Gedam
  • Oleg Grenrus
  • Dustin Ingram
  • Chris Jerdonek
  • Cyril Roelandt
  • Luis Villa

Source: https://github.com/python-discord/peps/blob/main/pep-0639.rst

Last modified: 2022-02-03 23:16:44 GMT