PEP 527 – Removing Un(der)used file types/extensions on PyPI
- PEP
- 527
- Title
- Removing Un(der)used file types/extensions on PyPI
- Author
- Donald Stufft <donald at stufft.io>
- BDFL-Delegate
- Nick Coghlan <ncoghlan at gmail.com>
- Discussions-To
- distutils-sig@python.org
- Status
- Final
- Type
- Process
- Created
- 23-Aug-2016
- Post-History
- 23-Aug-2016
- Resolution
- Distutils-SIG
Abstract
This PEP recommends deprecating, and ultimately removing, support for uploading
certain unused or under used file types and extensions to PyPI. In particular
it recommends disallowing further uploads of any files of the types
bdist_dumb
, bdist_rpm
, bdist_dmg
, bdist_msi
, and
bdist_wininst
, leaving PyPI to only accept new uploads of the sdist
,
bdist_wheel
, and bdist_egg
file types.
In addition, this PEP proposes removing support for new uploads of sdists using
the .tar
, .tar.bz2
, .tar.xz
, .tar.Z
, .tgz
, .tbz
, and
any other extension besides .tar.gz
and .zip
.
Finally, this PEP also proposes limiting the number of allowed sdist uploads for each individual release of a project on PyPI to one instead of one for each allowed extension.
Rationale
File Formats
Currently PyPI supports the following file types:
sdist
bdist_wheel
bdist_egg
bdist_wininst
bdist_msi
bdist_dmg
bdist_rpm
bdist_dumb
However, these different types of files have varying amounts of usefulness or general use in the ecosystem. Continuing to support them adds a maintenance burden on PyPI as well as tool authors and incurs a cost in both bandwidth and disk space not only on PyPI itself, but also on any mirrors of PyPI.
Python packaging is a multi-level ecosystem where PyPI is primarily suited and used to distribute virtual environment compatible packages directly from their respective project owners. These packages are then consumed either directly by end-users, or by downstream distributors that take these packages and turn them into their respective system level packages (such as RPM, deb, MSI, etc).
While PyPI itself only directly works with these Python specific but platform agnostic packages, we encourage community-driven and commercial conversions of these packages to downstream formats for particular target environments, like:
- The conda cross-platform data analysis ecosystem (conda-forge)
- The deb based Linux ecosystem (Debian, Ubuntu, etc)
- The RPM based Linux ecosystem (Fedora, openSuSE, Mageia, etc)
- The homebrew, MacPorts and fink ecosystems for Mac OS X
- The Windows Package Management ecosystem (NuGet, Chocolatey, etc)
- 3rd party creation of Windows MSIs and installers (e.g. Christoph Gohlke’s work at http://www.lfd.uci.edu/~gohlke/pythonlibs/ )
- other commercial redistribution formats (ActiveState’s PyPM, Enthought Canopy, etc)
- other open source community redistribution formats (Nix, Gentoo, Arch, *BSD, etc)
It is the belief of this PEP that the entire ecosystem is best supported by keeping PyPI focused on the platform agnostic formats, where the limited amount of time by volunteers can be best used instead of spreading the available time out amongst several platforms. Further more, this PEP believes that the people best positioned to provide well integrated packages for a particular platform are people focused on that platform, and not across all possible platforms.
bdist_dumb
As it’s name implies, bdist_dumb
is not a very complex format, however it
is so simple as to be worthless for actual usage.
For instance, if you’re using something like pyenv on macOS and you’re building
a library using Python 3.5, then bdist_dumb
will produce a .tar.gz
file
named something like exampleproject-1.0.macosx-10.11-x86_64.tar.gz
. Right
off the bat this file name is somewhat difficult to differentiate from an
sdist
since they both use the same file extension (and with the legacy pre
PEP 440 versions, 1.0-macosx-10.11-x86_64
is a valid, although quite silly,
version number). However, once you open up the created .tar.gz
, you’d find
that there is no metadata inside that could be used for things like dependency
discovery and in fact, it is quite simply a tarball containing hardcoded paths
to wherever files would have been installed on the computer creating the
bdist_dumb
. Going back to our pyenv on macOS example, this means that if I
created it, it would contain files like:
Users/dstufft/.pyenv/versions/3.5.2/lib/python3.5/site-packages/example.py
bdist_rpm
The bdist_rpm
format on PyPI allows people to upload .rpm
files for
end users to manually download by hand and then manually install by hand.
However, the common usage of rpm
is with a specially designed repository
that allows automatic installation of dependencies, upgrades, etc which PyPI
does not provide. Thus, it is a type of file that is barely being used on PyPI
with only ~460 files of this type having been uploaded to PyPI (out a total of
662,544).
In addition, services like COPR provide a better supported mechanism for publishing and using RPM files than we’re ever likely to get on PyPI.
bdist_dmg, bdist_msi, and bdist_wininst
The bdist_dmg
, bdist_msi
, and bdist_winist
formats are similar in
that they are an OS specific installer that will only install a library into an
environment and are not designed for real user facing installs of applications
(which would require things like bundling a Python interpreter and the like).
Out of these three, the usage for bdist_dmg
and bdist_msi
is very low,
with only ~500 bdist_msi
files and ~50 bdist_dmg
files having been
uploaded to PyPI. The bdist_wininst
format has more use, with ~14,000 files
having ever been uploaded to PyPI.
It’s quite easy to look at the low usage of bdist_dmg
and bdist_msi
and
conclude that removing them will be fairly low impact, however
bdist_wininst
has several orders of magnitude more usage. This is somewhat
misleading though, because although it has more people uploading those files
the actual usage of those uploaded files is fairly low. Taking a look at the
previous 30 days, we can see that 90% of all downloads of bdist_winist
files from PyPI were generated by the mirroring infrastructure and 7% of them
were generated by setuptools (which can currently be better covered by
bdist_egg
files).
Given the small number of files uploaded for bdist_dmg
and bdist_msi
and that bdist_wininst
is largely existing to either consume bandwidth and
disk space via the mirroring infrastructure or could be trivially replaced
with bdist_egg
, this PEP proposes to include these three formats in the
list of those to be disallowed.
File Extensions
Currently sdist
supports a wide variety of file extensions like .tar.gz
,
.tar
, .tar.bz2
, .tar.xz
, .zip
, .tar.Z
, .tgz
, and
.tbz
. However, of those the only extensions which get anything more than
negligible usage is .tar.gz
with 444,338 sdists currently, .zip
with
58,774 sdists currently, and .tar.bz2
with 3,265 sdists currently.
Having multiple formats accepted requires tooling both within PyPI and outside
of PyPI to handle all of the various extensions that might be used (even if
nobody is currently using them). This doesn’t only affect PyPI, but ripples out
throughout the ecosystem. In addition, the different formats all have different
requirements for what optional C libraries Python was linked against and
different requirements for what versions of Python they support. In addition,
multiple formats also create a weird situation where there may be two
sdist
files for a particular project/release with subtly different content.
It’s easy to advocate that anything outside of .tar.gz
, .zip
, and
.tar.bz2
should be disallowed. Outside of a tiny handful, nobody has
actively been uploading these other types of files in the ~15 years of PyPI’s
existence so they’ve obviously not been particularly useful. In addition, while
.tar.xz
is theoretically a nicer format than the other .tar.*
formats
due to the better compression ratio achieved by LZMA, it is only available in
Python 3.3+ and has an optional dependency on the lzma C library.
Looking at the three extensions we do have in current use, it’s also fairly
easy to conclude that .tar.bz2
can be disallowed as well. It has a fairly
small number of files ever uploaded with it and it requires an additional
optional C library to handle the bzip2 compression.
Finally we get down to .tar.gz
and .zip
. Looking at the pure numbers
for these two, we can see that .tar.gz
is by far the most uploaded format,
with 444,338 total uploaded compared to .zip
’s 58,774 and on POSIX
operating systems .tar.gz
is also the default produced by all currently
released versions of Python and setuptools. In addition, these two file types
both use the same C library (zlib
) which is also required for
bdist_wheel
and bdist_egg
. The two wrinkles with deciding between
.tar.gz
and .zip
is that while on POSIX operating systems .tar.gz
is the default, on Windows .zip
is the default and the bdist_wheel
format also uses zip.
Instead of trying to standardize on either .tar.gz
or .zip
, this PEP
proposes that we allow either .tar.gz
or .zip
for sdists.
Limiting number of sdists per release
A sdist on PyPI should be a single source of truth for a particular release of software. However, currently PyPI allows you to upload one sdist for each of the sdist file extensions it allows. Currently this allows something like 10 different sdists for a project, but even with this PEP it allows two different sources of truth for a single version. Having multiple sdists oftentimes can account for strange bugs that only expose themselves based on which sdist that the person used.
To resolve this, this PEP proposes to allow one, and only one, sdist per release of a project.
Removal Process
This PEP does NOT propose removing any existing files from PyPI, only disallowing new ones from being uploaded. This restriction will be phased in on a per-project basis to allow projects to adjust to the new restrictions where applicable.
First, any existing projects will be flagged to allow legacy file types to be
uploaded, and any project without that flag (i.e. new projects) will not be
able to upload anything but sdist
with a .tar.gz
or .zip
extension,
bdist_wheel
, and bdist_egg
. Then, any existing projects that have never
uploaded a file that requires the legacy file type flag will have that flag
removed, also making them fall under the new restrictions. Finally, an email
will be generated to the maintainers of all projects still given the legacy
flag, which will inform them of the upcoming new restrictions on uploads and
tell them that these restrictions will be applied to future uploads to their
projects starting in 1 month. Finally, after 1 month all projects will have the
legacy file type flag removed, and support for uploading these types of files
will cease to exist on PyPI.
This plan should provide minimal disruption since it does not remove any existing files, and the types of files it does prevent from being uploaded are either not particularly useful (or used) types of files or they can continue to upload a similar type of file with a slight change to their process.
Copyright
This document has been placed in the public domain.
Source: https://github.com/python-discord/peps/blob/main/pep-0527.txt
Last modified: 2022-01-21 11:03:51 GMT