PEP 376 – Database of Installed Python Distributions
- PEP
- 376
- Title
- Database of Installed Python Distributions
- Author
- Tarek Ziadé <tarek at ziade.org>
- Status
- Final
- Type
- Standards Track
- Created
- 22-Feb-2009
- Python-Version
- 2.7, 3.2
- Post-History
Abstract
The goal of this PEP is to provide a standard infrastructure to manage project distributions installed on a system, so all tools that are installing or removing projects are interoperable.
To achieve this goal, the PEP proposes a new format to describe installed distributions on a system. It also describes a reference implementation for the standard library.
In the past an attempt was made to create an installation database (see PEP 262 ).
Combined with PEP 345, the current proposal supersedes PEP 262.
Note: the implementation plan didn’t go as expected, so it should be considered informative only for this PEP.
Rationale
There are two problems right now in the way distributions are installed in Python:
- There are too many ways to do it and this makes interoperation difficult.
- There is no API to get information on installed distributions.
How distributions are installed
Right now, when a distribution is installed in Python, every element can be installed in a different directory.
For instance, Distutils
installs the pure Python code in the purelib
directory, which is lib/python2.6/site-packages
for unix-like systems and
Mac OS X, or Lib\site-packages
under Python’s installation directory for
Windows.
Additionally, the install_egg_info
subcommand of the Distutils install
command adds an .egg-info
file for the project into the purelib
directory.
For example, for the docutils
distribution, which contains one package an
extra module and executable scripts, three elements are installed in
site-packages
:
docutils
: Thedocutils
package.roman.py
: An extra module used bydocutils
.docutils-0.5-py2.6.egg-info
: A file containing the distribution metadata as described in PEP 314. This file corresponds to the file calledPKG-INFO
, built by thesdist
command.
Some executable scripts, such as rst2html.py
, are also added in the
bin
directory of the Python installation.
Another project called setuptools
[3] has two other formats
to install distributions, called EggFormats
[6]:
- a self-contained
.egg
directory, that contains all the distribution files and the distribution metadata in a file calledPKG-INFO
in a subdirectory calledEGG-INFO
.setuptools
creates other files in that directory that can be considered as complementary metadata. - an
.egg-info
directory installed insite-packages
, that contains the same filesEGG-INFO
has in the.egg
format.
The first format is automatically used when you install a distribution that
uses the setuptools.setup
function in its setup.py file, instead of
the distutils.core.setup
one.
setuptools
also add a reference to the distribution into an
easy-install.pth
file.
Last, the setuptools
project provides an executable script called
easy_install
[4] that installs all distributions, including
distutils-based ones in self-contained .egg
directories.
If you want to have standalone .egg-info
directories for your distributions,
e.g. the second setuptools
format, you have to force it when you work
with a setuptools-based distribution or with the easy_install
script.
You can force it by using the --single-version-externally-managed
option
or the --root
option. This will make the setuptools
project install
the project like distutils does.
This option is used by :
Uninstall information
Distutils doesn’t provide an uninstall
command. If you want to uninstall
a distribution, you have to be a power user and remove the various elements
that were installed, and then look over the .pth
file to clean them if
necessary.
And the process differs depending on the tools you have used to install the
distribution and if the distribution’s setup.py
uses Distutils or
Setuptools.
Under some circumstances, you might not be able to know for sure that you have removed everything, or that you didn’t break another distribution by removing a file that is shared among several distributions.
But there’s a common behavior: when you install a distribution, files are copied in your system. And it’s possible to keep track of these files for later removal.
Moreover, the Pip project has gained an uninstall
feature lately. It
records all installed files, using the record
option of the install
command.
What this PEP proposes
To address those issues, this PEP proposes a few changes:
- A new
.dist-info
structure using a directory, inspired on one format of theEggFormats
standard fromsetuptools
. - New APIs in
pkgutil
to be able to query the information of installed distributions. - An uninstall function and an uninstall script in Distutils.
One .dist-info directory per installed distribution
This PEP proposes an installation format inspired by one of the options in the
EggFormats
standard, the one that uses a distinct directory located in the
site-packages directory.
This distinct directory is named as follows:
name + '-' + version + '.dist-info'
This .dist-info
directory can contain these files:
METADATA
: contains metadata, as described in PEP 345, PEP 314 and PEP 241.RECORD
: records the list of installed filesINSTALLER
: records the name of the tool used to install the projectREQUESTED
: the presence of this file indicates that the project installation was explicitly requested (i.e., not installed as a dependency).
The METADATA, RECORD and INSTALLER files are mandatory, while REQUESTED may be missing.
This proposal will not impact Python itself because the metadata files are not used anywhere yet in the standard library besides Distutils.
It will impact the setuptools
and pip
projects but, given the fact that
they already work with a directory that contains a PKG-INFO
file, the change
will have no deep consequences.
RECORD
A RECORD
file is added inside the .dist-info
directory at installation
time when installing a source distribution using the install
command.
Notice that when installing a binary distribution created with bdist
command
or a bdist
-based command, the RECORD
file will be installed as well since
these commands use the install
command to create binary distributions.
The RECORD
file holds the list of installed files. These correspond
to the files listed by the record
option of the install
command, and will
be generated by default. This allows the implementation of an uninstallation
feature, as explained later in this PEP. The install
command also provides
an option to prevent the RECORD
file from being written and this option
should be used when creating system packages.
Third-party installation tools also should not overwrite or delete files that are not in a RECORD file without prompting or warning.
This RECORD file is inspired from PEP 262 FILES.
The RECORD
file is a CSV file, composed of records, one line per
installed file. The csv
module is used to read the file, with
these options:
- field delimiter :
,
- quoting char :
"
. - line terminator :
os.linesep
(so\r\n
or\n
)
When a distribution is installed, files can be installed under:
- the base location: path defined by the
--install-lib
option, which defaults to the site-packages directory. - the installation prefix: path defined by the
--prefix
option, which defaults tosys.prefix
. - any other path on the system.
Each record is composed of three elements:
- the file’s path
- a ‘/’-separated path, relative to the base location, if the file is under the base location.
- a ‘/’-separated path, relative to the base location, if the file is under the installation prefix AND if the base location is a subpath of the installation prefix.
- an absolute path, using the local platform separator
- a hash of the file’s contents.
Notice that
pyc
andpyo
generated files don’t have any hash because they are automatically produced frompy
files. So checking the hash of the correspondingpy
file is enough to decide if the file and its associatedpyc
orpyo
files have changed.The hash is either the empty string or the hash algorithm as named in
hashlib.algorithms_guaranteed
, followed by the equals character=
, followed by the urlsafe-base64-nopad encoding of the digest (base64.urlsafe_b64encode(digest)
with trailing=
removed). - the file’s size in bytes
The csv
module is used to generate this file, so the field separator is
“,”. Any “,” character found within a field is escaped automatically by
csv
.
When the file is read, the U
option is used so the universal newline
support (see PEP 278) is activated, avoiding any trouble
reading a file produced on a platform that uses a different new line
terminator.
Here’s an example of a RECORD file (extract):
lib/python2.6/site-packages/docutils/__init__.py,md5=nWt-Dge1eug4iAgqLS_uWg,9544
lib/python2.6/site-packages/docutils/__init__.pyc,,
lib/python2.6/site-packages/docutils/core.py,md5=X90C_JLIcC78PL74iuhPnA,66188
lib/python2.6/site-packages/docutils/core.pyc,,
lib/python2.6/site-packages/roman.py,md5=7YhfNczihNjOY0FXlupwBg,234
lib/python2.6/site-packages/roman.pyc,,
/usr/local/bin/rst2html.py,md5=g22D3amDLJP-FhBzCi7EvA,234
/usr/local/bin/rst2html.pyc,,
python2.6/site-packages/docutils-0.5.dist-info/METADATA,md5=ovJyUNzXdArGfmVyb0onyA,195
lib/python2.6/site-packages/docutils-0.5.dist-info/RECORD,,
Notice that the RECORD
file can’t contain a hash of itself and is just mentioned here
A project that installs a config.ini
file in /etc/myapp
will be added like this:
/etc/myapp/config.ini,md5=gLfd6IANquzGLhOkW4Mfgg,9544
For a windows platform, the drive letter is added for the absolute paths, so a file that is copied in c:MyAppwill be:
c:\etc\myapp\config.ini,md5=gLfd6IANquzGLhOkW4Mfgg,9544
INSTALLER
The install
command has a new option called installer
. This option
is the name of the tool used to invoke the installation. It’s a normalized
lower-case string matching [a-z0-9_\-\.]
.
$ python setup.py install –installer=pkg-system
It defaults to distutils
if not provided.
When a distribution is installed, the INSTALLER file is generated in the
.dist-info
directory with this value, to keep track of who installed the
distribution. The file is a single-line text file.
REQUESTED
Some install tools automatically detect unfulfilled dependencies and install them. In these cases, it is useful to track which distributions were installed purely as a dependency, so if their dependent distribution is later uninstalled, the user can be alerted of the orphaned dependency.
If a distribution is installed by direct user request (the usual case), a file REQUESTED is added to the .dist-info directory of the installed distribution. The REQUESTED file may be empty, or may contain a marker comment line beginning with the “#” character.
If an install tool installs a distribution automatically, as a dependency of another distribution, the REQUESTED file should not be created.
The install
command of distutils by default creates the REQUESTED
file. It accepts --requested
and --no-requested
options to explicitly
specify whether the file is created.
If a distribution that was already installed on the system as a dependency
is later installed by name, the distutils install
command will
create the REQUESTED file in the .dist-info directory of the existing
installation.
Implementation details
Note: this section is non-normative. In the end, this PEP was implemented by third-party libraries and tools, not the standard library.
New functions and classes in pkgutil
To use the .dist-info
directory content, we need to add in the standard
library a set of APIs. The best place to put these APIs is pkgutil
.
Functions
The new functions added in the pkgutil
module are :
distinfo_dirname(name, version)
-> directory namename
is converted to a standard distribution name by replacing any runs of non-alphanumeric characters with a single ‘-‘.version
is converted to a standard version string. Spaces become dots, and all other non-alphanumeric characters (except dots) become dashes, with runs of multiple dashes condensed to a single dash.Both attributes are then converted into their filename-escaped form, i.e. any ‘-’ characters are replaced with ‘_’ other than the one in ‘dist-info’ and the one separating the name from the version number.
get_distributions()
-> iterator ofDistribution
instances.Provides an iterator that looks for
.dist-info
directories insys.path
and returnsDistribution
instances for each one of them.get_distribution(name)
->Distribution
or None.obsoletes_distribution(name, version=None)
-> iterator ofDistribution
instances.Iterates over all distributions to find which distributions obsolete
name
. If aversion
is provided, it will be used to filter the results.provides_distribution(name, version=None)
-> iterator ofDistribution
instances.Iterates over all distributions to find which distributions provide
name
. If aversion
is provided, it will be used to filter the results. Scans all elements insys.path
and looks for all directories ending with.dist-info
. Returns aDistribution
corresponding to the.dist-info
directory that contains a METADATA that matchesname
for thename
metadata.This function only returns the first result founded, since no more than one values are expected. If the directory is not found, returns None.
get_file_users(path)
-> iterator ofDistribution
instances.Iterates over all distributions to find out which distributions uses
path
.path
can be a local absolute path or a relative ‘/’-separated path.A local absolute path is an absolute path in which occurrences of ‘/’ have been replaced by the system separator given by
os.sep
.
Distribution class
A new class called Distribution
is created with the path of the
.dist-info
directory provided to the constructor. It reads the metadata
contained in METADATA
when it is instantiated.
Distribution(path)
-> instance
Creates aDistribution
instance for the givenpath
.
Distribution
provides the following attributes:
name
: The name of the distribution.metadata
: ADistributionMetadata
instance loaded with the distribution’s METADATA file.requested
: A boolean that indicates whether the REQUESTED metadata file is present (in other words, whether the distribution was installed by user request).
And following methods:
get_installed_files(local=False)
-> iterator of (path, hash, size)Iterates over the
RECORD
entries and return a tuple(path, hash, size)
for each line. Iflocal
isTrue
, the path is transformed into a local absolute path. Otherwise the raw value fromRECORD
is returned.A local absolute path is an absolute path in which occurrences of ‘/’ have been replaced by the system separator given by
os.sep
.uses(path)
-> BooleanReturns
True
ifpath
is listed inRECORD
.path
can be a local absolute path or a relative ‘/’-separated path.get_distinfo_file(path, binary=False)
-> file objectReturns a file located under the.dist-info
directory.Returns a
file
instance for the file pointed bypath
.path
has to be a ‘/’-separated path relative to the.dist-info
directory or an absolute path.If
path
is an absolute path and doesn’t start with the.dist-info
directory path, aDistutilsError
is raised.If
binary
isTrue
, opens the file in read-only binary mode (rb
), otherwise opens it in read-only mode (r
).get_distinfo_files(local=False)
-> iterator of pathsIterates over the
RECORD
entries and returns paths for each line if the path is pointing to a file located in the.dist-info
directory or one of its subdirectories.If
local
isTrue
, each path is transformed into a local absolute path. Otherwise the raw value fromRECORD
is returned.
Notice that the API is organized in five classes that work with directories and Zip files (so it works with files included in Zip files, see PEP 273 for more details). These classes are described in the documentation of the prototype implementation for interested readers [9].
Examples
Let’s use some of the new APIs with our docutils
example:
>>> from pkgutil import get_distribution, get_file_users, distinfo_dirname
>>> dist = get_distribution('docutils')
>>> dist.name
'docutils'
>>> dist.metadata.version
'0.5'
>>> distinfo_dirname('docutils', '0.5')
'docutils-0.5.dist-info'
>>> distinfo_dirname('python-ldap', '2.5')
'python_ldap-2.5.dist-info'
>>> distinfo_dirname('python-ldap', '2.5 a---5')
'python_ldap-2.5.a_5.dist-info'
>>> for path, hash, size in dist.get_installed_files()::
... print '%s %s %d' % (path, hash, size)
...
python2.6/site-packages/docutils/__init__.py,b690274f621402dda63bf11ba5373bf2,9544
python2.6/site-packages/docutils/core.py,9c4b84aff68aa55f2e9bf70481b94333,66188
python2.6/site-packages/roman.py,a4b84aff68aa55f2e9bf70481b943D3,234
/usr/local/bin/rst2html.py,a4b84aff68aa55f2e9bf70481b943D3,234
python2.6/site-packages/docutils-0.5.dist-info/METADATA,6fe57de576d749536082d8e205b77748,195
python2.6/site-packages/docutils-0.5.dist-info/RECORD
>>> dist.uses('docutils/core.py')
True
>>> dist.uses('/usr/local/bin/rst2html.py')
True
>>> dist.get_distinfo_file('METADATA')
<open file at ...>
>>> dist.requested
True
New functions in Distutils
Distutils already provides a very basic way to install a distribution, which
is running the install
command over the setup.py
script of the
distribution.
Distutils2 will provide a very basic uninstall
function, that
is added in distutils2.util
and takes the name of the distribution to
uninstall as its argument. uninstall
uses the APIs described earlier and
remove all unique files, as long as their hash didn’t change. Then it removes
empty directories left behind.
uninstall
returns a list of uninstalled files:
>>> from distutils2.util import uninstall
>>> uninstall('docutils')
['/opt/local/lib/python2.6/site-packages/docutils/core.py',
...
'/opt/local/lib/python2.6/site-packages/docutils/__init__.py']
If the distribution is not found, a DistutilsUninstallError
is raised.
Filtering
To make it a reference API for third-party projects that wish to control
how uninstall
works, a second callable argument can be used. It’s
called for each file that is removed. If the callable returns True
, the
file is removed. If it returns False, it’s left alone.
Examples:
>>> def _remove_and_log(path):
... logging.info('Removing %s' % path)
... return True
...
>>> uninstall('docutils', _remove_and_log)
>>> def _dry_run(path):
... logging.info('Removing %s (dry run)' % path)
... return False
...
>>> uninstall('docutils', _dry_run)
Of course, a third-party tool can use lower-level pkgutil
APIs to
implement its own uninstall feature.
Installer marker
As explained earlier in this PEP, the install
command adds an INSTALLER
file in the .dist-info
directory with the name of the installer.
To avoid removing distributions that were installed by another packaging
system, the uninstall
function takes an extra argument installer
which
defaults to distutils2
.
When called, uninstall
controls that the INSTALLER
file matches
this argument. If not, it raises a DistutilsUninstallError
:
>>> uninstall('docutils')
Traceback (most recent call last):
...
DistutilsUninstallError: docutils was installed by 'cool-pkg-manager'
>>> uninstall('docutils', installer='cool-pkg-manager')
This allows a third-party application to use the uninstall
function
and strongly suggest that no other program remove a distribution it has
previously installed. This is useful when a third-party program that relies
on Distutils APIs does extra steps on the system at installation time,
it has to undo at uninstallation time.
Adding an Uninstall script
An uninstall
script is added in Distutils2. and is used like this:
$ python -m distutils2.uninstall projectname
Notice that script doesn’t control if the removal of a distribution breaks another distribution. Although it makes sure that all the files it removes are not used by any other distribution, by using the uninstall function.
Also note that this uninstall script pays no attention to the REQUESTED metadata; that is provided only for use by external tools to provide more advanced dependency management.
Backward compatibility and roadmap
These changes don’t introduce any compatibility problems since they will be implemented in:
- pkgutil in new functions
- distutils2
The plan is to include the functionality outlined in this PEP in pkgutil for Python 3.2, and in Distutils2.
Distutils2 will also contain a backport of the new pgkutil, and can be used for 2.4 onward.
Distributions installed using existing, pre-standardization formats do not have the necessary metadata available for the new API, and thus will be ignored. Third-party tools may of course to continue to support previous formats in addition to the new format, in order to ease the transition.
References
- [1]
- http://docs.python.org/distutils
- [2]
- http://hg.python.org/distutils2
- [3]
- http://peak.telecommunity.com/DevCenter/setuptools
- [4]
- http://peak.telecommunity.com/DevCenter/EasyInstall
- [5]
- http://pypi.python.org/pypi/pip
- [6]
- http://peak.telecommunity.com/DevCenter/EggFormats
- [7]
- http://fedoraproject.org/wiki/Packaging/Python/Eggs#Providing_Eggs_using_Setuptools
- [8]
- http://wiki.debian.org/DebianPython/NewPolicy
- [9]
- http://bitbucket.org/tarek/pep376/
Acknowledgements
Jim Fulton, Ian Bicking, Phillip Eby, Rafael Villar Burke, and many people at Pycon and Distutils-SIG.
Copyright
This document has been placed in the public domain.
Source: https://github.com/python-discord/peps/blob/main/pep-0376.txt
Last modified: 2022-01-21 11:03:51 GMT