Python Packaging

python is often criticized for its module loading procedure and, relatedly, its packaging mechanism (indeed, these issues are often conflated). While there are several moving parts, and not one obvious way to do it, python's module importing, at least as far as what generally gets inserted in sys.path, is fairly easy to understand, and while the installation story is a bit more confused, it is fairly easy to package and install python software in a manner appropriate to your deployment story. This document attempts to present the basics of where python looks for modules and the various installation options in a thorough yet actionable way. In my opinion, once you understand the basics, python has several good installation models, and I also find that most people who criticize python for doing this-or-that are usually the ones that have given up before actually understanding what it does.

sys.path

python consults the directories in sys.path, in order, to determine what modules to load. While the order may be changed, the usual case is to look for module imports in the following order:

  1. in the current working directory
  2. in PYTHONPATH
  3. in site packages loaded by .pth files

You can also alter sys.path programmatically, but this in general is not a good packaging solution.
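
For illustration, here is a minimal sketch of the (discouraged) programmatic approach; the lib directory and somemodule names here are hypothetical:

import os, sys

# prepend a directory relative to this file; brittle, since the importing
# module now hard-codes knowledge of the filesystem layout
here = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(here, 'lib'))

import somemodule  # hypothetical module living in ./lib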


PYTHONPATH

PYTHONPATH is an environment variable specifying an ordered (colon-separated on unix) list of directories in which to look for imports:

> echo "print 'how are you gentlemen?!?'" >> /tmp/allyourbase.py
> PYTHONPATH=/tmp python -c "import allyourbase"
how are you gentlemen?!?
        

Subdirectories relative to each element of PYTHONPATH are also recursed into if they contain an __init__.py file:

> mkdir /tmp/foo
> echo "import bar" >> /tmp/foo/__init__.py
> echo "print 'i am a tomato'" >> /tmp/foo/bar.py
> PYTHONPATH=/tmp python -c "import foo"
i am a tomato
        

Example of sys.path differences:

echo $PYTHONPATH; diff <(PYTHONPATH=/tmp python -c "import sys; print '\n'.join(sys.path)") <(python -c "import sys; print '\n'.join(sys.path)")
/home/jhammel/python:
2c2,3
< /tmp
---
> /home/jhammel/python
> /home/jhammel
        

.pth files

site.py searches its distribution directory for .pth files and adds the packages that it finds to sys.path. site.py and the associated .pth files are responsible for loading most packaged code (that is, installed python packages that aren't part of python's standard library).

site.py is imported automatically upon python initialization (unless the -S switch is passed). site.py is searched for in lib/python<version>/site.py relative to sys.prefix and sys.exec_prefix (if different). While any code can be put in site.py, generally this module tells python to load the packages in sys.prefix + 'lib/python<version>/site-packages' and sys.prefix + 'lib/python<version>/dist-packages', or other paths depending on the OS vendor.

python looks for its lib directory and site.py relative to PYTHONHOME if it is set, and otherwise relative to the path of the python binary. virtualenv relies on the latter behaviour. Since PYTHONHOME is honoured in python at the C level, virtualenv will not function correctly with PYTHONHOME set.

Example of a .pth file:

> cat python-support.pth
/usr/lib/pymodules/python2.6
gtk-2.0
/usr/lib/pymodules/python2.6/gtk-2.0
        
The specified directories are added to sys.path in order. If a line is not an absolute path, it is relative to the location of the .pth file.
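
The behaviour described above can be approximated in a few lines of python. This is only a rough sketch of the mechanism, not the standard library's actual logic (see site.addsitedir for the real implementation):

import os
import sys

def add_pth_entries(sitedir, pth_name):
    # rough approximation of how site.py processes a single .pth file
    for line in open(os.path.join(sitedir, pth_name)):
        line = line.rstrip()
        if not line or line.startswith('#'):
            continue
        if line.startswith('import '):
            exec line  # .pth files may contain executable import lines
            continue
        # os.path.join leaves the entry unchanged if it is already absolute;
        # otherwise it is taken relative to the .pth file's directory
        path = os.path.join(sitedir, line)
        if os.path.exists(path) and path not in sys.path:
            sys.path.append(path)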

PYTHONHOME

The PYTHONHOME environment variable controls where python looks for its lib directory and site.py (and therefore site-packages), as well as its include and bin directories. So unlike PYTHONPATH, which only affects sys.path, PYTHONHOME is also respected when installing software by any of the installers, making it much more pervasive.

PYTHONHOME is respected at the C level in the (standard) CPython implementation, so it is a high-level override of the standard behaviour. Typically, PYTHONHOME is used to maintain parallel python installations as an alternative to virtualenv.

Example of PYTHONHOME:

> python -c "import sys; print sys.path"
['', '/home/jhammel/python', '/home/jhammel/mozilla/src/mozilla-central', '/usr/lib/python2.6', '/usr/lib/python2.6/plat-linux2', '/usr/lib/python2.6/lib-tk', '/usr/lib/python2.6/lib-old', '/usr/lib/python2.6/lib-dynload', '/usr/lib/python2.6/dist-packages', '/usr/lib/python2.6/dist-packages/PIL', '/usr/lib/python2.6/dist-packages/gst-0.10', '/usr/lib/pymodules/python2.6', '/usr/lib/python2.6/dist-packages/gtk-2.0', '/usr/lib/pymodules/python2.6/gtk-2.0', '/usr/lib/python2.6/dist-packages/wx-2.8-gtk2-unicode', '/usr/local/lib/python2.6/dist-packages']
> PYTHONHOME=/tmp python -c "import sys; print sys.path"
'import site' failed; use -v for traceback
['', '/home/jhammel/python', '', '/tmp/lib/python2.6/', '/tmp/lib/python2.6/plat-linux2', '/tmp/lib/python2.6/lib-tk', '/tmp/lib/python2.6/lib-old', '/tmp/lib/python2.6/lib-dynload']
        
The null string in sys.path represents the current working directory. Note that, because the standard site.py isn't found relative to PYTHONHOME when it is set, the paths site.py would normally load are not included in sys.path in that case.

Python Module Installers

Several packages provide functionality to install python packages. While they differ in features, the basic model is common to all of them:

All the standard python installers make use of a setup.py file which calls a setup() function providing the package metadata and detailing what should be installed; setup() performs the installation. The standard way of installing python packages is to run python setup.py install in the package's directory. The package's python files will be copied to python's lib directory, from which they will be importable. By convention, the python files for the package are kept in subdirectories relative to setup.py. If other initialization or checks are required for installation of the python package, these may (and should) be done in setup.py as well.

distutils

distutils is the only installation module that is part of the python standard library. distutils is a basic packaging system: it does not provide for dependencies, web installation, or the like. The main reason to use distutils is that it will be present on any platform running python. Documentation on the distutils setup function may be read via python -c 'from distutils.core import setup; help(setup)'
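
For reference, a minimal distutils setup.py looks something like the following (the name and layout are illustrative, assuming the package's python files live in a foo subdirectory containing an __init__.py):

from distutils.core import setup

setup(name='foo',
      version='0.1',
      description='the foo package',
      packages=['foo'],  # directories (with __init__.py) to install
      )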

setuptools

setuptools improves over distutils in a number of ways, among them:

  • dependencies may be declared (via install_requires) and fetched automatically at install time
  • packages may be installed over the web from an index such as pypi (see easy_install below)
  • packages within a source tree may be discovered automatically with find_packages
  • console scripts and other plugins may be declared through entry points

However, setuptools is not part of the standard library (and never will be). Automagic installation over the network is available using ez_setup.py.

easy_install

easy_install is a widely-used program for installing python software over the web. Instead of the traditional way of navigating to the package on pypi, downloading the software (and dependencies, if any), unpacking, and running python setup.py install, easy_install allows package installation from the command line: easy_install <package-name>. This will install <package-name> and its dependencies to python's lib location. easy_install will look to pypi for its package index by default, but this may be overridden by specifying an index with the -i switch.

Deficiencies of setuptools

  • setuptools may not be installed: although setuptools is the de facto standard for python packaging, it may not be present on a user's system. While ez_setup.py may be used to ensure correct behaviour, this methodology is not only somewhat invasive, it also depends on python setup.py being run on a computer with net access. [Note: I personally do not use ez_setup.py]
  • often ill-maintained: while right now (May, 2010) setuptools seems to work more or less correctly, setuptools (particularly the web package retrieval portions) has historically often been brittle (for instance, when SVN 1.5 was released).
  • difficult to debug: when there is a problem with package installation, it is often quite hard to figure out what exactly went wrong, since setuptools does many things behind the scenes and its error messages are cryptic. pip helps with this (but is neither part of the standard library nor of setuptools; it is yet another third-party package).
  • find_links does not recurse; this means that the parent package has to know about its children's dependencies

distribute

distribute is a fork of setuptools, created when setuptools lacked maintenance, in order to have a maintained and forward-facing package management solution for python. While setuptools is still more commonly used, distribute is likely to be included in a future release of python and has received Guido's blessing. distribute otherwise behaves like setuptools, but aims at compatibility and streamlining going forward.

Example package

Directory Structure

.
|-- foo
|   |-- bar.py
|   `-- __init__.py
`-- setup.py

          

[Note: I removed the setup.cfg and the egg-info that PasteScript creates. The former is unnecessary and the latter should not be versioned.]

Importing foo

Once it is installed (via python setup.py install), you can import foo in the usual way independent of your current directory (as its location is provided by .pth files automatically loaded by site.py):

import foo
import foo.bar
from foo import bar
          

setup.py

The setup.py is a bit more complicated than it needs to be since it was created using a template. But it works and all the basic pieces are there:
from setuptools import setup, find_packages
import sys, os

version = '0.0'

setup(name='foo',
      version=version,
      description="the foo package",
      long_description="""\
""",
      classifiers=[], # Get strings from http://pypi.python.org/pypi?%3Aaction=list_classifiers
      keywords='',
      author='Jeff Hammel',
      author_email='jhammel@example.com',
      url='',
      license='MPL',
      packages=find_packages(exclude=['ez_setup', 'examples', 'tests']),
      include_package_data=True,
      zip_safe=False,
      install_requires=[
          # -*- Extra requirements: -*-
      ],
      entry_points="""
      # -*- Entry points: -*-
      """,
      )
          
This is a setuptools setup.py file, though a distutils or distribute setup.py would look similar.

Creating the foo package

> paster create foo
Selected and implied templates:
  PasteScript#basic_package  A basic setuptools-enabled package

Variables:
  egg:      foo
  package:  foo
  project:  foo
Enter version (Version (like 0.1)) ['']:   
Enter description (One-line description of the package) ['']: the foo package
Enter long_description (Multi-line description (in reST)) ['']: 
Enter keywords (Space-separated keywords/tags) ['']: 
Enter author (Author name) ['']: Jeff Hammel
Enter author_email (Author email) ['']: jhammel@example.com
Enter url (URL of homepage) ['']: 
Enter license_name (License name) ['']: MPL
Enter zip_safe (True/False: if the package can be distributed as a .zip file) [False]: 
Creating template basic_package
Creating directory ./foo
  Recursing into +package+
    Creating ./foo/foo/
    Copying __init__.py to ./foo/foo/__init__.py
  Copying setup.cfg to ./foo/setup.cfg
  Copying setup.py_tmpl to ./foo/setup.py
Running /home/jhammel/stage/bin/python setup.py egg_info
> echo 'print "hello world"' > foo/foo/bar.py
> cd foo; python setup.py develop > /dev/null # install in place
running develop
running egg_info
writing foo.egg-info/PKG-INFO
writing top-level names to foo.egg-info/top_level.txt
writing dependency_links to foo.egg-info/dependency_links.txt
writing entry points to foo.egg-info/entry_points.txt
reading manifest file 'foo.egg-info/SOURCES.txt'
writing manifest file 'foo.egg-info/SOURCES.txt'
running build_ext
Creating /home/jhammel/stage/lib/python2.6/site-packages/foo.egg-link (link to .)
foo 0.0dev is already the active version in easy-install.pth

Installed /home/jhammel/stage/foo
Processing dependencies for foo==0.0dev
Finished processing dependencies for foo==0.0dev
> python -c 'from foo import bar' # your directory location is unimportant
          
If you want to repeat this experiment, I recommend using virtualenv so as to not pollute your global site packages.

virtualenv

virtualenv is a virtual python implementation: a virtualenv provides a separate environment for the installation of python packages. This is a big boon for python development, as it allows easy installation of various pieces of python code without modification of the system site-packages.

On running virtualenv <directory>, activation scripts are created in <directory>/bin. These scripts change PATH to add this bin directory and provide a deactivate() function. However, giving the full path to executables in $VIRTUAL_ENV/bin will work without having to use the activate scripts (unless PYTHONHOME is set, in which case the scripts must be run, or PYTHONHOME otherwise unset, before using the virtualenv). Since the activate scripts set environment variables, they must be sourced in bash rather than run in a subshell: . <directory>/bin/activate.

virtualenv works by copying the system python binary (the one used to invoke virtualenv, unless otherwise specified) to the bin subdirectory of the given target, as well as symlinking the standard library into the lib/python<version> subdirectory. By default, the system site-packages will be included as well, but this may be overridden with the --no-site-packages switch. Because python looks for its lib directory relative to the binary location, the virtual environment may be used to install and load packages without impacting the system python. Since PYTHONHOME is examined before resolving relative to the binary, a virtualenv will not work when PYTHONHOME is set: unset PYTHONHOME or run the activate scripts before attempting to use the virtualenv's executables.

Example usage of virtualenv:

> virtualenv.py foo
New python executable in foo/bin/python
Installing setuptools............done.
> tree foo -L 2
foo
|-- bin
|   |-- activate
|   |-- activate_this.py
|   |-- easy_install
|   |-- easy_install-2.6
|   |-- pip
|   `-- python
|-- include
|   `-- python2.6 -> /usr/include/python2.6
`-- lib
    `-- python2.6

> foo/bin/python -c "import sys; print sys.prefix; print sys.path"
/home/jhammel/foo
['', '/home/jhammel/foo/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg', '/home/jhammel/foo/lib/python2.6/site-packages/pip-0.7.1-py2.6.egg', '/home/jhammel/python', '/home/jhammel', '/home/jhammel/foo/lib/python2.6', '/home/jhammel/foo/lib/python2.6/plat-linux2', '/home/jhammel/foo/lib/python2.6/lib-tk', '/home/jhammel/foo/lib/python2.6/lib-old', '/home/jhammel/foo/lib/python2.6/lib-dynload', '/usr/lib/python2.6', '/usr/lib/python2.6/plat-linux2', '/usr/lib/python2.6/lib-tk', '/home/jhammel/foo/lib/python2.6/site-packages', '/usr/local/lib/python2.6/site-packages', '/usr/local/lib/python2.6/dist-packages', '/usr/lib/python2.6/dist-packages', '/usr/lib/python2.6/dist-packages/PIL', '/usr/lib/python2.6/dist-packages/gst-0.10', '/usr/lib/pymodules/python2.6', '/usr/lib/python2.6/dist-packages/gtk-2.0', '/usr/lib/pymodules/python2.6/gtk-2.0', '/usr/lib/python2.6/dist-packages/wx-2.8-gtk2-unicode']
        

virtualenv comes bundled with setuptools, distribute, and pip (and of course has distutils since that is part of the standard library).

virtualenv.py may be used as a single file to create a virtual environment. Download virtualenv.py (from http://bitbucket.org/ianb/virtualenv/) and run it directly:

python virtualenv.py myenvironment
        
When used in this way, virtualenv.py will download setuptools, etc., from the network. If the whole virtualenv directory is downloaded, even if it is not installed, virtualenv.py will not touch the network.

pip

pip is a python installer. It is compatible with setuptools and provides additional functionality. pip is intended as a more intelligent version of easy_install. Like easy_install, pip looks by default to http://pypi.python.org/simple as its package index, but this may be overridden with the pip install -i switch.

In addition to being a standalone package, pip (like setuptools and distribute) comes bundled with virtualenv.

Takeaways

When deciding how to treat an interdependent set of .py files, especially with respect to deployment, there are several options:

Package them

That is to say, add a setup.py appropriate to one (or more) of the installers, whose function is to install your software to site-packages (either the system's, a virtualenv's, or a location specified by PYTHONHOME or by a prefix argument given to your installer). This has the advantage that modules in your package will be importable within python's sys.path and that console scripts and other components set up by setup.py will be installed correctly.

There are several pieces of software that make this easy. For example, the basic_package paste template will create a package skeleton including a setup.py file where you can put your code:

paster create MyPackage; cp mypythonfiles/* MyPackage/mypackage
        

paster will ask for various bits of metadata concerning the package (these can be saved in a config file for ease of reuse). See a more verbose example if desired.

The normal case is to have python files in a subdirectory with respect to setup.py (since setup.py is not part of your package, as it shouldn't be installed). This creates a directory structure one level different from unpackaged python, so for version control systems, going from unpackaged to packaged is a radical rearrangement of the tree. This does not have to be the case: the py_modules keyword argument to setup() may be used to specify modules that live alongside setup.py.
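
For instance, a hypothetical setup.py that installs two top-level modules living right next to setup.py itself might look like:

from distutils.core import setup

setup(name='mytools',  # illustrative name
      version='0.1',
      py_modules=['foo', 'bar'],  # foo.py and bar.py sit alongside setup.py
      )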

As Ian Bicking correctly pointed out, for single files there is considerable overhead in packaging with any of the existing python packaging solutions. However, with this overhead comes the ability for your package to be shared on pypi, to be used as a dependency for other packages, and to be installed in the usual way. [Note: while perhaps python is below average in this respect, it is hardly unique. For example, most version control systems demand a directory structure instead of allowing versioning of a single file.]

Put them in the same directory

.py files will look for imports relative to their location on the filesystem:

> echo 'import bar' > /tmp/foo.py
> for i in foo bar; do echo "print '$i'" >> /tmp/$i.py; done
> python /tmp/foo.py 
bar
foo
        

Keeping all your .py files in the same directory allows these relative imports from any of them. This is the simplest solution and also the least appropriate for complex software: there is no package namespace, there is no way to express dependencies or other metadata, and the modules are only importable when python is invoked from (or pointed at) that directory.

In addition, putting all your .py files in the same directory tends towards vague and ambiguous software whose purpose becomes "stuff I need to do". While people care to differing degrees about writing modular and reusable code, it is generally agreed (and observed in practice) that overly specific code becomes brittle and difficult to maintain. Software is a manifestation of intent, and without asking the question "what do I really want to do?", software tends towards the perceived need at the time. Since solutions to this problem tend to be very specific, and since needs evolve over time, this approach demands frequent and pervasive rewrites; and since each of these rewrites tends to take the least short-term effort, the code quickly becomes spaghetti. Before solving any problem, software or otherwise, ask yourself: what do I really want to do? [Note: this is not a defense of unnecessary abstraction or an overly top-down approach. Like any generalization, going too far the other way is also bad. You can meditate indefinitely on what the real problem is without writing a single line of code; then, instead of spending all your time maintaining unmaintainable code, you've spent all your time not writing anything. There must be a balance.]

Fix PYTHONPATH

For the case where you want multiple directories (for the purposes of this section, "packages") but don't want a setup.py file, you may run separate, uninstalled directories out of the same root directory, as long as you set PYTHONPATH to point to this root directory.

Example

Let's say you have packages foo and bar that you want to run unpackaged (which is to say, having the .py files and associated resources in directories named foo and bar). If they are both subdirectories of /path/to/my/python, you can write a shell script that sets up PYTHONPATH:

#!/bin/bash
# export so that the exec'd command inherits the variable
export PYTHONPATH=/path/to/my/python:$PYTHONPATH
exec "$@" # the arguments to the script are the command to run
          

If you don't know the absolute location of /path/to/my/python, you can put a more clever script in the parent directory of foo and bar:

#!/bin/bash
cd $(dirname $0)
export PYTHONPATH=$(pwd):$PYTHONPATH
cd - > /dev/null
exec "$@"
          

Another approach is to have a script that is sourced, rather than having the script execute the desired python:

# find the path by which this script was sourced (bash won't tell us directly)
path_to_this=$(history | tail -1 | awk '{print $3}')
directory=$(dirname $path_to_this)

# bash won't tilde-expand in a variable, so do it manually (optional)
directory=${directory/'~'/$HOME}

cd $directory
directory=$(pwd)
cd - > /dev/null
export PYTHONPATH=${directory}:$PYTHONPATH
          

The script is then used like

. /path/to/activate && command-you-want --and arguments
          

The more complicated example is used where the absolute path of the script is unknown. In the case where ${directory} is known, a one-liner export PYTHONPATH=${directory}:$PYTHONPATH suffices. You could conceivably fix up $PATH in this script too, if desired. Note that this method duplicates the intent of virtualenv's activate scripts.

In this approach, you must provide an __init__.py file in each directory containing your python files (that is, in all the subdirectories of the PYTHONPATH entries that you wish to import from). If this is not found, python will not recurse into these directories to locate importable modules. __init__.py may be a blank file.

This approach has the same disadvantage as putting all files in the same directory, but at least you can separate files by common intent and functionality.

Hybrid Approach

For various reasons, it may be desirable (or perceived as desirable) to run uninstalled python code as part of a deployment. If it is also desirable to package and share the code, it is possible to have the same code-base usable both packaged and unpackaged.

The usual way of doing installable packages is to have python code in a subdirectory relative to setup.py. The subdirectory name is (again, by convention) the top-level namespace for the package, as in the hypothetical layout below. In this case, when you want to deploy uninstalled, you may take only this subdirectory of the package for your installation, fixing PYTHONPATH if necessary (e.g. if you have multiple packages, or if you need to import these modules when your current working directory is other than the location of this subdirectory).
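
A hypothetical layout (mypackage and stuff.py are illustrative names): the outer directory holds setup.py for packaged installation, while the inner mypackage subdirectory is what you would ship for an uninstalled deployment, putting its parent directory on PYTHONPATH:

mypackage
|-- setup.py
`-- mypackage
    |-- __init__.py
    `-- stuff.py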

When taking a hybrid approach, the chief nuance to keep in mind is that anything done by setup.py (console scripts, dependency installation, package metadata) happens only in the packaged deployment; the code itself must therefore be importable and usable without it.

The type of packaging you decide on depends on a number of factors: whether the code will be shared or depended on by other packages, how it will be deployed, and how much isolation the deployment requires.

Use virtualenv

For any serious python developer, I advocate putting as little as possible/practical in the system site-packages directory. Since the usual case is to be experimenting with several different versions of several different (often inter-dependent) code bases, using a single site-packages does not allow the isolation necessary to transparently develop and test software. While it is a deficiency of python (or at least of its existing installers) that no more than one version of a package can live in any given site-packages, it is not the normal case of software development in any language to install software and libraries globally during the development process. virtualenv provides an easy solution to this problem. Other strategies may be used instead, but in my experience these mostly go some way towards reinventing virtualenv.

Python Packages vs. the System Packages

There is an ongoing debate on whether to use the system packaging solution (e.g. apt for debian and ubuntu) or python's solution. While I won't attempt to give a concrete solution to a debate in which there are many gray areas, I will give my approach.

For software that is part of the system, I use the system installer. If I'm using, say, synaptic to install software, it depends on the needed python packages being installed in the system way. No need to fight this. I also occasionally do this for packages that depend on C libraries, such as python-ldap or lxml, where I want a stable, dependable installation. If any of my software depended on, say, a development version of lxml, I probably wouldn't do this.

Other than that, I don't put anything in the system site packages. Other software is almost always software I want to experiment with or develop on. In these cases, I don't want to pollute the system's site packages with my works in progress. So I use virtualenv.

This works well for me, being a software developer. If I were purely on the user side (that is, if I never wanted to edit any python files and did not demand or care about encapsulation of the software I used), I would probably be more lax about putting software into the system site-packages.

For servers or shared development boxes, it may be more appropriate to put needed shared packages in the global site-packages so that all users will be using these (presumably known-good) pieces of software. For instance, if a deployment strategy depends on virtualenv, then it is probably a good idea to install virtualenv globally.

Single File Packages

It may be desirable to have a single file that is also a python package. This need usually arises when you are dealing with multiple design constraints:

  1. You are dealing with a module which may be depended on by other python packages (or depend on them)
  2. Your module must be in a single file

Constraint 2 normally occurs as a constraint on deployment, as it is easier, clearer, or otherwise important to move a single file around versus an entire module.

This can be done.

However, you must be aware that there is an additional constraint due to (e.g.) setuptools: in order for a package to be easy_install-able, the installer must be named setup.py.

Note that I'm not imposing the constraint that all auxiliaries must live in the same file. It is assumed that, as part of good development practices, the canonical version of the file lives in version control, and alongside the module of interest may live a README, tests, and other supporting files. The requirement is that the module be able to install itself into site-packages. In this case, the solution is to maintain an auxiliary setup.py that invokes the module of interest.
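
A minimal sketch of what such an auxiliary setup.py might look like, assuming (as with manifestparser below) that the single-file module exposes a command-line main() with a setup handler; mymodule is an illustrative name:

# hypothetical auxiliary setup.py that delegates to the module itself
import sys
import mymodule  # the single-file module of interest (illustrative name)

if __name__ == '__main__':
    # rewrite argv so the module sees its 'setup' command followed by the
    # setuptools arguments: `python setup.py develop` becomes the
    # equivalent of `mymodule.py setup develop`
    sys.argv.insert(1, 'setup')
    mymodule.main()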

Example: ManifestDestiny

As an example, consider ManifestDestiny: http://hg.mozilla.org/automation/ManifestDestiny . The mercurial repository contains tests, a README, etc., but the entirety of the code logic lives in manifestparser.py . The reason it is desirable to have manifestparser as a single file is that it needs to be synchronized with a copy in mozilla-central (the code that makes Firefox) for the various test harnesses to consume, while the ManifestDestiny repository remains the canonical location.

It also has to be a python package that works with setuptools and lives on pypi.python.org, as it is depended on by another setuptools package, Mozmill. (Mozmill is also synced to mozilla-central, but because the existing test harnesses modify PYTHONPATH to import modules, versus putting them in site-packages, it is more convenient and sensible to deploy manifestparser to mozilla-central as a single file.)

As said before, this can be done. manifestparser works on a command syntax:

 manifestparser [options] [command] [command-arguments]
        

where command is one of several handlers. So we add a new handler, SetupCLI, and put the setup code there:

class SetupCLI(CLICommand):
    """
    setup using setuptools
    """
    usage = '%prog [options] setup [setuptools options]'

    def __call__(self, options, args):
        sys.argv = [sys.argv[0]] + args
        assert setup is not None, "You must have setuptools installed to use SetupCLI"
        here = os.path.dirname(os.path.abspath(__file__))
        try:
            filename = os.path.join(here, 'README.txt')
            description = file(filename).read()
        except:
            description = ''
        os.chdir(here)

        setup(name='ManifestDestiny',
              version=version,
              description="universal reader for manifests",
              long_description=description,
              classifiers=[], # Get strings from http://pypi.python.org/pypi?%3Aaction=list_classifiers
              keywords='mozilla manifests',
              author='Jeff Hammel',
              author_email='jhammel@mozilla.com',
              url='https://wiki.mozilla.org/Auto-tools/Projects/ManifestDestiny',
              license='MPL',
              zip_safe=False,
              py_modules=['manifestparser'],
              install_requires=[
                  # -*- Extra requirements: -*-
                  ],
              entry_points="""
              [console_scripts]
              manifestparser = manifestparser:main
              """,
              )
        

The various attributes and function signatures follow from the CLI API in manifestparser.py (see the source for details). setuptools is imported conditionally at the top of the file so that it is not a hard requirement. The result: you can run manifestparser.py setup develop with the same effect as the usual python setup.py develop.

This isn't the entirety of the story, however. Uploading this package to pypi (e.g. using python manifestparser.py setup egg_info -RDb "" sdist register upload) will not result in a viable package! You can download the resultant tarball, unpack it, and install it in the usual way, fine. But for an upstream package -- that is, Mozmill -- that depends on ManifestDestiny, the package will be downloaded successfully but it will not successfully install, because (e.g.) easy_install will complain that a setup.py is not found.