view simpypi.txt @ 82:f974c6d79804 default tip

lost and found
author Jeff Hammel <k0scist@gmail.com>
date Tue, 19 May 2015 21:02:59 -0700

Internal Python Package Index for Build Infrastructure

While working on https://bugzilla.mozilla.org/show_bug.cgi?id=701506
("create python package webserver"), I explored several different
solutions.  In short, there are many python packages available for
doing such a thing, none of which I found completely appropriate to
our needs.

My goals for this project are:

- get the build infrastructure able to depend on python packages
  (instead of either deploying each package to each slave or
  bundling them a la
  http://hg.mozilla.org/build/talos/file/8197dc094fe3/create_talos_zip.py
  which does not scale)

- being able to declare python dependencies in the conventional way:
  you declare ``install_requires`` as part of the ``setup`` function
  and e.g. ``python setup.py develop`` should grab the dependencies
  and versions it needs from a package index

- avoid hitting outside networks.  Tests should not fail because
  e.g. pypi.python.org is down.  We should not depend on this or any
  other networks being fast or up at all.

- being easy to upload new versions

- being easy to maintain
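
As an illustration of the conventional declaration mentioned in the
list above, a minimal ``setup.py`` might be sketched as follows.  The
package name ``examplepkg`` and the helper functions ``setup_kwargs``
and ``main`` are hypothetical; ``mozrunner`` is used as the dependency
only because it is one of the packages uploaded to the demonstration
index.

```python
def setup_kwargs():
    """Return the keyword arguments for setuptools.setup (illustrative values)."""
    return dict(
        name="examplepkg",          # hypothetical package name
        version="0.1",              # hypothetical version
        py_modules=["examplepkg"],
        # Dependencies are declared here; the installer resolves them
        # against whichever package index it is pointed at.  A version
        # specifier may be appended to each requirement string.
        install_requires=["mozrunner"],
    )


def main():
    # Imported lazily so that the metadata above can be inspected
    # without setuptools being installed.
    from setuptools import setup
    setup(**setup_kwargs())
```

In a real ``setup.py``, ``main()`` would be called at module level so
that ``python setup.py develop`` runs it.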

To this end, I wrote simpypi, as detailed in:
http://pypi.python.org/pypi/simpypi

Since it is likely that whatever is done for a first iteration will
not prove the solution we want to go with down the line, and instead
will be more of a talking point for developing a long-term solution, I
have decided to make the initial version of simpypi as simple as
possible. To this end, I made the following architecture choices:

- The simpypi GET view is just a file upload widget in a static HTML
  form. We may want to add more to this page, such as a listing of
  packages.

- simpypi currently does nothing to get packages from pypi.  Whether
  it should or not depends on our needs.

- there is no authentication.  While the simple index may be served
  within and outside of the build system without security impact, I
  have been laboring under the assumption that file upload will be
  protected by VPN. Other auth methods could also be considered.

Other issues are described in the documentation:
http://k0s.org/mozilla/hg/simpypi/file/tip/README.txt

In crafting simpypi, I've realized that the simple index and the
upload mechanisms are actually uncoupled: the former serves a package index
for installation, the latter takes an upload and puts it in an
appropriate place in a directory.  This uncoupling gives significant
flexibility with respect to deployment or development.  For instance,
the simpypi piece can be swapped out as long as the simple index
directory server continues to work.
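
The upload half of this split can be sketched in a few lines.  This is
not simpypi's actual code: the function name ``store_upload``, the
``<index_dir>/<name>/<filename>`` layout, and the simplified filename
pattern are all assumptions made for illustration.

```python
import os
import re

# Assumed (simplified) sdist naming: <name>-<version>.tar.gz or .zip,
# where the version starts with a digit.
SDIST_RE = re.compile(r"^(?P<name>.+)-(?P<version>\d[^-]*)\.(tar\.gz|zip)$")


def store_upload(index_dir, filename, data):
    """Write uploaded bytes to <index_dir>/<name>/<filename>; return the path."""
    # Discard any directory components supplied by the client.
    filename = os.path.basename(filename)
    match = SDIST_RE.match(filename)
    if match is None:
        raise ValueError("unrecognized package filename: %r" % filename)
    package_dir = os.path.join(index_dir, match.group("name"))
    os.makedirs(package_dir, exist_ok=True)
    path = os.path.join(package_dir, filename)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Because the function only writes into a directory, anything able to
serve that directory can act as the index, and the uploader can be
replaced without touching the serving side.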

I initially investigated (and, to a lesser degree, continue to
investigate) https://github.com/SurveyMonkey/CheesePrism . CheesePrism is a
heavier solution (although, compared to my survey of existing python
package index solutions, it is not that heavy) that centers on taking
packages from pypi.python.org for population.  As best I can tell,
this is not really what we want from a Mozilla package server:  not
all of the packages we want or need are on pypi.python.org and the
workflow prescribed by such a solution is probably undesirable to us.
I initially hoped to add more options to CheesePrism, but bug fixes
and turnarounds have been slow.

You can see an active simpypi instance at http://k0s.org:8080 for the upload page
and http://k0s.org:8080/index/ for the index page.  I have uploaded
mozbase and a few other packages as a demonstration.  If you want to
test, deploy a new virtualenv and run e.g.

    easy_install -i http://k0s.org:8080/index/ mozrunner

Note that the packages come from k0s.org:8080 .

You can see an active CheesePrism instance at http://k0s.org:6543/

Note that this is a POC and is intended as a talking point more than a
final solution.  A basic package index can be realized using tarballs
served with a static fileserver, but we need to have a machine to do
it on.  We should also figure out our network needs:  the package
index must be usable via the build infrastructure, but can also be
publicly available.  The web UI for uploading packages, be it simpypi
or other, should be behind a VPN.  The build infrastructure needs to
change drastically to start installing dependencies from this
system vs. what we do now (which is largely working around the lack of
a package index).
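
To make the "tarballs served with a static fileserver" idea concrete,
here is a minimal sketch using only the Python standard library
(Python 3.7 or later).  The function name ``make_index_server`` and
the default host and port are assumptions for illustration; whether a
given installer accepts the generated directory listings as an index
is likewise an assumption and has not been checked against the setup
described here.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_index_server(index_dir, host="127.0.0.1", port=8080):
    """Return an HTTP server that serves index_dir as static files.

    Directories without an index.html are rendered as HTML link
    listings by SimpleHTTPRequestHandler.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=index_dir)
    return ThreadingHTTPServer((host, port), handler)
```

Calling ``make_index_server("index").serve_forever()`` would serve a
local ``index`` directory until interrupted.  A production deployment
would more likely use a dedicated static webserver.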

We need to figure out what we want to do and drive this effort
forward.  I can work on deployment issues if we come up with a system
that I am comfortable administrating and have a computer to put it
on, though I'm not necessarily ideal for the job.  The web UI for
uploading packages should be worked through -- I give the simplest
possible model, though it can no doubt be improved.  That said, the
web UI is not necessary for serving packages now, though a computer
and a static fileserver are.