10:25 July 16, 2011

k0s.org -> web-3.0, part 1: pyloader

Earlier this year, I moved my site to wsgiblob. The idea is that a composite (poly-application) website is a directed acyclic graph (DAG), and that using a native format for the DAG would make true traversal realizable.

wsgiblob was an exploratory project. While I ran k0s.org on wsgiblob for several months, it was mostly to remind myself to finish what I was doing. Amongst a few other things (see http://k0s.org/hg/wsgiblob ), wsgiblob contained various matching conditions for dispatching (via path, domain name, etc.) and an object loader. The object loader used an .ini file, with each "app" having an :app section and an :options section, for example:

[hg:app]
factory = hgpaste.factory:make_app
path = /hg
[hg:options]
global_conf =
config_file = %(here)s/hgweb.config

I came to realize that while there were some special things going on (e.g. path = /hg ), loading a python object did not depend on whether or not that object was a WSGI app. I intended to rewrite this at some point.

For wsgiblob 2.0, which became wsgintegrate ( http://k0s.org/hg/wsgintegrate ), after several months of consideration I realized that this rewrite was going to be necessary. Thus was born pyloader: http://k0s.org/hg/pyloader .

pyloader actually contains several utilities for inspecting, constructing, and calling python objects. But what I will talk about now is the abstract factory for creating DAGs of arbitrary python objects, which improves upon and generalizes the object loader in wsgiblob. (See http://k0s.org/hg/pyloader/file/9203ca3a5182/pyloader/factory.py#l23 )

Like wsgiblob, the base format is JSON-serializable:

{'foo': # (arbitrary) object name,
      {'args': ['positional', 'arguments'],
       'kwargs': {'keyword': 'arguments'},
       'path': 'dotted.or.file.path:ObjectName'},
 'bar': ... } # etc

(From http://k0s.org/hg/pyloader/file/9203ca3a5182/README.txt#l12 ). An arbitrary name is chosen for the object (e.g. foo) and the args and kwargs associated with its construction are stored. These may also refer to other objects in the same (net) configuration via python string-formatting syntax: %(bar)s denotes the bar object, etc. So long as you have a DAG, the connectivity is arbitrary.
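To make the idea concrete, here is a minimal sketch of how a configuration in this shape could be turned into live objects. It is not pyloader's actual code: the names build, load_path, and REFERENCE are made up for illustration, only dotted module paths are handled (not file paths), and a reference is only recognized when an argument is exactly one %(name)s string.

```python
import importlib
import re

# Hypothetical helper pattern: a value that is exactly one reference, e.g. '%(bar)s'.
REFERENCE = re.compile(r'^%\((\w+)\)s$')


def load_path(path):
    """Import 'dotted.module:ObjectName' and return the named object."""
    module_name, _, object_name = path.partition(':')
    return getattr(importlib.import_module(module_name), object_name)


def build(config):
    """Instantiate every object in config, constructing dependencies first."""
    built = {}
    in_progress = set()  # names currently being constructed, used to detect cycles

    def resolve(value):
        # Replace a '%(name)s' string with the constructed object of that name.
        match = REFERENCE.match(value) if isinstance(value, str) else None
        return construct(match.group(1)) if match else value

    def construct(name):
        if name in built:
            return built[name]
        if name in in_progress:
            raise ValueError('cycle detected at %r: not a DAG' % name)
        in_progress.add(name)
        section = config[name]
        args = [resolve(arg) for arg in section.get('args', [])]
        kwargs = {key: resolve(value)
                  for key, value in section.get('kwargs', {}).items()}
        built[name] = load_path(section['path'])(*args, **kwargs)
        in_progress.discard(name)
        return built[name]

    for name in config:
        construct(name)
    return built


# Example using only the standard library: one Fraction built from another.
config = {
    'three_eighths': {'path': 'fractions:Fraction',
                      'args': ['%(three_quarters)s', 2]},
    'three_quarters': {'path': 'fractions:Fraction', 'args': [3, 4]},
}
objects = build(config)
print(objects['three_eighths'])
```

The order of the entries does not matter, since each object pulls in whatever it refers to before it is constructed, and a circular reference raises an error instead of recursing forever.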

Additionally, you can have an .ini file that translates into a JSON file, e.g.:

[foo:dotted.or.file.path:ObjectName]
. = positional, arguments
keyword = arguments
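The translation from this simple .ini form to the JSON form can be sketched with the standard library's configparser. Again, this is an illustration and not pyloader's parser: ini_to_config is a hypothetical name, I am assuming positional arguments are separated by commas or newlines, and the decorator sections described below are not handled.

```python
import configparser
import json
import re

INI = """
[foo:dotted.or.file.path:ObjectName]
. = positional, arguments
keyword = arguments
"""


def ini_to_config(text):
    """Translate simple [name:path] sections into the JSON-serializable form."""
    # interpolation=None leaves %(...)s references untouched for later resolution.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    config = {}
    for section in parser.sections():
        # Everything before the first colon is the object name; the rest is the path.
        name, _, path = section.partition(':')
        entry = {'args': [], 'kwargs': {}, 'path': path}
        for key, value in parser.items(section):
            if key == '.':
                # Assumption: positional arguments are comma- or newline-separated.
                entry['args'] = [arg.strip()
                                 for arg in re.split(r'[,\n]', value)
                                 if arg.strip()]
            else:
                entry['kwargs'][key] = value
        config[name] = entry
    return config


print(json.dumps(ini_to_config(INI), indent=1))
```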

In addition to just mirroring the JSON syntax, the .ini format allows for easy use of "decorators" as would be useful for matching a request in WSGI, though the entire thing is WSGI agnostic. For instance, a portion of my site looks like:

[:wsgintegrate.dispatcher:Dispatcher]
. = %(hg)s
    %(mozilla_hg)s
    %(bitsyblog)s
    %(wordstream)s
    %(dissociate)s
    %(anagram)s
    %(a8e)s
    %(toolbox)s
    %(k0s)s
    %(decoupage)s
# wrapper decorator
[@:wsgintegrate.match:wrap]
app = %(object)s
[hg:@:path=/hg:hgpaste:wsgi_app]
config_file = %(here)s/hgweb.config
[mozilla_hg:@:path=/mozilla/hg:hgpaste:wsgi_app]
config_file = %(here)s/hgmozilla.config
[bitsyblog:@:path=/blog:bitsyblog.factory:bitsierfactory]
global_conf =
namespace =
file_dir = %(here)s/blog
date_format = %H:%M %A, %B %-d, %Y
site_name = blog
user = k0s
header = %(here)s/templates/site-nav.html
...

The @ symbol is used as a convenient decorator. More about the syntax is available at http://k0s.org/hg/pyloader/file/9203ca3a5182/README.txt#l27

So the real question is....why do this in JSON or an .ini file versus a python file? Well, easy to read, easy to write and all of that. But much more importantly, imposing a structure -- that is, restrictions -- in creating an abstract factory allows use and manipulation of said structure. A python file allows absolute freedom, but almost zero chance of robustly manipulating what's going on. You can't very intelligently inspect what the python is doing: trying to do so is already black magic and it will fail in the face of other black magic, on top of being counterintuitive and, dare I say (I probably shouldn't), unpythonic. In addition, the use of python as factory configuration inevitably leads to an incestuous union of configuration and program logic...because it's easy to do so. Freedom is never really free.

Contrast this with an .ini or JSON based approach (or XML, but...shudder). Given an .ini configuration file, you can tell when the file has been modified. Since you enforce a DAG (which covers not only WSGI traversal but a wide variety of practical python problems), you can reload the whole graph or, if you keep track of what depends on what, you can selectively reload only the modified objects and the objects that depend on them. You can enable read or write access through (e.g.) a JSON request. Using this, it would be easy to make a drag+drop SVG map to construct/modify a WSGI site in real time.
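Selective reloading falls out of the structure almost for free. The sketch below is hypothetical (pyloader does not necessarily do this, and dependencies, stale, and the paths in the sample configuration are invented names): given a configuration in the JSON form and the set of object names whose definitions changed, it works out everything that would need to be rebuilt.

```python
import re

# Finds every '%(name)s' reference inside a string value.
REFERENCE = re.compile(r'%\((\w+)\)s')


def dependencies(section):
    """Return the names referenced by a section's args and kwargs."""
    values = list(section.get('args', []))
    values += list(section.get('kwargs', {}).values())
    return {name
            for value in values if isinstance(value, str)
            for name in REFERENCE.findall(value)}


def stale(config, changed):
    """Return the changed names plus everything that depends on them, transitively."""
    dirty = set(changed)
    grew = True
    while grew:  # repeat until no new dependents turn up
        grew = False
        for name, section in config.items():
            if name not in dirty and dependencies(section) & dirty:
                dirty.add(name)
                grew = True
    return dirty


# Hypothetical site: a dispatcher over two apps, plus an unrelated object.
config = {
    'dispatcher': {'path': 'example.dispatch:Dispatcher',
                   'args': ['%(hg)s', '%(blog)s']},
    'hg': {'path': 'example.hg:make_app'},
    'blog': {'path': 'example.blog:make_app'},
    'standalone': {'path': 'example.misc:Thing'},
}
print(sorted(stale(config, {'hg'})))
```

Only hg and the dispatcher that wraps it need rebuilding in this case; blog and standalone are left alone. Nothing comparable is possible when the wiring is buried in arbitrary python code.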

I'm pretty proud of pyloader. I'd love to see something like the JSON format for object serialization be part of the standard library. The mapping of path, args, and kwargs fits well to both the JSON and .ini representation, and while the .ini format is somewhat unusual....is it really any more so than e.g. apache or nginx?

While the going is slow, since k0s.org is a side project of low priority (and time, sadly, is limited and taken up by stupid things like laundry), I look forward to improving and leveraging pyloader for k0s.org and other purposes. Stay tuned for more on wsgintegrate, built on top of pyloader.

For more details, see: