
import from git://github.com/mozilla/toolbox.git
author Jeff Hammel <k0scist@gmail.com>
date Sun, 11 May 2014 09:15:35 -0700

The Story of Toolbox
====================

Toolbox is fundamentally a document-oriented approach to resource
indexing. A "tool" consists of three mandatory string fields (name,
description, and URL), which apply to web resources in general, plus
classifiers such as author, usage, and type. A tool may have as many
classifier fields as needed. Each classifier consists of a set of
values with which the tool is tagged. This gives toolbox the
flexibility to fit a large number of data models, such as PyPI, DOAP,
and others.

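A minimal sketch of such a tool document, assuming a flat JSON layout in which each classifier is a top-level key holding a list of values. The ``Tool`` class and its ``tag`` and ``to_json`` methods are hypothetical illustrations, not toolbox's actual API:

```python
import json
from dataclasses import dataclass, field


@dataclass
class Tool:
    """Hypothetical tool document: three mandatory strings plus classifiers."""
    name: str
    description: str
    url: str
    # classifier name -> set of values, e.g. "language" -> {"python"}
    classifiers: dict = field(default_factory=dict)

    def tag(self, classifier, *values):
        """Add values to a classifier, creating the classifier if needed."""
        self.classifiers.setdefault(classifier, set()).update(values)

    def to_json(self):
        """Serialize as a flat JSON document; sets become sorted lists."""
        doc = {"name": self.name, "description": self.description,
               "url": self.url}
        for classifier, values in self.classifiers.items():
            doc[classifier] = sorted(values)
        return json.dumps(doc, sort_keys=True)


tool = Tool("toolbox", "a document-oriented resource index",
            "http://github.com/k0s/toolbox")
tool.tag("language", "python")
tool.tag("type", "webapp", "index")
print(tool.to_json())
```

Because classifiers are just named sets, adding a new one (say, ``license``) needs no schema change, which is what lets the same document shape stretch to PyPI- or DOAP-like data.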

Running Toolbox
---------------

You can download and run the toolbox software yourself:
http://github.com/k0s/toolbox

To serve in baseline mode, install the software and run::

  paster serve paste.ini

This serves the handlers and static content with the paste webserver
(http://pythonpaste.org), using ``README.txt`` as the ``/about`` page
and serving the data in the ``sample`` directory.

The dispatcher (``toolbox.dispatcher:Dispatcher``) is the central
(WSGI) webapp that dispatches each request to one of a number of
handlers (defined in ``handlers.py``). The dispatcher has a few
options:

* ``about``: path to a reStructuredText file to serve at ``/about``
* ``model_type``: name of the backend to use (``memory_cache``,
  ``file_cache``, or ``couch``)
* ``template_dir``: extra directory in which to look for templates

These may be configured in the ``[app:toolbox]`` section of the
``paste.ini`` file by prefixing them with the ``toolbox.`` namespace.
It is advisable to copy the example ``paste.ini`` file and adapt it to
your own needs. Any additional ``toolbox.``-namespaced arguments are
passed to the model. For instance, to specify the directory for the
``file_cache`` model, the provided ``paste.ini`` uses
``toolbox.directory = %(here)s/sample``.

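The namespacing convention amounts to stripping the ``toolbox.`` prefix and routing each setting either to the dispatcher or to the model. A minimal sketch, assuming the settings arrive as a flat dict of strings; ``split_settings`` is a hypothetical helper, not part of toolbox, and the real factory may divide the options differently:

```python
# Dispatcher options listed above; every other "toolbox." key is
# assumed (for this sketch) to be a model argument.
DISPATCHER_OPTIONS = {"about", "model_type", "template_dir"}


def split_settings(settings, prefix="toolbox."):
    """Hypothetical helper: split settings into (dispatcher_args, model_args)."""
    dispatcher_args, model_args = {}, {}
    for key, value in settings.items():
        if not key.startswith(prefix):
            continue  # not in the toolbox namespace; ignore it
        name = key[len(prefix):]
        if name in DISPATCHER_OPTIONS:
            dispatcher_args[name] = value
        else:
            model_args[name] = value
    return dispatcher_args, model_args


settings = {
    "toolbox.model_type": "file_cache",
    "toolbox.directory": "/srv/toolbox/sample",
    "debug": "true",  # not namespaced, so neither side receives it
}
dispatcher_args, model_args = split_settings(settings)
# dispatcher_args == {"model_type": "file_cache"}
# model_args == {"directory": "/srv/toolbox/sample"}
```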

Architecture
------------

Toolbox uses a fairly simple architecture with a single abstract data
model allowing an arbitrary number of implementations to be
constructed::

  Interfaces          Implementations

  +----+              +-+-----+
  |HTTP|              | |files|
  +----+---\  +-----+ | +-----+
            |-|model|-+-+-----+
  +------+-/  +-----+ | |couch|
  |script|            | +-----+
  +------+            +-+------+
                      | |memory|
                      | +------+
                      +-+---+
                        |...|
                        +---+

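A minimal sketch of the abstract model, assuming only the two methods this document mentions elsewhere (``get`` and ``update``). The class names ``Model`` and ``MemoryModel``, and the convention that a project is a dict keyed by its ``name``, are illustrative assumptions rather than toolbox's actual code:

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Hypothetical abstract backend; interfaces talk only to this."""

    @abstractmethod
    def get(self):
        """Return a list of all projects (dicts)."""

    @abstractmethod
    def update(self, project):
        """Create or update a project, keyed by its name."""


class MemoryModel(Model):
    """In-memory implementation; a files or couch backend would implement
    the same two methods against its own storage."""

    def __init__(self):
        self._projects = {}

    def get(self):
        return [dict(p) for p in self._projects.values()]

    def update(self, project):
        # store a copy so callers cannot mutate the stored document
        self._projects[project["name"]] = dict(project)


model = MemoryModel()
model.update({"name": "toolbox", "url": "http://github.com/k0s/toolbox"})
model.update({"name": "toolbox", "url": "http://github.com/k0s/toolbox",
              "language": ["python"]})  # same name, so this is an update
```

Because the HTTP handlers and scripts only ever see ``Model``, adding a backend means writing one subclass; nothing on the interface side changes.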
Toolbox was originally intended to use a directory of files, one per
project, as the backend. These files were to be HTML, as the above
model maps clearly onto HTML::

  <div class="project"><h1><a href="{{url}}">{{name}}</a></h1>
    <p class="description">{{description}}</p>
    {{for field in fields}}
    <ul class="{{field}}">
      {{for value in values[field]}}
      <li>{{value}}</li>
      {{endfor}}
    </ul>
    {{endfor}}
  </div>

This microformat approach not only allows easy editing of the HTML
documents; the documents may also be served and displayed
independently, without the toolbox server side.

The HTML microformat was never implemented (though, since the model
backend is pluggable, it easily could be). Instead, the original
implementation used JSON blobs stored in one file per tool. This
approach loses the displayable aspect, though since JSON is a
well-defined format with several good tools for exploring and
manipulating the data, this disadvantage is perhaps offset.

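To make the one-file-per-tool layout concrete, here is a minimal sketch assuming each blob is stored as ``<name>.json`` in a single directory. The file naming and the ``save_tool``/``load_tools`` helpers are hypothetical; toolbox's ``file_cache`` model may name and read its files differently:

```python
import json
import os


def save_tool(directory, tool):
    """Hypothetical helper: write one tool (a dict with a 'name') to
    <directory>/<name>.json. Assumes the name is a safe filename."""
    path = os.path.join(directory, tool["name"] + ".json")
    with open(path, "w") as f:
        json.dump(tool, f, indent=2, sort_keys=True)
    return path


def load_tools(directory):
    """Hypothetical helper: read every *.json blob back into a list of dicts."""
    tools = []
    for filename in sorted(os.listdir(directory)):
        if filename.endswith(".json"):
            with open(os.path.join(directory, filename)) as f:
                tools.append(json.load(f))
    return tools
```

Since each blob is plain JSON, standard tools such as ``python -m json.tool`` can inspect or reformat a single tool without toolbox running.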
A couch backend was also written. The three approaches compare as
follows:

+-------+--------------+-------------+--------------+
|       | Displayable? | File-based? | Concurrency? |
+=======+==============+=============+==============+
| HTML  | Yes          | Yes         | No           |
+-------+--------------+-------------+--------------+
| JSON  | Not really   | Yes         | No           |
+-------+--------------+-------------+--------------+
| Couch | No           | No          | Yes?         |
+-------+--------------+-------------+--------------+

The concurrency issue with file-based document backends may be
overcome by using lockfiles. Ideally, this is handled at the
filesystem level. If your filesystem does not provide this
functionality, it may be implemented programmatically. A rough sketch
of a good implementation is as follows:

1. A worker thread is spawned to write the data asynchronously, and
   the data is sent to the worker thread.

2. The worker checks for the presence of a lockfile (detailed in the
   next step). If the lockfile exists and is owned by an active
   process, the worker waits until that process is done with it. (For
   a more robust implementation, the worker sends a request to write
   the file to some controller.)

3. The worker creates a lockfile, named after its PID, in some
   directory parallel to the data root under consideration (for
   example, ``/tmp/toolbox/lock/${PID}-${filename}.lck``).

4. The worker writes to the file.

5. The worker removes the lock.

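The five steps above can be sketched in Python as follows. This is a minimal, POSIX-oriented sketch under stated assumptions, and it deviates from the outline in one deliberate way: instead of putting the PID in the lockfile's name, it names the lockfile after the target file and writes the PID inside it. That lets ``os.open`` with ``O_CREAT | O_EXCL`` perform steps 2 and 3 as one atomic operation, closing the race between checking for a lock and taking it. ``LOCK_DIR`` and the helper names are hypothetical, and a stale lock left by a dead process is handled only by a timeout:

```python
import os
import tempfile
import threading
import time

# Hypothetical lock directory, parallel to the data root
# (cf. /tmp/toolbox/lock in the outline above).
LOCK_DIR = os.path.join(tempfile.gettempdir(), "toolbox", "lock")


def acquire_lock(filename, timeout=10.0, poll=0.05):
    """Steps 2-3: wait for any existing lock, then atomically take it."""
    os.makedirs(LOCK_DIR, exist_ok=True)
    lockpath = os.path.join(LOCK_DIR, os.path.basename(filename) + ".lck")
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL fails if the lockfile already exists, so "check for
            # a lock" and "create the lock" happen in one atomic step.
            fd = os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("lock still held: " + lockpath)
            time.sleep(poll)  # another writer owns it; wait and retry
            continue
        with os.fdopen(fd, "w") as lock:
            lock.write(str(os.getpid()))  # record the owner's PID
        return lockpath


def locked_write(filename, data):
    """Steps 2-5, run inside the worker thread."""
    lockpath = acquire_lock(filename)
    try:
        with open(filename, "w") as f:  # step 4: write the file
            f.write(data)
    finally:
        os.remove(lockpath)  # step 5: always release the lock


def write_async(filename, data):
    """Step 1: hand the data to a worker thread that writes it."""
    worker = threading.Thread(target=locked_write, args=(filename, data))
    worker.start()
    return worker
```

Errors raised in the worker are not reported back to the caller here; a fuller implementation would pass them back (or route writes through the controller mentioned in step 2).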
The toolbox web service uses a dispatcher->handler framework. The
handlers are loosely pluggable (they are assigned in the dispatcher)
but could (and probably should) be made completely pluggable. That
said, the toolbox web system integrates templates, static resources
(JavaScript, CSS, images), and handlers, so true pluggability requires
more than supporting pluggable handlers in the dispatcher.

Deployment, however, may be tailored as desired. Any of the given
templates may be overridden by passing a ``template_dir`` parameter
pointing to a directory that contains templates with the same names as
those in toolbox's ``templates`` directory.

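The override rule amounts to a search path in which the deployment's directory is consulted before toolbox's own. A minimal sketch, where ``find_template`` is a hypothetical helper rather than the function toolbox actually uses:

```python
import os


def find_template(name, template_dir=None, default_dir="templates"):
    """Hypothetical lookup: return the path of template `name`, checking
    the deployment's `template_dir` before toolbox's own templates."""
    for directory in (template_dir, default_dir):
        if directory is None:
            continue  # no override directory configured
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    raise LookupError("template not found: %s" % name)
```

Any template the deployment does not override falls through to the default directory, so a ``template_dir`` only needs to contain the files being changed.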
Likewise, the static files (CSS, JS, etc.) are served out of toolbox's
``static`` directory using ``paste``'s ``StaticURLParser`` (see
toolbox's ``factory.py``). Notably, this is *not* done by the WSGI app
itself. Doing it with middleware allows the deployment to be
customized by writing your own factory. For example, instead of using
the ``paste`` webserver and the included ``paste.ini``, you could use
nginx, or Apache with ``mod_wsgi``, with a factory file that invokes
``Dispatcher`` with the desired arguments, and serve the static files
with an arbitrary static file server.

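The factory pattern can be illustrated without paste at all. The WSGI wrapper below is a hypothetical, standard-library-only stand-in for ``StaticURLParser`` (not its API): requests under a ``/static/`` prefix are answered from a directory, and everything else is delegated to the wrapped app. The ``make_app`` name and the prefix are assumptions:

```python
import mimetypes
import os


def make_app(app, static_dir, prefix="/static/"):
    """Hypothetical factory: wrap a WSGI `app`, answering requests under
    `prefix` with files from `static_dir` and delegating the rest."""
    root = os.path.realpath(static_dir)

    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if not path.startswith(prefix):
            return app(environ, start_response)  # dynamic request: delegate
        # resolve the requested file and refuse anything outside static_dir
        filename = os.path.realpath(os.path.join(root, path[len(prefix):]))
        if not filename.startswith(root + os.sep) or not os.path.isfile(filename):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        with open(filename, "rb") as f:
            body = f.read()
        start_response("200 OK", [("Content-Type", content_type),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return wrapper
```

A custom factory could return ``make_app(dispatcher, static_dir)`` for a single-process deployment, or the bare dispatcher when nginx or Apache serves the static files; the app itself is unchanged either way.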
It is common sense, if rarely followed, that deployment should be
simple. If you want to get toolbox running on your desktop and/or for
testing, you should be able to do so easily (see ``INSTALL.sh`` for a
simple installation using ``bash``; you'll probably want to perform
these steps by hand for any sort of real-world deployment). A highly
customized deployment, on the other hand, will require more expertise
and manual setup.

The template data and the JSON are closely tied together. This has the
distinct advantage of avoiding data translation steps and code
duplication.

Toolbox uses several light-footprint libraries:

* webob for request/response handling: http://pythonpaste.org/webob/

* tempita for (HTML) templates: http://pythonpaste.org/tempita/

* whoosh for search. This pure-Python implementation of full-text
  search is relatively fast (for Python) and should scale decently to
  the target scale of toolbox (thousands or tens of thousands of
  tools). While not as fast as Lucene, whoosh is easy to deploy, has a
  good API, and preserves toolbox as a deployable software product
  rather than an instance that requires the expert configuration,
  maintenance, and tuning of several disparate software products, a
  process that is both time-consuming and impossible to automate (it
  cannot be installed with a script). http://packages.python.org/Whoosh/

* jQuery: jQuery is the best JavaScript library and everyone should
  use it. http://jquery.com/

* jeditable for AJAX-style editing:
  http://www.appelsiini.net/projects/jeditable

* jquery-tokeninput for autocomplete: http://loopj.com/jquery-tokeninput/

* less for dynamic stylesheets: http://lesscss.org/


User Interaction
----------------

A user will typically interact with Toolbox through the AJAX web
interface. The server side returns relatively simple (HTML) markup,
structured so that JavaScript can be used to provide rich interaction.
This pairing of simple HTML with complex JS has several consequences:

1. The document is a document. The tool HTML presented to the user
   (with the current objectionable exception of the per-project Delete
   button) is a document form of the data. It can be clearly and
   easily translated to data (e.g. for import/export) or simply marked
   up using (e.g.) JS to add functionality. By keeping concerns
   separate (presentation layer vs. interaction layer), a self-evident
   clarity is maintained.

2. Computation is shifted client-side. Often, an otherwise lightweight
   webapp loses considerable performance rendering complex templates.
   By keeping the templates lightweight and handling controls and
   their presentation in JS, high performance is preserved.

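To illustrate the first point, here is a minimal sketch that recovers the data from markup following the microformat shown in the Architecture section, using only the standard library's ``html.parser``. It assumes exactly that structure (a ``project`` div, an ``h1`` link, a ``description`` paragraph, and one ``ul`` per classifier); ``ProjectParser`` is hypothetical and is not toolbox's actual import code:

```python
from html.parser import HTMLParser


class ProjectParser(HTMLParser):
    """Hypothetical parser: <div class="project"> markup back into dicts."""

    def __init__(self):
        super().__init__()
        self.projects = []    # one dict per <div class="project">
        self._field = None    # classifier named by the current <ul>
        self._target = None   # what the current text node belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "project":
            self.projects.append({})
        elif not self.projects:
            return  # ignore markup outside any project
        elif tag == "a" and "href" in attrs:
            self.projects[-1]["url"] = attrs["href"]
            self._target = "name"  # the link text is the tool's name
        elif tag == "p" and attrs.get("class") == "description":
            self._target = "description"
        elif tag == "ul" and attrs.get("class"):
            self._field = attrs["class"]
            self.projects[-1].setdefault(self._field, [])
        elif tag == "li" and self._field:
            self.projects[-1][self._field].append("")
            self._target = "value"

    def handle_endtag(self, tag):
        if tag in ("a", "p", "li"):
            self._target = None
        elif tag == "ul":
            self._field = None

    def handle_data(self, data):
        if self._target == "value":
            self.projects[-1][self._field][-1] += data
        elif self._target:
            project = self.projects[-1]
            project[self._target] = project.get(self._target, "") + data


parser = ProjectParser()
parser.feed('<div class="project"><h1><a href="http://github.com/k0s/toolbox">'
            'toolbox</a></h1><p class="description">a resource index</p>'
            '<ul class="language"><li>python</li></ul>'
            '<ul class="type"><li>webapp</li><li>index</li></ul></div>')
parser.close()
```

The parser never looks at the Delete button or any script-added controls, which is the practical payoff of keeping presentation and interaction separate.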

What Toolbox Doesn't Do
-----------------------

* versioning: toolbox exposes editing of a canonical document; it
  doesn't do versioning. A model instance may do whatever versioning
  it desires, and since the models are pluggable, it would be
  relatively painless to subclass e.g. the file-based model and add a
  post-save hook that runs e.g. ``hg commit``. Customized templates
  could be used to display this information.

* authentication: the information presented by toolbox is freely
  readable and editable. This is intentional: adopting a "wiki" model
  and presenting an easy-to-use, context-switching-free interface
  encourages curation (ignoring the possibly imaginary problem of
  wiki-spam). Access-level auth could be implemented using WSGI
  middleware (e.g. repoze.who or bitsyauth) or through a front-end
  "webserver" integration layer such as Apache or nginx. Finer-grained
  control of the presentation layer could be realized by using custom
  templates.

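As a rough illustration of the middleware route, the sketch below keeps toolbox readable by everyone but requires HTTP Basic credentials for any request that is not a ``GET`` or ``HEAD``. It is a hypothetical, standard-library-only stand-in, not the API of repoze.who or bitsyauth; the ``require_auth_for_writes`` name and the plain-text credential dict are assumptions made for brevity (a real deployment would store hashed passwords and serve over HTTPS):

```python
import base64
import hmac


def require_auth_for_writes(app, users, realm="toolbox"):
    """Hypothetical middleware: reads stay open, writes need Basic auth.

    `users` maps username -> password (plain text only for this sketch)."""

    def authorized(environ):
        header = environ.get("HTTP_AUTHORIZATION", "")
        if not header.startswith("Basic "):
            return False
        try:
            decoded = base64.b64decode(header[len("Basic "):]).decode("utf-8")
        except ValueError:  # bad base64 or non-UTF-8 credentials
            return False
        username, _, password = decoded.partition(":")
        expected = users.get(username)
        # compare_digest avoids leaking information through timing
        return expected is not None and hmac.compare_digest(
            expected.encode("utf-8"), password.encode("utf-8"))

    def wrapper(environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET")
        if method in ("GET", "HEAD") or authorized(environ):
            return app(environ, start_response)
        start_response("401 Unauthorized", [
            ("Content-Type", "text/plain"),
            ("WWW-Authenticate", 'Basic realm="%s"' % realm),
        ])
        return [b"authentication required"]

    return wrapper
```

Because the check lives in middleware, the wiki-style default stays intact: dropping the wrapper from the factory restores fully open editing.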

What Toolbox Would Like To Do
-----------------------------

Ultimately, toolbox should be as federated as possible. The basic
architecture of toolbox as a web service plus supporting scripts makes
this feasible and more self-contained than most proposed federated
services. In practice, the basic federated model has proved difficult
to achieve purely through the (HTTP) client-server model: without
complete federation and adherence to a protocol, offline cron jobs
have to be used to pull in external data sources. If a web service
only needs to talk to others of its own type and is willing to keep a
queue of requests for when hosts are offline, full HTTP federation may
be implemented with only a configuration-specified discovery service
to find the nodes.

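The queue-while-offline idea might look like the following minimal sketch. Everything here is hypothetical: the ``Outbox`` class, the shape of a request, and the ``send`` callable (which would perform the actual HTTP request and raise ``OSError`` when a host is unreachable) are assumptions, and discovery is reduced to a configured list of node URLs:

```python
from collections import deque


class Outbox:
    """Hypothetical per-host queues of pending requests for federated peers."""

    def __init__(self, nodes):
        # `nodes` stands in for the configuration-specified discovery service
        self.queues = {node: deque() for node in nodes}

    def broadcast(self, request):
        """Queue a request (e.g. a tool update) for every known node."""
        for queue in self.queues.values():
            queue.append(request)

    def flush(self, send):
        """Deliver queued requests in order; on a host's first failure,
        keep the remainder for the next attempt."""
        for node, queue in self.queues.items():
            while queue:
                try:
                    send(node, queue[0])
                except OSError:
                    break  # host offline: retry on a later flush
                queue.popleft()  # delivered, so drop it
```

Keeping one queue per host preserves the order of updates to each peer, so a node that comes back online replays them in the sequence they were made.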

Evolution
---------

Often, a piece of software is presented as a state out of context
(that is, minus the evolution that led it to be and that leads it to
look further out, beyond the horizon). While this is an interesting
special effect for an art project, software is communication, and
such a presentation is only conducive to software in the darkest of
black-box approaches.

"Beers are like web frameworks: if they're not micro, you don't know
what you're talking about." - hipsterhacker

For sites that fit the architecture of a given framework, it may be
advisable to use that framework. However, for most webapp/webservice
categories with a finite scope and definite intent, it is often
easier, more maintainable, and more legible to build a complete
HTTP->WSGI->app architecture than to hammer a framework into fitting
your problem or to redefine the problem to fit the framework. This
approach was used for toolbox.

The GenshiView template, http://k0s.org/hg/GenshiView, was invoked to
generate a basic dispatcher->handler system. The cruft was removed,
leaving only the basic structure and the TempitaHandler, since tempita
is lightweight and it was envisioned that filesystem tempita templates
(MakeItSo!) would be used elsewhere in the project. The basic handlers
(project views, field-sorted view, new, etc.) were written, and a
usable interface was soon constructed.

A ``sample`` directory was created to hold the JSON blobs. Because
this was done early on, two goals were achieved:

1. The software could be dogfooded immediately using actual,
   applicable data. This helped expose a number of issues concerning
   the data format right away.

2. There was a place to put tools before the project reached a
   deployable state (previously, a few had lived in a static state on
   k0s.org, using a rough sketch of the HTML microformat discussed
   above). Since the main point of toolbox is to record Mozilla tools,
   the wealth of references mentioned in passing could be put
   somewhere instead of being passed by and forgotten. One would rather
   not miss the train while purchasing a ticket.

The original intent, when the file-based JSON blob approach was to be
the deployed backend, was to have two repositories: one for the code
and one for the JSON blobs. When this approach was scrapped, the
file-based JSON blobs were relegated to the ``sample`` directory, with
the intent of importing them into, e.g., a couch database on actual
deployment (using an import script). The samples could then be used
for testing.

The model has a single "setter" function, ``def update``, used for
both creating and updating projects. Because of this, and because the
model was an ABC (pluggable) from the beginning, a converter
``export`` function could be trivially written at the ABC level::

  def export(self, other):
      """export the current model to another model instance"""
      for project in self.get():
          other.update(project)

This, with an accompanying CLI utility, was used to migrate from the
JSON blob files in the ``sample`` directory to the couch instance.
This particular methodology, as applied to an unexpected problem (the
unanticipated switch from JSON blobs to couch), is a good example of
the power of using a problem to drive the software forward (in this
case, the creation of a universal export function and an associated
command-line utility). The alternative, a one-off manual data
migration, would have been just as time-consuming, would not have been
repeatable, would not have extended toolbox, and might have (as many
one-offs do) infected the code base with semi-permanent vestiges. In
general, problems should be used to drive innovation. This can only be
done if the software is kept in a reasonably good state. Otherwise,
considerable (though probably worthwhile) refactoring is needed before
any feature extension, and that refactoring becomes cost-prohibitive
in time-critical situations, which is exactly when a one-off is (more)
likely to be employed.

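A minimal, self-contained sketch of such a migration utility, assuming a directory of JSON blobs on both sides (the couch side is not reproduced here). ``JSONDirectoryModel`` and ``main`` are hypothetical; only ``get``, ``update``, and the ``export`` loop follow the code above:

```python
import argparse
import json
import os


class JSONDirectoryModel:
    """Hypothetical file-based model: one <name>.json blob per project."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def get(self):
        projects = []
        for filename in sorted(os.listdir(self.directory)):
            if filename.endswith(".json"):
                with open(os.path.join(self.directory, filename)) as f:
                    projects.append(json.load(f))
        return projects

    def update(self, project):
        path = os.path.join(self.directory, project["name"] + ".json")
        with open(path, "w") as f:
            json.dump(project, f, sort_keys=True)

    def export(self, other):
        """export the current model to another model instance"""
        for project in self.get():
            other.update(project)


def main(argv=None):
    """Hypothetical CLI: copy every project from one model to another."""
    parser = argparse.ArgumentParser(description="copy projects between models")
    parser.add_argument("source", help="directory of JSON blobs to read")
    parser.add_argument("dest", help="directory to export into")
    args = parser.parse_args(argv)
    source = JSONDirectoryModel(args.source)
    source.export(JSONDirectoryModel(args.dest))
    return len(source.get())  # number of projects copied
```

Calling ``main(["sample", "exported"])`` copies every blob from ``sample`` into ``exported``. Because ``export`` only relies on ``get`` and ``update``, pointing ``dest`` at a different model class is the only change a couch migration would need.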

Use Cases
---------

The target use case is software tools for Mozilla or, more generally,
a software index. For this case, the default fields used are given in
the ``paste.ini`` file: usage, author, type, and language. More fields
may be added to the running instance in the future.

However, the classifier scheme can be used for a wide variety of
web-locatable resources. A few examples:

* songs: artist, album, genre, instruments
* del.icio.us: type, media, author, site


Resources
---------

* http://readthedocs.org/