comparison README.txt @ 0:b0942f44413f
import from git://github.com/mozilla/toolbox.git
author | Jeff Hammel <k0scist@gmail.com> |
---|---|
date | Sun, 11 May 2014 09:15:35 -0700 |
parents | |
children | 2ba55733b788 |
-1:000000000000 | 0:b0942f44413f |
---|---|
1 The Story of Toolbox | |
2 ==================== | |
3 | |
4 Toolbox is fundamentally a document-oriented approach to resource | |
5 indexing. A "tool" consists of three mandatory string fields -- name, | |
6 description, and URL -- that are generic to the large class of problems | |
7 of web resources, as well as classifiers, such as author, usage, type, | |
8 etc. A tool may have an arbitrary number of classifier fields as | |
9 needed. Each classifier consists of a set of values with which a tool | |
10 is tagged. This gives toolbox the flexibility to fit a large number of | |
11 data models, such as PyPI, DOAP, and others. | |
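
As a rough illustration of the data model described above, a tool can be pictured as a plain dictionary. The ``make_tool`` helper and the dictionary layout are assumptions made for this sketch, not toolbox's actual API:

```python
# Hypothetical helper (not part of toolbox): build a tool record as a dict.
MANDATORY = ("name", "description", "url")


def make_tool(name, description, url, **classifiers):
    """Return a tool with three mandatory strings plus classifier value sets."""
    tool = {"name": name, "description": description, "url": url}
    for field in MANDATORY:
        if not isinstance(tool[field], str) or not tool[field]:
            raise ValueError("%s must be a non-empty string" % field)
    for field, values in classifiers.items():
        if isinstance(values, str):
            values = [values]  # allow a single value to be given as a string
        tool[field] = set(values)  # each classifier is a set of tag values
    return tool


tool = make_tool(
    "toolbox",
    "a document-oriented index of web resources",
    "http://github.com/k0s/toolbox",
    author=["k0s"],
    type=["webapp"],
    language=["python"],
)
```

Any number of classifier fields may be passed as keyword arguments; only the three string fields are required.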
12 | |
13 | |
14 Running Toolbox | |
15 --------------- | |
16 | |
17 You can download and run the toolbox software yourself: | |
18 http://github.com/k0s/toolbox | |
19 | |
20 To serve in baseline mode, install the software and run:: | |
21 | |
22 paster serve paste.ini | |
23 | |
24 This will serve the handlers and static content with the paste | |
25 (http://pythonpaste.org) webserver, using ``README.txt`` as the | |
26 ``/about`` page and serving the data in ``sample``. | |
27 | |
28 The dispatcher (``toolbox.dispatcher:Dispatcher``) is the central (WSGI) | |
29 webapp that delegates each request to one of a number of handlers (from | |
30 ``handlers.py``). The dispatcher has a few options: | |
31 | |
32 * about: path to a reStructuredText file to serve at ``/about`` | |
33 * model_type: name of the backend to use (memory_cache, file_cache, or couch) | |
34 * template_dir: extra directory to look for templates | |
35 | |
36 These may be configured in the ``paste.ini`` file in the | |
37 ``[app:toolbox]`` section by prepending with the namespace | |
38 ``toolbox.``. It is advisable that you copy the example ``paste.ini`` | |
39 file for your own usage needs. Additional ``toolbox.``-namespaced | |
40 arguments will be passed to the model. For instance, to specify the | |
41 directory for the ``file_cache`` model, the provided ``paste.ini`` uses | |
42 ``toolbox.directory = %(here)s/sample``. | |
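
The namespacing described above can be sketched with the standard library alone. This is only an illustration of the idea: the sample configuration, the paths, and the ``split_options`` helper are assumptions, and in a real deployment paste performs the parsing before the options reach the dispatcher:

```python
import configparser

# Hypothetical excerpt in the style of paste.ini (paths are placeholders).
PASTE_INI = """
[app:toolbox]
toolbox.about = /path/to/README.txt
toolbox.model_type = file_cache
toolbox.directory = /path/to/sample
"""

# The three dispatcher options listed above.
DISPATCHER_OPTIONS = {"about", "model_type", "template_dir"}


def split_options(section, namespace="toolbox."):
    """Separate dispatcher options from those handed on to the model."""
    dispatcher, model = {}, {}
    for key, value in section.items():
        if not key.startswith(namespace):
            continue  # not a toolbox option
        name = key[len(namespace):]
        target = dispatcher if name in DISPATCHER_OPTIONS else model
        target[name] = value
    return dispatcher, model


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(PASTE_INI)
dispatcher_opts, model_opts = split_options(parser["app:toolbox"])
```

Here ``directory`` is not a dispatcher option, so it ends up among the arguments for the model.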
43 | |
44 | |
45 Architecture | |
46 ------------ | |
47 | |
48 Toolbox uses a fairly simple architecture with a single abstract data | |
49 model allowing an arbitrary number of implementations to be constructed:: | |
50 | |
51 Interfaces          Implementations | |
52 | |
53 +----+              +-+-----+ | |
54 |HTTP|              | |files| | |
55 +----+---\  +-----+ | +-----+ | |
56           |-|model|-+-+-----+ | |
57 +------+-/  +-----+ | |couch| | |
58 |script|            | +-----+ | |
59 +------+            +-+------+ | |
60                     | |memory| | |
61                     | +------+ | |
62                     +-+---+ | |
63                       |...| | |
64                       +---+ | |
65 | |
66 Toolbox was originally intended to use a directory of files, one per project, | |
67 as the backend. These were originally intended to be HTML files as the | |
68 above model may be clearly mapped as HTML:: | |
69 | |
70 <div class="project"><h1><a href="{{url}}">{{name}}</a></h1> | |
71 <p class="description">{{description}}</p> | |
72 {{for field in fields}} | |
73 <ul class="{{field}}"> | |
74 {{for value in values[field]}} | |
75 <li>{{value}}</li> | |
76 {{endfor}} | |
77 </ul>{{endfor}} | |
78 </div> | |
79 | |
80 This microformat approach allows not only easy editing of the HTML | |
81 documents, but the documents may be independently served and displayed | |
82 without the toolbox server-side. | |
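
To show how directly the model maps onto this markup, here is a minimal sketch that renders one tool using only the standard library. The ``render_tool`` function and the dictionary layout of a tool are assumptions for illustration; toolbox itself uses tempita templates. Each classifier becomes its own ``<ul>``, opened and closed in the same loop pass:

```python
from html import escape

MANDATORY = ("name", "description", "url")


def render_tool(tool):
    """Render a tool dict as an HTML microformat fragment."""
    lines = [
        '<div class="project"><h1><a href="%s">%s</a></h1>'
        % (escape(tool["url"], quote=True), escape(tool["name"])),
        '<p class="description">%s</p>' % escape(tool["description"]),
    ]
    # every non-mandatory key is treated as a classifier field
    for field in sorted(key for key in tool if key not in MANDATORY):
        lines.append('<ul class="%s">' % escape(field, quote=True))
        for value in sorted(tool[field]):
            lines.append("<li>%s</li>" % escape(value))
        lines.append("</ul>")
    lines.append("</div>")
    return "\n".join(lines)


html = render_tool({
    "name": "toolbox",
    "description": "a <tool> index",
    "url": "http://github.com/k0s/toolbox",
    "author": {"k0s"},
})
```

The resulting fragment can be saved as a standalone file and opened in a browser without any server.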
83 | |
84 The HTML microformat was never implemented (though, since the model | |
85 backend is pluggable, it easily could be). Instead, the original | |
86 implementation used JSON blobs stored in one file per tool. This | |
87 approach loses the displayable aspect, though since JSON is a defined | |
88 format with several good tools for exploring and manipulating the data | |
89 perhaps this disadvantage is offset. | |
90 | |
91 A couch backend was also written. The backends compare as follows: | |
92 | |
93 +-----+------------+-----------+------------+ | |
94 |     |Displayable?|File-based?|Concurrency?| | |
95 +-----+------------+-----------+------------+ | |
96 |HTML |Yes |Yes |No | | |
97 +-----+------------+-----------+------------+ | |
98 |JSON |Not really |Yes |No | | |
99 +-----+------------+-----------+------------+ | |
100 |Couch|No |No |Yes? | | |
101 +-----+------------+-----------+------------+ | |
102 | |
103 The concurrency issue with file-based document backends may be | |
104 overcome by using locked files. Ideally, this is accomplished at the | |
105 filesystem level. If your filesystem does not provide this | |
106 functionality, it may be introduced programmatically. A rough cartoon | |
107 of a good implementation is as follows: | |
108 | |
109 1. A worker thread is spawned to write the data asynchronously. The | |
110 data is sent to the worker thread. | |
111 | |
112 2. The worker checks for the presence of a lockfile (herein further | |
113 detailed). If the lockfile exists and is owned by an active process, | |
114 the worker waits until said process is done with it. (For a more | |
115 robust implementation, the worker sends a request to write the file to | |
116 some controller.) | |
117 | |
118 3. The worker creates a lockfile based on its PID in some directory | |
119 parallel to the directory root under consideration (for example, | |
120 ``/tmp/toolbox/lock/${PID}-${filename}.lck``). | |
121 | |
122 4. The worker writes to the file. | |
123 | |
124 5. The worker removes the lock. | |
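
The steps above can be sketched in Python roughly as follows. This is a simplified variant and not code from toolbox: it uses one lockfile per target file (with the owner's PID written inside it) rather than a PID-named lockfile, so that checking for the lock and taking it is a single atomic step, and it does not try to detect locks left behind by dead processes. The function names and the ``lock_dir`` parameter are assumptions:

```python
import os
import threading
import time


def write_locked(path, data, lock_dir, timeout=5.0, poll=0.05):
    """Write data to path while holding a lockfile in lock_dir."""
    os.makedirs(lock_dir, exist_ok=True)
    lock = os.path.join(lock_dir, os.path.basename(path) + ".lck")
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lockfile already exists
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("could not acquire %s" % lock)
            time.sleep(poll)  # wait for the current owner to finish
    try:
        with os.fdopen(fd, "w") as lockfile:
            lockfile.write(str(os.getpid()))  # record who owns the lock
        with open(path, "w") as f:
            f.write(data)
    finally:
        os.remove(lock)  # always release the lock


def write_async(path, data, lock_dir):
    """Spawn a worker thread that performs the locked write."""
    worker = threading.Thread(target=write_locked, args=(path, data, lock_dir))
    worker.start()
    return worker
```

A caller would use ``write_async(path, data, lock_dir)`` and, if it needs to know that the write has finished, ``join()`` the returned thread.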
125 | |
126 The toolbox web service uses a dispatcher->handler framework. The | |
127 handlers are loosely pluggable (they are assigned in the dispatcher), | |
128 but could (and probably should) be made completely pluggable. That | |
129 said, the toolbox web system features an integration of templates, | |
130 static resources (javascript, css, images), and handlers, so true | |
131 pluggability is further away than just supporting pluggable handlers | |
132 in the dispatcher. | |
133 | |
134 Deployment, however, may be tailored as desired. Any of the given | |
135 templates may be overridden via passing a ``template_dir`` parameter | |
136 with a path to a directory that has templates of the appropriate | |
137 names as found in toolbox's ``templates`` directory. | |
138 | |
139 Likewise, the static files (css, js, etc.) are served using ``paste``'s | |
140 ``StaticURLParser`` out of toolbox's ``static`` directory. (See | |
141 toolbox's ``factory.py``.) Notably this is *not* done using the WSGI | |
142 app itself. Doing it with middleware allows the deployment to be | |
143 customizable by writing your own factory. For example, instead of | |
144 using the ``paste`` webserver and the included ``paste.ini``, you | |
145 could use nginx or apache and ``mod_wsgi`` with a factory file | |
146 invoking ``Dispatcher`` with the desired arguments and serving the | |
147 static files with an arbitrary static file server. | |
148 | |
149 It is common sense, if rarely followed, that deployment should be | |
150 simple. If you want to get toolbox running on your desktop and/or for | |
151 testing, you should be able to do this easily (see the ``INSTALL.sh`` | |
152 for a simple installation using ``bash``; you'll probably want to | |
153 perform these steps by hand for any sort of real-world deployment). | |
154 If you want a highly customized deployment, then this will require | |
155 more expertise and manual setup. | |
156 | |
157 The template data and the JSON are closely tied together. This has the | |
158 distinct advantage of avoiding data translation steps and avoiding | |
159 code duplication. | |
160 | |
161 Toolbox uses several light-footprint libraries: | |
162 | |
163 * webob for Request/Response handling: http://pythonpaste.org/webob/ | |
164 | |
165 * tempita for (HTML) templates: http://pythonpaste.org/tempita/ | |
166 | |
167 * whoosh for search. This pure-python implementation of full-text | |
168 search is relatively fast (for python) and should scale decently to | |
169 the target scale of toolbox (1000s or 10000s of tools). While not as | |
170 fast as lucene, whoosh is easy to deploy and has a good API and | |
171 preserves toolbox as a deployable software product versus an | |
172 instance that requires the expert configuration, maintenance, and | |
173 tuning of several disparate software products that is both | |
174 non-automatable (cannot be installed with a script) and | |
175 time-consuming. http://packages.python.org/Whoosh/ | |
176 | |
177 * jQuery: jQuery is the best JavaScript library and everyone | |
178 should use it. http://jquery.com/ | |
179 | |
180 * jeditable for AJAXy editing: http://www.appelsiini.net/projects/jeditable | |
181 | |
182 * jquery-token for autocomplete: http://loopj.com/jquery-tokeninput/ | |
183 | |
184 * less for dynamic stylesheets: http://lesscss.org/ | |
185 | |
186 | |
187 User Interaction | |
188 ---------------- | |
189 | |
190 A user will typically interact with Toolbox through the AJAX web | |
191 interface. The server side returns relatively simple (HTML) markup, | |
192 but structured in such a way that JavaScript may be utilized to | |
193 promote rich interaction. The simple HTML + complex JS manifests | |
194 several things: | |
195 | |
196 1. The document is a document. The tools HTML presented to the user (with | |
197 the current objectionable exception of the per-project Delete button) | |
198 is a document form of the data. It can be clearly and easily | |
199 translated to data (for e.g. import/export) or simply marked up using | |
200 (e.g.) JS to add functionality. By keeping concerns separate | |
201 (presentation layer vs. interaction layer) a self-evident clarity is | |
202 maintained. | |
203 | |
204 2. Computation is shifted client-side. Often, an otherwise lightweight | |
205 webapp loses considerable performance rendering complex templates. By | |
206 keeping the templates light-weight and doing control presentation and | |
207 handling in JS, high performance is preserved. | |
208 | |
209 | |
210 What Toolbox Doesn't Do | |
211 ----------------------- | |
212 | |
213 * versioning: toolbox exposes editing towards a canonical document. | |
214 It doesn't do versioning. A model instance may do whatever | |
215 versioning it desires, and since the models are pluggable, it would | |
216 be relatively painless to subclass e.g. the file-based model and | |
217 have a post-save hook which does an e.g. ``hg commit``. Customized | |
218 templates could be used to display this information. | |
219 | |
220 * authentication: the information presented by toolbox is freely | |
221 readable and editable. This is by intention, as by going to a "wiki" | |
222 model and presenting an easy-to-use, context-switching-free interface, | |
223 curation is encouraged (ignoring the possibly imaginary problem of | |
224 wiki-spam). Access-level auth could be implemented using WSGI | |
225 middleware (e.g. repoze.who or bitsyauth) or through a front end | |
226 "webserver" integration layer such as Apache or nginx. Finer grained | |
227 control of the presentation layer could be realized by using custom | |
228 templates. | |
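
The post-save hook suggested under versioning might look something like the sketch below. ``FileModel`` is a stand-in written for this example, not toolbox's real file-based model, and the ``post_save`` method and ``run`` parameter are likewise assumptions. The hook shells out to ``hg commit`` and assumes the data directory is already a Mercurial repository:

```python
import subprocess


class FileModel:
    """Stand-in for a file-based model; only ``update`` matters here."""

    def __init__(self, directory):
        self.directory = directory

    def update(self, project):
        # a real model would write the project's JSON blob to disk here
        return "%s.json" % project["name"]


class VersionedFileModel(FileModel):
    """File-based model that commits every saved project to Mercurial."""

    def __init__(self, directory, run=subprocess.check_call):
        super().__init__(directory)
        self.run = run  # injectable so the hook can be tested without hg

    def update(self, project):
        filename = super().update(project)
        self.post_save(project, filename)
        return filename

    def post_save(self, project, filename):
        message = "update %s" % project["name"]
        self.run(
            ["hg", "commit", "--addremove", "-m", message, filename],
            cwd=self.directory,
        )
```

Because the hook lives in a subclass, the dispatcher and handlers need no changes; only the configured model differs.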
229 | |
230 | |
231 What Toolbox Would Like To Do | |
232 ----------------------------- | |
233 | |
234 Ultimately, toolbox should be as federated as possible. The basic | |
235 architecture of toolbox as a web service + supporting scripts makes | |
236 this feasible and more self-contained than most proposed federated | |
237 services. The basic federated model has proved, in practice, | |
238 difficult to achieve through purely the (HTTP) client-server model, as | |
239 without complete federation and adherence to protocol, offline cron | |
240 jobs must be used to pull external data sources. If a webservice | |
241 only desires to talk to others of its own type and is willing to keep | |
242 a queue of requests for when hosts are offline, entire HTTP federation | |
243 may be implemented with only a configuration-specified discovery | |
244 service to find the nodes. | |
245 | |
246 | |
247 Evolution | |
248 --------- | |
249 | |
250 Often, a piece of software is presented as a state out of context (that | |
251 is, minus the evolution which led it to be and led it to look further | |
252 out beyond the horizon). While this is an interesting special | |
253 effect for an art project, software being communication, this | |
254 is only conducive to software in the darkest of black-box approaches. | |
255 | |
256 "Beers are like web frameworks: if they're not micro, you don't know | |
257 what you're talking about." - hipsterhacker | |
258 | |
259 For sites that fit the architecture of a given framework, it may be | |
260 advisable to make use of it. However, for most webapp/webservice | |
261 categories which have a finite scope and definitive intent, it is | |
262 often easier, more maintainable, and more legible to build a complete | |
263 HTTP->WSGI->app architecture than to try to hammer a framework into | |
264 fitting your problem or redefining the problem to fit the framework. | |
265 This approach was used for toolbox. | |
266 | |
267 The GenshiView template, http://k0s.org/hg/GenshiView, was invoked to | |
268 generate a basic dispatcher->handler system. The cruft was removed, | |
269 leaving only the basic structure and the TempitaHandler since tempita | |
270 is lightweight and it was envisioned that filesystem tempita templates | |
271 (MakeItSo!) would be used elsewhere in the project. The basic | |
272 handlers (project views, field-sorted view, new, etc.) were written | |
273 and soon a usable interface was constructed. | |
274 | |
275 A ``sample`` directory was created to hold the JSON blobs. Because | |
276 this was done early on, two goals were achieved: | |
277 | |
278 1. the software could be dogfooded immediately using actual applicable | |
279 data. This helped expose a number of issues concerning the data format | |
280 right away. | |
281 | |
282 2. There was a place to put tools before the project reached a | |
283 deployable state (previously, a few had lived in a static state using | |
284 a rough sketch of the HTML microformat discussed above on | |
285 k0s.org). Since the main point of toolbox is to record Mozilla tools, | |
286 the wealth of references mentioned in passing could be put somewhere, | |
287 instead of passed by and forgotten. One wishes that they do not miss | |
288 the train while purchasing a ticket. | |
289 | |
290 The original intent, when the file-based JSON blob approach was to be | |
291 the deployed backend, was to have two repositories: one for the code | |
292 and one for the JSON blobs. When this approach was scrapped, the | |
293 file-based JSON blobs were relegated to the ``sample`` directory, with | |
294 the intent being to import them into e.g. a couch database on actual | |
295 deployment (using an import script). The samples could then be used | |
296 for testing. | |
297 | |
298 The model has a single "setter" function, ``def update``, used for | |
299 both creating and updating projects. Due to this and due to the fact | |
300 the model was ABC/pluggable from the beginning, a converter ``export`` | |
301 function could be trivially written at the ABC-level:: | |
302 | |
303 def export(self, other): | |
304     """export the current model to another model instance""" | |
305     for project in self.get(): | |
306         other.update(project) | |
307 | |
308 This, with an accompanying CLI utility, was used to migrate from JSON | |
309 blob files in the ``sample`` directory to the couch instance. This | |
310 particular methodology as applied to an unexpected problem (the | |
311 unanticipated switch from JSON blobs to couch) is a good example of | |
312 the power of using a problem to drive the software forward (in this | |
313 case, creation of a universal export function and associated command | |
314 line utility). The alternative, a one-off manual data migration, would | |
315 have been just as time consuming, would not be repeatable, would not | |
316 have extended toolbox, and may have (like many one-offs do) infected | |
317 the code base with associated semi-permanent vestiges. In general, | |
318 problems should be used to drive innovation. This can only be done if | |
319 the software is kept in a reasonably good state. Otherwise, | |
320 considerable (though probably worthwhile) refactoring must be done | |
321 prior to feature extension, which becomes cost-prohibitive in | |
322 time-critical situations where a one-off is (more) likely to be employed. | |
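
To make the ``export`` function above concrete, here is a self-contained sketch of an abstract model with a toy in-memory implementation. The ``Model`` and ``MemoryModel`` classes are written for this example and are assumptions about the shape of the interface; only ``get``, ``update``, and ``export`` are named in the text:

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Abstract data model: implementations supply get and update."""

    @abstractmethod
    def get(self):
        """return all projects"""

    @abstractmethod
    def update(self, project):
        """create or update a project"""

    def export(self, other):
        """export the current model to another model instance"""
        for project in self.get():
            other.update(project)


class MemoryModel(Model):
    """Toy implementation keeping projects in a dict keyed by name."""

    def __init__(self):
        self.projects = {}

    def get(self):
        return list(self.projects.values())

    def update(self, project):
        self.projects[project["name"]] = dict(project)


source = MemoryModel()
source.update({"name": "toolbox", "description": "an index of tools",
               "url": "http://github.com/k0s/toolbox"})
target = MemoryModel()
source.export(target)  # target now holds a copy of every project
```

Because ``export`` relies only on the abstract interface, the same call works between any two backends that implement ``get`` and ``update``.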
323 | |
324 | |
325 Use Cases | |
326 --------- | |
327 | |
328 The target use-case is software tools for Mozilla, or, more generally, | |
329 a software index. For this case, the default fields used are given in | |
330 the paste.ini file: usage, author, type, language. More fields may be | |
331 added to the running instance in the future. | |
332 | |
333 However, the classifier scheme can be used for a wide variety | |
334 of web-locatable resources. A few examples: | |
335 | |
336 * songs: artist, album, genre, instruments | |
337 * del.icio.us: type, media, author, site | |
338 | |
339 | |
340 Resources | |
341 --------- | |
342 | |
343 * http://readthedocs.org/ |