comparison README.txt @ 0:b0942f44413f
import from git://github.com/mozilla/toolbox.git
| author | Jeff Hammel <k0scist@gmail.com> |
|---|---|
| date | Sun, 11 May 2014 09:15:35 -0700 |
| parents | |
| children | 2ba55733b788 |
| -1:000000000000 | 0:b0942f44413f |
|---|---|
| 1 The Story of Toolbox | |
| 2 ==================== | |
| 3 | |
| 4 Toolbox is fundamentally a document-oriented approach to resource | |
| 5 indexing. A "tool" consists of three mandatory string fields -- name, | |
| 6 description, and URL -- that are generic to the large class of problems | |
| 7 of web resources, as well as classifiers, such as author, usage, type, | |
| 8 etc. A tool may have an arbitrary number of classifier fields as | |
| 9 needed. Each classifier consists of a set of values with which a tool | |
| 10 is tagged. This gives toolbox the flexibility to fit a large number of | |
| 11 data models, such as PyPI, DOAP, and others. | |
| 12 | |
| 13 | |
| 14 Running Toolbox | |
| 15 --------------- | |
| 16 | |
| 17 You can download and run the toolbox software yourself: | |
| 18 http://github.com/k0s/toolbox | |
| 19 | |
| 20 To serve in baseline mode, install the software and run:: | |
| 21 | |
| 22 paster serve paste.ini | |
| 23 | |
| 24 This will serve the handlers and static content with the paste | |
| 25 (http://pythonpaste.org) webserver, using ``README.txt`` as the | |
| 26 ``/about`` page and serving the data in ``sample``. | |
| 27 | |
| 28 The dispatcher (``toolbox.dispatcher:Dispatcher``) is the central (WSGI) | |
| 29 webapp that delegates each request to one of a number of handlers (from | |
| 30 ``handlers.py``). The dispatcher has a few options: | |
| 31 | |
| 32 * about: path to a reStructuredText file to serve at ``/about`` | |
| 33 * model_type: name of the backend to use (memory_cache, file_cache, or couch) | |
| 34 * template_dir: extra directory to look for templates | |
| 35 | |
| 36 These may be configured in the ``paste.ini`` file in the | |
| 37 ``[app:toolbox]`` section by prepending with the namespace | |
| 38 ``toolbox.``. It is advisable that you copy the example ``paste.ini`` | |
| 39 file for your own usage needs. Additional ``toolbox.``-namespaced | |
| 40 arguments will be passed to the model. For instance, to specify the | |
| 41 directory for the ``file_cache`` model, the provided ``paste.ini`` uses | |
| 42 ``toolbox.directory = %(here)s/sample``. | |
| 43 | |
| 44 | |
| 45 Architecture | |
| 46 ------------ | |
| 47 | |
| 48 Toolbox uses a fairly simple architecture with a single abstract data | |
| 49 model allowing an arbitrary number of implementations to be constructed:: | |
| 50 | |
| 51 Interfaces          Implementations | |
| 52 | |
| 53 +----+              +-+-----+ | |
| 54 |HTTP|              | |files| | |
| 55 +----+---\  +-----+ | +-----+ | |
| 56           |-|model|-+-+-----+ | |
| 57 +------+-/  +-----+ | |couch| | |
| 58 |script|            | +-----+ | |
| 59 +------+            +-+------+ | |
| 60                     | |memory| | |
| 61                     | +------+ | |
| 62                     +-+---+ | |
| 63                       |...| | |
| 64                       +---+ | |
| 65 | |
| 66 Toolbox was originally intended to use a directory of files, one per project, | |
| 67 as the backend. These were to be HTML files, since the | |
| 68 above model maps cleanly to HTML:: | |
| 69 | |
| 70 <div class="project"><h1><a href="{{url}}">{{name}}</a></h1> | |
| 71   <p class="description">{{description}}</p> | |
| 72   {{for field in fields}} | |
| 73   <ul class="{{field}}"> | |
| 74     {{for value in values[field]}} | |
| 75     <li>{{value}}</li> | |
| 76     {{endfor}} | |
| 77   </ul>{{endfor}} | |
| 78 </div> | |
| 79 | |
| 80 This microformat approach allows not only easy editing of the HTML | |
| 81 documents, but the documents may be independently served and displayed | |
| 82 without the toolbox server-side. | |
| 83 | |
| 84 The HTML microformat was never implemented (though, since the model | |
| 85 backend is pluggable, it easily could be). Instead, the original | |
| 86 implementation used JSON blobs stored in one file per tool. This | |
| 87 approach loses the displayable aspect, though since JSON is a defined | |
| 88 format with several good tools for exploring and manipulating the data | |
| 89 perhaps this disadvantage is offset. | |
| 90 | |
| 91 A couch backend was also written. | |
| 92 | |
| 93        +------------+-----------+------------+ | |
| 94        |Displayable?|File-based?|Concurrency?| | |
| 95  +-----+------------+-----------+------------+ | |
| 96  |HTML |Yes         |Yes        |No          | | |
| 97  +-----+------------+-----------+------------+ | |
| 98  |JSON |Not really  |Yes        |No          | | |
| 99  +-----+------------+-----------+------------+ | |
| 100 |Couch|No          |No         |Yes?        | | |
| 101 +-----+------------+-----------+------------+ | |
| 102 | |
| 103 The concurrency issue with file-based document backends may be | |
| 104 overcome by using locked files. Ideally, this is accomplished at the | |
| 105 filesystem level. If your filesystem does not provide this | |
| 106 functionality, it may be introduced programmatically. A rough cartoon | |
| 107 of a good implementation is as follows: | |
| 108 | |
| 109 1. A worker thread is spawned to write the data asynchronously. The | |
| 110 data is sent to the worker thread. | |
| 111 | |
| 112 2. The worker checks for the presence of a lockfile (herein further | |
| 113 detailed). If the lockfile exists and is owned by an active process, | |
| 114 the worker waits until said process is done with it. (For a more | |
| 115 robust implementation, the worker sends a request to write the file to | |
| 116 some controller.) | |
| 117 | |
| 118 3. The worker creates a lockfile based on its PID in some directory | |
| 119 parallel to the directory root under consideration (for example, | |
| 120 ``/tmp/toolbox/lock/${PID}-${filename}.lck``). | |
| 121 | |
| 122 4. The worker writes to the file. | |
| 123 | |
| 124 5. The worker removes the lock. | |
| 125 | |
| 126 The toolbox web service uses a dispatcher->handler framework. The | |
| 127 handlers are loosely pluggable (they are assigned in the dispatcher), | |
| 128 but could (and probably should) be made completely pluggable. That | |
| 129 said, the toolbox web system features an integration of templates, | |
| 130 static resources (javascript, css, images), and handlers, so true | |
| 131 pluggability is further away than just supporting pluggable handlers | |
| 132 in the dispatcher. | |
| 133 | |
| 134 Deployment, however, may be tailored as desired. Any of the given | |
| 135 templates may be overridden by passing a ``template_dir`` parameter | |
| 136 with a path to a directory that has templates of the appropriate | |
| 137 names as found in toolbox's ``templates`` directory. | |
| 138 | |
| 139 Likewise, the static files (css, js, etc.) are served using ``paste``'s | |
| 140 ``StaticURLParser`` out of toolbox's ``static`` directory. (See | |
| 141 toolbox's ``factory.py``.) Notably this is *not* done using the WSGI | |
| 142 app itself. Doing it with middleware allows the deployment to be | |
| 143 customizable by writing your own factory. For example, instead of | |
| 144 using the ``paste`` webserver and the included ``paste.ini``, you | |
| 145 could use nginx or apache and ``mod_wsgi`` with a factory file | |
| 146 invoking ``Dispatcher`` with the desired arguments and serving the | |
| 147 static files with an arbitrary static file server. | |
| 148 | |
| 149 It is common sense, if rarely followed, that deployment should be | |
| 150 simple. If you want to get toolbox running on your desktop and/or for | |
| 151 testing, you should be able to do this easily (see the ``INSTALL.sh`` | |
| 152 for a simple installation using ``bash``; you'll probably want to | |
| 153 perform these steps by hand for any sort of real-world deployment). | |
| 154 If you want a highly customized deployment, then this will require | |
| 155 more expertise and manual setup. | |
| 156 | |
| 157 The template data and the JSON are closely tied together. This has the | |
| 158 distinct advantage of avoiding data translation steps and avoiding | |
| 159 code duplication. | |
| 160 | |
| 161 Toolbox uses several light-footprint libraries: | |
| 162 | |
| 163 * webob for Request/Response handling: http://pythonpaste.org/webob/ | |
| 164 | |
| 165 * tempita for (HTML) templates: http://pythonpaste.org/tempita/ | |
| 166 | |
| 167 * whoosh for search. This pure-python implementation of full-text | |
| 168 search is relatively fast (for python) and should scale decently to | |
| 169 the target scale of toolbox (1000s or 10000s of tools). While not as | |
| 170 fast as Lucene, whoosh is easy to deploy, has a good API, and | |
| 171 preserves toolbox as a deployable software product versus an | |
| 172 instance that requires the expert configuration, maintenance, and | |
| 173 tuning of several disparate software products that is both | |
| 174 non-automatable (cannot be installed with a script) and | |
| 175 time-consuming. http://packages.python.org/Whoosh/ | |
| 176 | |
| 177 * jQuery: jQuery is the best JavaScript library and everyone | |
| 178 should use it. http://jquery.com/ | |
| 179 | |
| 180 * jeditable for AJAXy editing: http://www.appelsiini.net/projects/jeditable | |
| 181 | |
| 182 * jquery-tokeninput for autocomplete: http://loopj.com/jquery-tokeninput/ | |
| 183 | |
| 184 * less for dynamic stylesheets: http://lesscss.org/ | |
| 185 | |
| 186 | |
| 187 User Interaction | |
| 188 ---------------- | |
| 189 | |
| 190 A user will typically interact with Toolbox through the AJAX web | |
| 191 interface. The server side returns relatively simple (HTML) markup, | |
| 192 but structured in such a way that JavaScript may be utilized to | |
| 193 promote rich interaction. The simple HTML + complex JS has | |
| 194 several consequences: | |
| 195 | |
| 196 1. The document is a document. The tools HTML presented to the user (with | |
| 197 the current objectionable exception of the per-project Delete button) | |
| 198 is a document form of the data. It can be clearly and easily | |
| 199 translated to data (for e.g. import/export) or simply marked up using | |
| 200 (e.g.) JS to add functionality. By keeping concerns separate | |
| 201 (presentation layer vs. interaction layer) a self-evident clarity is | |
| 202 maintained. | |
| 203 | |
| 204 2. Computation is shifted client-side. Often, an otherwise lightweight | |
| 205 webapp loses considerable performance rendering complex templates. By | |
| 206 keeping the templates lightweight and doing control presentation and | |
| 207 handling in JS, high performance is preserved. | |
| 208 | |
| 209 | |
| 210 What Toolbox Doesn't Do | |
| 211 ----------------------- | |
| 212 | |
| 213 * versioning: toolbox exposes editing towards a canonical document. | |
| 214 It doesn't do versioning. A model instance may do whatever | |
| 215 versioning it desires, and since the models are pluggable, it would | |
| 216 be relatively painless to subclass e.g. the file-based model and | |
| 217 have a post-save hook that runs, e.g., ``hg commit``. Customized | |
| 218 templates could be used to display this information. | |
| 219 | |
| 220 * authentication: the information presented by toolbox is freely | |
| 221 readable and editable. This is by intention, as by going to a "wiki" | |
| 222 model and presenting an easy-to-use, context-switching-free interface, | |
| 223 curation is encouraged (ignoring the possibly imaginary problem of | |
| 224 wiki-spam). Access-level auth could be implemented using WSGI | |
| 225 middleware (e.g. repoze.who or bitsyauth) or through a front end | |
| 226 "webserver" integration layer such as Apache or nginx. Finer grained | |
| 227 control of the presentation layer could be realized by using custom | |
| 228 templates. | |
| 229 | |
| 230 | |
| 231 What Toolbox Would Like To Do | |
| 232 ----------------------------- | |
| 233 | |
| 234 Ultimately, toolbox should be as federated as possible. The basic | |
| 235 architecture of toolbox as a web service + supporting scripts makes | |
| 236 this feasible and more self-contained than most proposed federated | |
| 237 services. The basic federated model has proved, in practice, | |
| 238 difficult to achieve through purely the (HTTP) client-server model: | |
| 239 without complete federation and adherence to protocol, offline cron | |
| 240 jobs must be used to pull external data sources. If a webservice | |
| 241 only desires to talk to others of its own type and is willing to keep | |
| 242 a queue of requests for when hosts are offline, full HTTP federation | |
| 243 may be implemented with only a configuration-specified discovery | |
| 244 service to find the nodes. | |
| 245 | |
| 246 | |
| 247 Evolution | |
| 248 --------- | |
| 249 | |
| 250 Often, a piece of software is presented as a state out of context (that | |
| 251 is, minus the evolution that led it to be and led it to look further | |
| 252 out beyond the horizon). While this is an interesting special | |
| 253 effect for an art project, software is communication, and this | |
| 254 is only conducive to software in the darkest of black-box approaches. | |
| 255 | |
| 256 "Beers are like web frameworks: if they're not micro, you don't know | |
| 257 what you're talking about." - hipsterhacker | |
| 258 | |
| 259 For sites that fit the architecture of a given framework, it may be | |
| 260 advisable to make use of it. However, for most webapp/webservice | |
| 261 categories which have a finite scope and definitive intent, it is | |
| 262 often easier, more maintainable, and more legible to build a complete | |
| 263 HTTP->WSGI->app architecture than to try to hammer a framework into | |
| 264 fitting your problem or redefining the problem to fit the framework. | |
| 265 This approach was used for toolbox. | |
| 266 | |
| 267 The GenshiView template, http://k0s.org/hg/GenshiView, was invoked to | |
| 268 generate a basic dispatcher->handler system. The cruft was removed, | |
| 269 leaving only the basic structure and the TempitaHandler since tempita | |
| 270 is lightweight and it was envisioned that filesystem tempita templates | |
| 271 (MakeItSo!) would be used elsewhere in the project. The basic | |
| 272 handlers (project views, field-sorted view, new, etc.) were written | |
| 273 and soon a usable interface was constructed. | |
| 274 | |
| 275 A ``sample`` directory was created to hold the JSON blobs. Because | |
| 276 this was done early on, two goals were achieved: | |
| 277 | |
| 278 1. The software could be dogfooded immediately using actual applicable | |
| 279 data. This helped expose a number of issues concerning the data format | |
| 280 right away. | |
| 281 | |
| 282 2. There was a place to put tools before the project reached a | |
| 283 deployable state (previously, a few had lived in a static state using | |
| 284 a rough sketch of the HTML microformat discussed above on | |
| 285 k0s.org). Since the main point of toolbox is to record Mozilla tools, | |
| 286 the wealth of references mentioned in passing could be put somewhere, | |
| 287 instead of passed by and forgotten. One wishes that they do not miss | |
| 288 the train while purchasing a ticket. | |
| 289 | |
| 290 The original intent, when the file-based JSON blob approach was to be | |
| 291 the deployed backend, was to have two repositories: one for the code | |
| 292 and one for the JSON blobs. When this approach was scrapped, the | |
| 293 file-based JSON blobs were relegated to the ``sample`` directory, with | |
| 294 the intent to import them into e.g. a couch database on actual | |
| 295 deployment (using an import script). The samples could then be used | |
| 296 for testing. | |
| 297 | |
| 298 The model has a single "setter" function, ``def update``, used for | |
| 299 both creating and updating projects. Due to this and due to the fact | |
| 300 the model was ABC/pluggable from the beginning, a converter ``export`` | |
| 301 function could be trivially written at the ABC-level:: | |
| 302 | |
| 303 def export(self, other): | |
| 304     """export the current model to another model instance""" | |
| 305     for project in self.get(): | |
| 306         other.update(project) | |
| 307 | |
| 308 This, with an accompanying CLI utility, was used to migrate from JSON | |
| 309 blob files in the ``sample`` directory to the couch instance. This | |
| 310 particular methodology as applied to an unexpected problem (the | |
| 311 unanticipated switch from JSON blobs to couch) is a good example of | |
| 312 the power of using a problem to drive the software forward (in this | |
| 313 case, creation of a universal export function and associated command | |
| 314 line utility). The alternative, a one-off manual data migration, would | |
| 315 have been just as time consuming, would not be repeatable, would not | |
| 316 have extended toolbox, and may have (like many one-offs do) infected | |
| 317 the code base with associated semi-permanent vestiges. In general, | |
| 318 problems should be used to drive innovation. This can only be done if | |
| 319 the software is kept in a reasonably good state. Otherwise, | |
| 320 considerable (though probably worthwhile) refactoring must be done | |
| 321 prior to feature extension, which becomes cost-prohibitive in | |
| 322 time-critical situations where a one-off is (more) likely to be employed. | |
| 323 | |
| 324 | |
| 325 Use Cases | |
| 326 --------- | |
| 327 | |
| 328 The target use-case is software tools for Mozilla, or, more generally, | |
| 329 a software index. For this case, the default fields used are given in | |
| 330 the ``paste.ini`` file: usage, author, type, language. More fields may be | |
| 331 added to the running instance in the future. | |
| 332 | |
| 333 However, the classifier scheme can be used for a wide variety | |
| 334 of web-locatable resources. A few examples: | |
| 335 | |
| 336 * songs: artist, album, genre, instruments | |
| 337 * del.icio.us: type, media, author, site | |
| 338 | |
| 339 | |
| 340 Resources | |
| 341 --------- | |
| 342 | |
| 343 * http://readthedocs.org/ |

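The five-step lockfile procedure described under Architecture can be sketched in Python roughly as follows. This is only an illustration of the cartoon, not code from toolbox: the names ``pid_is_active``, ``active_locks``, ``locked_write``, and ``write_async`` are hypothetical, the liveness check via ``os.kill(pid, 0)`` assumes a POSIX system, and the check-then-create sequence is not atomic (a more robust version would route writes through a controller, as step 2 suggests).

```python
import os
import threading
import time
from pathlib import Path

# Hypothetical helper names; none of this is part of toolbox itself.
_process_lock = threading.Lock()  # serializes writers inside this process


def pid_is_active(pid):
    """Return True if a process with this PID appears to be running (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence and permission
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def active_locks(lock_dir, filename):
    """List lockfiles for `filename` held by live processes other than this one."""
    found = []
    for lock in lock_dir.iterdir():
        pid_text, _, rest = lock.name.partition("-")
        if rest != f"{filename}.lck" or not pid_text.isdigit():
            continue  # not a lockfile for this target
        pid = int(pid_text)
        if pid != os.getpid() and pid_is_active(pid):
            found.append(lock)
    return found


def locked_write(path, data, lock_dir, poll=0.05):
    """Steps 2-5: wait for other owners, take a PID-based lock, write, unlock."""
    path = Path(path)
    lock_dir = Path(lock_dir)
    lock_dir.mkdir(parents=True, exist_ok=True)
    lockfile = lock_dir / f"{os.getpid()}-{path.name}.lck"
    with _process_lock:
        while active_locks(lock_dir, path.name):  # step 2: wait on live owners
            time.sleep(poll)
        lockfile.touch()  # step 3: own a lockfile named for our PID
        try:
            path.write_text(data)  # step 4: write to the file
        finally:
            lockfile.unlink()  # step 5: remove the lock


def write_async(path, data, lock_dir):
    """Step 1: hand the data to a worker thread; returns the thread."""
    worker = threading.Thread(target=locked_write, args=(path, data, lock_dir))
    worker.start()
    return worker
```

A caller would use something like ``write_async("sample/tool.json", blob, "/tmp/toolbox/lock").join()``. Lockfiles left behind by dead processes are ignored rather than waited on, which matches the "owned by an active process" condition in step 2.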