13:53 July 24, 2012

Mozilla Automation and Testing : The Naming of (Talos) Things

As a Talos developer, I have found it confusing how Talos tests are named on TBPL and graphserver. I am not alone: https://bugzilla.mozilla.org/show_bug.cgi?id=770460 So I sat down to figure out how Talos is run by buildbot and how to correlate test names across Talos, buildbot, graphserver, and TBPL.

Buildbot invokes PerfConfigurator to generate a YAML configuration file, which is then executed by run_tests.py. This may run any number of tests. Talos reports the results to the graphserver. The buildbot suite name is reported to TBPL, as are the links returned from graphserver.

See also: https://wiki.mozilla.org/Buildbot/Talos#How_Talos_is_Run_in_Production

Buildbot

I set out to make a script that gathers this information and follows the information flow. The basic buildbot configuration is found in http://hg.mozilla.org/build/buildbot-configs/raw-file/tip/mozilla-tests/config.py . While I only needed the SUITES variable, which contains the name as reported to TBPL as well as the Talos command line for each suite, the entire file has to be imported and read by Python to work. So I added buildbot as a package dependency. In addition, I had to mock the project_branches.py and localconfig.py files. For localconfig.py, I purely stubbed it, since I didn't need it anyway: http://k0s.org/mozilla/hg/talosnames/raw-file/tip/talosnames/localconfig.py For project_branches.py, I could have pulled the file down in real time, and should for up-to-date information, but for momentary expedience I just copied it: http://k0s.org/mozilla/hg/talosnames/file/tip/talosnames/project_branches.py
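The stubbing approach can be illustrated with a minimal sketch. This is not the talosnames code: the helper names stub_module and load_suites, the reduced config text, and the shape of the SUITES entry are all assumptions made up for illustration. It only shows how registering stand-in modules lets a config file that imports them be executed so that SUITES can be read.

```python
import sys
import types


def stub_module(name, **attrs):
    """Register a stand-in module so that `import name` succeeds."""
    module = types.ModuleType(name)
    module.__dict__.update(attrs)
    sys.modules[name] = module
    return module


def load_suites(config_source):
    """Execute config source text and return its SUITES variable."""
    namespace = {}
    exec(compile(config_source, "config.py", "exec"), namespace)
    return namespace["SUITES"]


# Hypothetical, heavily reduced stand-in for mozilla-tests/config.py;
# the real file is much larger and its SUITES entries may be shaped differently.
EXAMPLE_CONFIG = '''
import localconfig
import project_branches

SUITES = {
    "nochromer": {"options": ["--activeTests", "tdhtmlr", "--noChrome"]},
}
'''

# The stubs must exist before the config is executed.
stub_module("localconfig")
stub_module("project_branches", PROJECT_BRANCHES={})
suites = load_suites(EXAMPLE_CONFIG)
```

With the stubs in place, suites maps each buildbot suite name to whatever the config defines for it, which is the only part of the file this exercise needs.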

Talos

This takes care of the buildbot information. For desktop Talos, it is then possible to call PerfConfigurator with the arguments from mozilla-tests/config.py and generate a Talos configuration file. remotePerfConfigurator currently requires a device to be attached in order to work correctly, so I punted on that problem for the time being. Once the config file is generated, it can be read to introspect how the tests are run.

Hovering over a Talos letter on TBPL, you can see the full name of the associated (TBPL) suite, e.g. "Talos nochrome opt was successful, took 12mins" when one hovers over T (n). If you click on the n, you will see the name of the suite as reported by buildbot: Rev4 MacOSX Lion 10.7 mozilla-central talos nochromer. Note the nochromer from http://hg.mozilla.org/build/buildbot-configs/file/68c191f31d39/mozilla-tests/config.py#l291 You can also see the name of the test as reported to graphserver, in this case:

tdhtmlr_nochrome_paint: 738.29

Here the 738.29 is a link to the graphserver data. The name, tdhtmlr_nochrome_paint, is the name of the Talos test plus the test name extension, which is applied for Page Load tests but not for Startup tests: http://hg.mozilla.org/build/talos/file/de24503258c7/talos/output.py#l174

The test_name_extension appends _nochrome and/or _paint depending on whether these flags are set, usually via the --noChrome and --mozAfterPaint arguments to PerfConfigurator. In order to determine the correct test name extension, I used the talos.test module and inspected whether the test was a subclass of TsBase or PageloaderTest: http://k0s.org/mozilla/hg/talosnames/file/ef8590b55605/talosnames/web.py#l63
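As a rough sketch of the naming rule just described (not the actual code in talos/output.py or talosnames), the extension can be modeled as a function of the test type and the two flags. The function names and the boolean is_page_load parameter are assumptions; in practice the test type would come from checking the test class against TsBase or PageloaderTest.

```python
def name_extension(is_page_load, no_chrome=False, moz_after_paint=False):
    """Return the suffix added to a test name before reporting to graphserver."""
    if not is_page_load:
        # Startup tests get no extension, whatever the flags are.
        return ""
    extension = ""
    if no_chrome:
        extension += "_nochrome"
    if moz_after_paint:
        extension += "_paint"
    return extension


def graphserver_name(test_name, is_page_load, no_chrome=False, moz_after_paint=False):
    """Combine the Talos test name with its extension."""
    return test_name + name_extension(is_page_load, no_chrome, moz_after_paint)


# The page load example from the text: tdhtmlr run with both flags set.
example = graphserver_name("tdhtmlr", True, no_chrome=True, moz_after_paint=True)
```

Here example is "tdhtmlr_nochrome_paint", matching the name shown on TBPL above, while a startup test run with the same flags keeps its bare name.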

TBPL

TBPL determines its suite name and letter from a long if..else regex matching chain in Data.js: http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/bad7f21362be/js/Data.js#l512 This takes the buildbot suite name, some magic and glue, and yields its long name, which is then matched up in http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/tip/js/Config.js to get the TBPL initial. Rather than trying to stub how this is done, I took advantage of the structure of this file in a horrible hack, whereby I matched the regexes with a regex and then extracted the information I wanted from them (Don't try this at home, kids!): http://k0s.org/mozilla/hg/talosnames/file/ef8590b55605/talosnames/api.py#l77 This is highly undesirable, but it does work (for the time being).
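To give a flavor of the hack, here is a minimal sketch of matching regexes with a regex. The JavaScript excerpt is invented for illustration and is not copied from Data.js, whose real layout differs in its details; the function names are likewise hypothetical. The sketch also assumes the extracted JavaScript patterns happen to be valid Python regular expressions, which is not true in general.

```python
import re

# Hypothetical excerpt in the spirit of an if..else regex chain.
JS_SOURCE = r'''
/talos.*nochrome/.test(name) ? "Talos nochrome" :
/talos.*dromaeo/.test(name) ? "Talos dromaeo" :
"Unknown";
'''

# Matches: /<pattern>/.test(name) ? "<long name>"
RULE = re.compile(
    r'/(?P<pattern>[^/\n]+)/\.test\(name\)\s*\?\s*"(?P<long_name>[^"]+)"'
)


def extract_rules(js_source):
    """Return (pattern, long name) pairs in the order they appear."""
    return [(m.group("pattern"), m.group("long_name"))
            for m in RULE.finditer(js_source)]


def long_name(buildbot_name, rules):
    """Apply the extracted patterns in order; the first match wins."""
    for pattern, name in rules:
        if re.search(pattern, buildbot_name):
            return name
    return None


rules = extract_rules(JS_SOURCE)
tbpl_name = long_name("Rev4 MacOSX Lion 10.7 mozilla-central talos nochromer", rules)
```

The fragility is plain to see: any change to the formatting of the JavaScript source silently breaks the extraction.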

Graphserver

So we have the buildbot, TBPL, and Talos sides of things figured out, nicely lining us up to tackle graphserver. Graphserver details the test mapping from short name to long name in the rather Kafka-esque data.sql schema: http://hg.mozilla.org/graphs/file/da54bac92c1b/sql/data.sql#l2568 I wanted to at least get the long graphserver names from the short names, as these are the only strings displayed in the UI. So I created an in-memory database using SQLite: there was no need to persist the data, just to read it, and SQLite is built in to Python and avoids database-deployment woes. The table definitions were not SQLite-compatible, so I created my own table definitions. unix_timestamp() is not a SQLite function, so I removed lines containing a reference to this function. Fortunately, this does not affect any of the test lines I care about.
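A minimal sketch of this approach, using Python's built-in sqlite3 module, might look like the following. The two-column tests table, the sample INSERT lines, and the long name are all hypothetical stand-ins; the real graphserver schema and data.sql contents are more involved.

```python
import sqlite3

# Hypothetical SQLite-compatible table definition replacing the original one.
SCHEMA = "CREATE TABLE tests (name TEXT, pretty_name TEXT)"

# Hypothetical excerpt standing in for data.sql.
DATA_SQL = """\
INSERT INTO tests VALUES ('tdhtmlr_nochrome_paint', 'Example long name');
INSERT INTO machines VALUES (1, 'example-machine', unix_timestamp());
"""


def load_names(data_sql):
    """Load the short-to-long name mapping into an in-memory database."""
    connection = sqlite3.connect(":memory:")
    connection.execute(SCHEMA)
    # unix_timestamp() is not a SQLite function, so drop the lines using it.
    usable = [line for line in data_sql.splitlines()
              if "unix_timestamp" not in line]
    connection.executescript("\n".join(usable))
    return dict(connection.execute("SELECT name, pretty_name FROM tests"))


names = load_names(DATA_SQL)
```

Filtering line by line only works because each statement in the sample sits on a single line; a dump with multi-line statements would need a more careful filter.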

Putting it all together you get a table following the information flow:

  • buildbot has test suites which contain arguments to PerfConfigurator
  • PerfConfigurator generates a YAML file which is used as configuration to run one or more tests
  • the tests report results to graphserver and the resulting links are displayed on TBPL
  • the buildbot suite is reported to TBPL
  • graphserver maps the Talos test names, plus an extension for the page load test case, to a full name displayed in its UI

I called the script I wrote to parse all of this talosnames: http://k0s.org/mozilla/hg/talosnames . It's one of the messiest scripts I've ever written, though I suppose it's somewhat amazing that it was possible to write at all, given that no one ever thought about doing this before.

Currently, talosnames outputs just a single page, which I host here: http://k0s.org/mozilla/talos/talosnames.html It's not dynamic at the moment, but if it needs to be regenerated, please feel free to ping me and I can do this.

How this could be easier

In general, this was mostly an exercise in untangling a web that we ourselves wove. If we had decided on conventions up front and stuck with them, there would be nothing to do here.

  • if Data.js was a JSON structure, talosnames could read this JSON and do the regex matching itself: https://bugzilla.mozilla.org/show_bug.cgi?id=774942
  • if talos test name extensions didn't depend on startup test vs page load test the world would be a better place
  • remotePerfConfigurator currently requires a device to be attached to generate configuration: https://bugzilla.mozilla.org/show_bug.cgi?id=775221 . If remotePerfConfigurator could work sans a device, we could generate and inspect this test information in talosnames.
  • I couldn't really figure out which buildbot command lines were for desktop and which were for mobile. I probably could have eventually tracked this down, or done a much easier hack whereby if --fennecIDs was in the command line then I'd call remotePerfConfigurator, though the above prevented action on this anyway
  • data.sql should mostly go away
  • up-to-date data structures: talosnames grabs the tip of TBPL's Config.js and Data.js, buildbot-configs tip's config.py, and graphserver tip's data.sql. While this gets the latest information, it is unknown what the deployment state of any of these files is.
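The --fennecIDs heuristic mentioned in the list above is simple enough to sketch. This is not something talosnames does; the function name and the sample command lines (including the fennec_ids.txt value) are hypothetical, and it assumes the buildbot command line is available as a list of arguments.

```python
def configurator_for(command_line):
    """Guess which configurator a buildbot Talos command line is meant for."""
    if "--fennecIDs" in command_line:
        # Only mobile runs are assumed to pass this flag.
        return "remotePerfConfigurator"
    return "PerfConfigurator"


desktop = configurator_for(["--activeTests", "tdhtmlr", "--noChrome"])
mobile = configurator_for(["--activeTests", "ts", "--fennecIDs", "fennec_ids.txt"])
```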

TODO

While I am glad to be able to sort this out a bit, a lot more could be done given the time.

  • a TBPL-like view that displays the TBPL abbreviations and maps to buildbot suites and tests
  • list which buildbot suites are active or inactive
  • Talos counters: the talos test config lists some of the counters (although not all) in http://k0s.org/mozilla/talos/talosnames.html . Graphserver, on the other hand, has entries for each of these counters on a per-test basis. Counters are mostly a mess in Talos. It would be nice to consolidate them there and display in talosnames all of the counters associated with a Talos test.