Automation and Testing : Overhaul of Talos Configuration
Last week I pushed a fix to bug 704654 that fixes a number of issues, conceptual and user-facing, with how Talos handles configuration. I've had an idea of how I wanted to do this for a few months now, but it has always been tabled.
But with my (joking, sorry) pledge to Bob Moss to fix all bugs in Talos by the end of the quarter, and a free weekend, I decided to tackle the problem in one go instead of killing the prerequisite bugs as I usually do. My goals:
I hear people prefer blog posts with pictures, so for no reason here is a bunch of cute foxes:
I've moved the basis of the Talos configuration to PerfConfigurator.py instead of some combination of .config files, PerfConfigurator.py, and run_tests.py. This gets rid of the duplication between the various config files as well as the command line options. In fact, there isn't much left of the configuration files.
I don't like configuration to live in code, and so empathize with those who look at this cautiously from that point of view. However, PerfConfigurator following my rework isn't so much configuration, but a configuration basis. Given the goals above, some piece of code has to validate a given configuration, has to know what data is in a configuration, and has to provide whatever command line options are used to front-end the configuration. The previous incarnation of Talos and PerfConfigurator had a significant amount of code to this end, but it was both spread out and incomplete. So I don't think putting it all in one place is a big conceptual change. Having a piece of code that knows the allowable form of configuration gives great power and having the code all in one place just makes it more human-readable.
The unofficial history of Talos configuration, as I understand it, goes something like this: Initially, there was one configuration file. You copied it, edited it by hand, and ran your tests on it. At some point, this became cumbersome, and PerfConfigurator was created to automatically fill in values from a set of command-line choices, and in addition allow the values to be marked up a bit. The road was already paved for some part of the configuration basis living in code versus in the .config file. Then, as the need to run tests in different configurations grew, .config files flourished to this end. I'd like to think of the changes for bug 704654 as the next logical step in Talos's configuration evolution.
Longer term, we'd like to remove even more of Talos's configuration and replace .yml files with command line options. The complexity of configuration will be managed by mozharness.
Automation and Testing : Considering a Page-Centric Talos
Currently, the canonical unit of Talos tests is a page set. However, a page-centric point of view offers several intrinsic advantages on top of being, in my opinion, more conceptually coherent.
A page-centric point of view allows easy adding and updating of pages. Currently, making a new page set is a big deal. Since we average over all pages in a page set to obtain a quality metric, adding a new page (or removing a page) will change this number and the entire baseline for comparison has to be recentered. If we made the page the canonical unit of testing, then adding or removing a page doesn't involve a recentering as each page has a quality metric associated with it.
Taking an average over all pages to get a quality metric, as we do, gives a higher weight to pages that take (e.g.) longer to load. For instance, consider the output for tsvg:
|i|pagename|runs|
|0;gearflowers.svg;79;65;68;68;67
|1;composite-scale.svg;46;35;44;41;42
|2;composite-scale-opacity.svg;21;22;24;22;20
|3;composite-scale-rotate.svg;23;21;21;20;19
|4;composite-scale-rotate-opacity.svg;19;24;19;19;23
|5;hixie-001.xml;45643;14976;17807;14971;17235
|6;hixie-002.xml;51257;15193;21693;14969;14974
|7;hixie-003.xml;5016;37375;5021;5024;5008
|8;hixie-004.xml;5052;5053;5054;5054;5053
|9;hixie-005.xml;4618;4533;4611;4532;4554
|10;hixie-006.xml;5059;5107;9741;5107;5089
|11;hixie-007.xml;1629;1651;1648;1652;1649
A performance loss (or gain) in e.g. gearflowers.svg is likely not to be noticed in this pageset as it is several orders of magnitude lower than (e.g.) hixie-002.xml, so a small percentage-wise noise in the latter could easily hide a legitimate regression in the former.
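To put a rough number on this, here is a small sketch using the tsvg runs above. It assumes, purely for illustration, that the pageset number is the plain mean of the per-page medians (the production calculation differs in its details), and the function names are made up for this example. It scales every gearflowers.svg run by 20% and checks how far the pageset average moves.

```python
from statistics import mean, median

# Raw runs copied from the tsvg output above (page name -> five timings).
TSVG_RUNS = {
    "gearflowers.svg": [79, 65, 68, 68, 67],
    "composite-scale.svg": [46, 35, 44, 41, 42],
    "composite-scale-opacity.svg": [21, 22, 24, 22, 20],
    "composite-scale-rotate.svg": [23, 21, 21, 20, 19],
    "composite-scale-rotate-opacity.svg": [19, 24, 19, 19, 23],
    "hixie-001.xml": [45643, 14976, 17807, 14971, 17235],
    "hixie-002.xml": [51257, 15193, 21693, 14969, 14974],
    "hixie-003.xml": [5016, 37375, 5021, 5024, 5008],
    "hixie-004.xml": [5052, 5053, 5054, 5054, 5053],
    "hixie-005.xml": [4618, 4533, 4611, 4532, 4554],
    "hixie-006.xml": [5059, 5107, 9741, 5107, 5089],
    "hixie-007.xml": [1629, 1651, 1648, 1652, 1649],
}


def pageset_average(runs_by_page):
    """Mean of the per-page medians (a simplified stand-in for the real metric)."""
    return mean([float(median(runs)) for runs in runs_by_page.values()])


def with_regression(runs_by_page, page, factor):
    """Return a copy of the data with every run of one page scaled by `factor`."""
    scaled = dict(runs_by_page)
    scaled[page] = [value * factor for value in runs_by_page[page]]
    return scaled


baseline = pageset_average(TSVG_RUNS)
regressed = pageset_average(with_regression(TSVG_RUNS, "gearflowers.svg", 1.2))
relative_change = (regressed - baseline) / baseline

print(f"pageset average before: {baseline:.2f}")
print(f"pageset average after a 20% gearflowers.svg regression: {regressed:.2f}")
print(f"relative change in the pageset average: {relative_change:.4%}")
```

Under these assumptions a 20% regression on one page moves the pageset average by well under a tenth of a percent, which is far smaller than the run-to-run variation visible in the hixie pages.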
Having this additional data of what changes regress which pages allows us to explore how these particular page modifications affect performance. If we can isolate patterns, we can fix them.
One conceptual disadvantage to a page-centric approach is that deciding whether a changeset is a net regression or not becomes harder. Ideally a human (or other expert system) would evaluate all of the data across pages and decide whether a change is a regression or not. However, we have many pages and not enough people, so this is harder to do than to craft a formula for a quality metric. To obtain an overall quality metric for a push, some sort of averaging over pages must be done. We currently throw away the highest value and take the mean of the remaining page averages. If we continue with this approach we throw away the ability to easily add and remove pages without futzing with the metric. Instead, a method should be sought whereby adding a new page does not affect the metric.
Talos Signal from Noise: Configurable Talos Data Filters
As part of Signal from Noise I introduced a patch that changes the way --ignoreFirst works and adds configurable data filters to Talos:
While this is a small change in terms of how the code currently works, it lays the groundwork for a window of possibilities in terms of Talos statistics. Currently, pageloader calculates the "median" (ignoring the high value), the mean, the max, and the min, and outputs these along with the raw run data. Pageloader is for loading pages and taking measurements, not really for doing statistics. So it would be nice to move this upstream: first to Talos, then to graphserver proper.
Being able to specify data filters with --filter from the command line and filter: in the .yml configuration file allows the test-runner to change the "interesting number" by which we measure performance metrics on the fly. While there are currently only a few filters available, it is easy to add more metrics as we need them.
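As a sketch of the idea (not the actual Talos filter module), a filter can be thought of as a function that takes a list of data points and returns either a shorter list or a single number, and a filter chain is just those functions applied in order. The filter names, the registry, and `apply_filters` below are hypothetical and chosen only for this example; the sample timings are illustrative.

```python
from statistics import mean, median


def ignore_first(values, count=1):
    """Drop the first `count` data points."""
    return values[count:]


def ignore_max(values):
    """Drop a single occurrence of the highest data point."""
    values = list(values)
    values.remove(max(values))
    return values


# Hypothetical registry mapping a filter name to a callable.
FILTERS = {
    "ignore_first": ignore_first,
    "ignore_max": ignore_max,
    "median": median,
    "mean": mean,
}


def apply_filters(names, values):
    """Apply the named filters in order; the last one usually reduces to a number."""
    result = list(values)
    for name in names:
        result = FILTERS[name](result)
    return result


runs = [103.0, 133.0, 9116.0, 108.0, 108.0]
print(apply_filters(["ignore_first", "median"], runs))
print(apply_filters(["ignore_max", "median"], runs))
```

Keeping each filter a plain function means a new metric is one more entry in the registry, and the choice of "interesting number" becomes a matter of which names are passed in.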
In a parallel effort, the JetPerf software consumes Talos filters. This is a good example of the expansion of the Talos ecosystem: because Talos is a critical part of our performance testing infrastructure, we are building tests and frameworks on top of it. In general, the A Team is moving towards a testing ecosystem of reusable parts and sane APIs.
Data filters were added to talos as an interim measure to make the "interesting number" calculations more flexible. As we play with different types of statistics, we need the ability to change configuration without having to jump through too many hoops and this fulfills this immediate need.
However, in the longer term, Talos and pageloader shouldn't really be doing statistics at all. They are in the "statistics gathering" camp where graphserver is in the "statistics processing" business. It would also be nice if there was a piece of software that let you analyze Talos results locally, ideally using the same statistics processing package that graphserver uses. This is outlined in https://bugzilla.mozilla.org/show_bug.cgi?id=721902 .
Talos Signal from Noise: analyzing the data
Recently, a change was pushed as part of the Signal from Noise effort in order to make Talos statistics better: https://bugzilla.mozilla.org/show_bug.cgi?id=710484 . The idea being that the way we are doing things is skewing the data and not helping with noise.
Currently, pageloader calculates the median after throwing out the highest point: http://hg.mozilla.org/build/pageloader/file/beca399c3a16/chrome/report.js#l114 We introduced --ignoreFirst to instead ignore the first point and calculate the median of the remaining runs.
However, after introducing the change we noticed that our distribution had gone bimodal during side by side staging:
Were we doing something other than what we thought we were doing? Were our calculations wrong? Or was something else going on?
So jmaher and I dove in to take a look at the data. jmaher dug up a high-mode and a low-mode case from the TBPL logs corresponding to the push sets displayed on graphserver:
https://tbpl.mozilla.org/php/getParsedLog.php?id=8982519&tree=Firefox&full=1 high point:
NOISE: __start_tp_report
NOISE: _x_x_mozilla_page_load,109,NaN,NaN
NOISE: _x_x_mozilla_page_load_details,avgmedian|109|average|354.25|minimum|NaN|maximum|NaN|stddev|NaN
NOISE: |i|pagename|median|mean|min|max|runs|
NOISE: |0;big-optimizable-group-opacity-2500.svg;123.5;354.25;92;1130;147;1130;1078;92;100
NOISE: |1;small-group-opacity-2500.svg;109;2333.25;103;9247;103;9012;9247;111;107
NOISE: __end_tp_report
https://tbpl.mozilla.org/php/getParsedLog.php?id=8982267&tree=Firefox&full=1 low point:
NOISE: __start_tp_report
NOISE: _x_x_mozilla_page_load,108,NaN,NaN
NOISE: _x_x_mozilla_page_load_details,avgmedian|108|average|113.00|minimum|NaN|maximum|NaN|stddev|NaN
NOISE: |i|pagename|median|mean|min|max|runs|
NOISE: |0;big-optimizable-group-opacity-2500.svg;119;353.75;91;1132;139;1132;1086;91;99
NOISE: |1;small-group-opacity-2500.svg;108;113;103;9116;103;133;9116;108;108
NOISE: __end_tp_report
From http://pastebin.mozilla.org/1470000 .
Since I can't really read this being a mere human being, I modified results.py to parse this data:
+
+if __name__ == '__main__':
+    import sys
+    string_high = """
+|0;big-optimizable-group-opacity-2500.svg;123.5;354.25;92;1130;147;1130;1078;92;100
+|1;small-group-opacity-2500.svg;109;2333.25;103;9247;103;9012;9247;111;107
+"""
+    string_low = """
+|0;big-optimizable-group-opacity-2500.svg;119;353.75;91;1132;139;1132;1086;91;99
+|1;small-group-opacity-2500.svg;108;113;103;9116;103;133;9116;108;108
+"""
+    big = PageloaderResults(string_high)
+    small = PageloaderResults(string_low)
+    import pdb; pdb.set_trace()
This makes some PageloaderResults objects that are explorable with pdb. While I did this as a one-off hack, this is something we'll probably generally want as part of Signal from Noise: https://bugzilla.mozilla.org/show_bug.cgi?id=722915
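For readers without a Talos checkout, the kind of parsing involved can be sketched in a few lines. This is not the PageloaderResults code from results.py; `parse_pageloader_line` is a hypothetical helper that assumes the column order given by the `|i|pagename|median|mean|min|max|runs|` header in the log above.

```python
def parse_pageloader_line(line):
    """Parse one ';'-separated pageloader row into a dictionary.

    Assumed column order: index, page name, median, mean, min, max,
    followed by the individual run timings.
    """
    fields = line.strip().split(";")
    index, page = fields[0], fields[1]
    median, mean, minimum, maximum = (float(value) for value in fields[2:6])
    return {
        "index": index,
        "page": page,
        "median": median,
        "mean": mean,
        "min": minimum,
        "max": maximum,
        "runs": [float(value) for value in fields[6:]],
    }


sample = "|1;small-group-opacity-2500.svg;108;113;103;9116;103;133;9116;108;108"
result = parse_pageloader_line(sample)
print(result["page"], result["runs"])
```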
Then I looked at the data:
(Pdb) pp(small.results)
[{'index': '|0',
'max': 1132.0,
'mean': 353.75,
'median': 119.0,
'min': 91.0,
'page': 'big-optimizable-group-opacity-2500.svg',
'runs': [139.0, 1132.0, 1086.0, 91.0, 99.0]},
{'index': '|1',
'max': 9116.0,
'mean': 113.0,
'median': 108.0,
'min': 103.0,
'page': 'small-group-opacity-2500.svg',
'runs': [103.0, 133.0, 9116.0, 108.0, 108.0]}]
(Pdb) pp(big.results)
[{'index': '|0',
'max': 1130.0,
'mean': 354.25,
'median': 123.5,
'min': 92.0,
'page': 'big-optimizable-group-opacity-2500.svg',
'runs': [147.0, 1130.0, 1078.0, 92.0, 100.0]},
{'index': '|1',
'max': 9247.0,
'mean': 2333.25,
'median': 109.0,
'min': 103.0,
'page': 'small-group-opacity-2500.svg',
'runs': [103.0, 9012.0, 9247.0, 111.0, 107.0]}]
You'll notice a few things from the runs data:
- the runs data is indeed bifurcated. In all cases there is a low value, around a hundred, and a high value in the thousands
- contrary to the assumption that the first datapoint may be biased and high, you can't really see any bias, at least compared to the magnitude of the bifurcation
So how does this compare to the graphserver results? http://graphs-new.mozilla.org/graph.html#tests=[[170,1,21],[57,1,21]]&sel=1327791635000,1328041307110&displayrange=7&datatype=running
For the old data and the low value of the new data, we see times around 110-120ms. The high value of the new data is around 590ms. Are these numbers what we'd expect?
Throwing away the high value and taking the median for both data sets gives a number of the order of 100 or so (the old algorithm). Taking the median functions as a filter for the bifurcated results towards the majorant population. Since the low population is slightly more majorant, dropping the highest number in the way that pageloader does further biases towards it. It is not surprising we see no bifurcation in the old data.
For the new data, we drop the first run. Coincidentally or not, for the cases studied the first run was part of the low population, so that tends towards bifurcation. Taking the median of the remaining data points gives:
High case:
- big-optimizable-group-opacity-2500.svg : (1078 + 100) / 2 = 589
- small-group-opacity-2500.svg : (9012 + 111) / 2 = 4561.5
Low case:
- big-optimizable-group-opacity-2500.svg : (99 + 1086) / 2 = 592.5
- small-group-opacity-2500.svg : (133 + 108) / 2 = 120.5
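The arithmetic above can be reproduced with a short sketch. The run data is copied from the logs; the two function names are made up for this example and only approximate what pageloader (drop the highest run, take the median) and --ignoreFirst (drop the first run, take the median) do.

```python
from statistics import median

# Runs per page for the high and low cases, copied from the logs above.
RUNS = {
    "high": {
        "big-optimizable-group-opacity-2500.svg": [147, 1130, 1078, 92, 100],
        "small-group-opacity-2500.svg": [103, 9012, 9247, 111, 107],
    },
    "low": {
        "big-optimizable-group-opacity-2500.svg": [139, 1132, 1086, 91, 99],
        "small-group-opacity-2500.svg": [103, 133, 9116, 108, 108],
    },
}


def median_dropping_max(runs):
    """Old behaviour: discard the single highest run, then take the median."""
    remaining = list(runs)
    remaining.remove(max(remaining))
    return median(remaining)


def median_dropping_first(runs):
    """New behaviour: discard the first run, then take the median."""
    return median(runs[1:])


for case, pages in RUNS.items():
    for page, runs in pages.items():
        print(case, page, median_dropping_max(runs), median_dropping_first(runs))
```

The old calculation reproduces the medians in the pageloader report (123.5, 109, 119, 108), while the new one gives the four values listed above.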
So why does the high case come out high and the low case come out low? There is even more magic. Graphserver reports an average by taking the mean of all the pages but discarding the high result: http://hg.mozilla.org/graphs/file/d93235e751c1/server/pyfomatic/collect.py#l208 (from http://hg.mozilla.org/graphs/file/d93235e751c1/server/pyfomatic/collect.py#l265 from http://hg.mozilla.org/graphs/file/d93235e751c1/server/collect.cgi ). Since both of the pages exhibit the high value of the bifurcation in the high case, you report the lower of the two bifurcated values: 589, from big-optimizable-group-opacity-2500.svg. Since in the low case only one of the values is bifurcated, you get the low value: 120.5, from small-group-opacity-2500.svg.
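A simplified reading of that averaging step, written from the description above rather than from collect.py itself, looks like the following. The function name is hypothetical, and refusing fewer than two values is an assumption of this sketch, not documented graphserver behaviour.

```python
from statistics import mean


def average_dropping_high(page_values):
    """Mean of the per-page values after discarding the single highest one."""
    if len(page_values) < 2:
        # Assumption of this sketch: with one page there is nothing left to average.
        raise ValueError("this sketch needs at least two page values")
    remaining = sorted(page_values)[:-1]
    return mean(remaining)


# Per-page medians after dropping the first run, from the lists above.
high_case = [589.0, 4561.5]
low_case = [592.5, 120.5]

print(average_dropping_high(high_case))
print(average_dropping_high(low_case))
```

With only two pages, discarding the highest value leaves exactly one page, so the reported number is simply the lower of the two page values.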
Okay mystery solved. We know why graphserver is reporting what data it is reporting and we also know that our algorithm is doing what we think it is doing. However, this is the beginning instead of the end of the problem.
By taking the average and discarding the high value of two data points, we are doing something weird and wrong. We are effectively only reporting one of the two pages. Note that for the high and the low case we are actually viewing data from different pages! This is misleading and probably outright wrong. We essentially have two pages just to throw one of them away, and then we have no confidence in what we are looking at. I'm not sure if the code at http://hg.mozilla.org/graphs/file/d93235e751c1/server/pyfomatic/collect.py#l208 would even work for a single page. Probably not. In general I grow increasingly skeptical of our amalgamation of results. We increasingly need to be able to get to and manipulate the raw data. We certainly need a way of digging into the stats, knowing what we're looking at, and having confidence in it. In general, talos, pageloader, and graphserver need to be made such that it is both easier to try new filters and more transparent as to what is actually happening.
We have been trying to bias towards the low numbers. Looking at the data for the four tests shows that there are 13 low-state numbers and 7 high-state numbers. While there are more numbers in the low state, it is not an overwhelming majority.
This leaves the big elephant in the room: why are these runs bifurcated? Are we seeing a code path, or is something else happening on these builders that leads to bifurcated results? While this will be challenging to investigate, IMHO we should know why this happens. While our method of throwing out the highest data point, getting the median, throwing the data to graphserver, then getting the average of the whole pageset back, has a positive effect of minimizing noise (which is important), it is also sweeping a lot under the rug. We need to have confidence that what we're ignoring is okay to ignore. I don't have that confidence yet.
Mozilla Automation and Testing - Jetpack Performance Testing
I have a working proof of concept for Jetpack performance testing (JetPerf): http://k0s.org/mozilla/hg/jetperf . JetPerf uses mozharness to run Talos ts tests with an addon built with the Jetpack addon-sdk to measure differences between performance with and without the addon installed.
Playing with Jetpack + Talos performance lets us explore statistics in a bit more straightforward manner than the production Talos numbers. In the Signal from Noise project, which I am also part of, there are a lot of parts to staging even small changes in how we process Talos data, since the system involved has many moving parts (Talos, pageloader, graphserver). By contrast, since JetPerf is a new project, it is much more flexible for exploring data that we have not hitherto explored.
I made a mozharness script to clone the hg mirror of addon-sdk. It then builds a sample addon and runs Talos with it installed.
Looking at raw numbers wasn't very interesting, so I made a parser for Talos's data format. It was pretty quick to get some averages out before and after the addon was installed, but I thought it would be more useful to display the raw data along with the averages.
These really aren't fair numbers, as currently the stub jetpack I use prints to a file, but it's at least the start of a methodology.
The reason I'm sharing this isn't just to make a progress report, but more to present some ideas about what to do with Talos data. While this was done for JetPerf, much of this also applies to Signal from Noise. You run Talos and get some results. What do you do with them? Currently we just shove them into http://graphs.mozilla.org/ and say that's where you process them, but I think looking at them locally is not only important but necessary if you're doing development work. I think a big part of any statistics-heavy project is to make it easy for all of the stakeholders to explore data, apply different filters and see how things fit together. While it takes a statistician to be rigorous about the process, anyone can play with statistics and it takes a village to really conceptualize what is being looked at. I hope, to this end, developers will use my software so that they can understand what it is doing and provide the valuable feedback I need.
JetPerf is still very much at a proof of concept stage. Ignoring the fact that none of it is in production, there are still many outstanding questions about basic facts of what we are doing here. But outside of polishing rough edges, here are some things in the pipeline.
- test a wider variety of addons; currently we just load a panel and print something to a file
- test on checkin (CI): the main point of JetPerf is to get a better idea of what SDK changes cause addon performance regressions and hits, and to be able to quantify them. While as stated this is a very open-ended project, one thing that would turn this from a casual exploration into a developer tool is running the tests on checkin. This will show in real time whether a checkin breaks performance.
- graphserver: in order to assess Jetpack's performance over time, we will want to send numbers to some sort of graphserver. This will allow us to keep track of the data, to view it, and apply various operations to it.
I may also spin off the (ad hoc) graphing portion and the Talos log parser portions into their own modules, as they may be useful outside of just JetPerf.
Mozilla Automation + Testing - MozBase Continuous Integration
As part of the A-Team 2011 Q4 goals I was able to devote a few days to setting up continuous integration (CI) for MozBase. I revived and extended autobot to support buildbot 0.8.5, set up tests and a simple test runner for mozbase, and deployed a test instance to k0s.org. You can see the waterfall here: http://k0s.org:8010
While buildbot comes with a gitpoller, the version in buildbot 0.8.5 (the current in http://pypi.python.org/ ) did not work with git 1.6.3, the version on k0s.org. Since my box is on an ancient version of Ubuntu (and is remote and not trivially upgradable), I brought the generic autobot poller from being buildbot 0.8.3 compatible to 0.8.5 compatible (which, it is worth noting, was not trivial). Also, while a patch for an hgpoller was submitted by Mozilla developers some four years ago, it has been WONTFIXed, so I went ahead with a generic polling architecture which (IMHO) seems a wiser architectural choice. While I sympathize with the architectural ideology of using a push-based architecture, and believe this is closer to ideal, polling will always work and does not require access to the repository servers, which is a huge factor when using https://github.com or even Mozilla hg repositories. (Incidentally, I found neither this patch nor http://hg.mozilla.org/build/buildbotcustom/file/tip/changes/hgpoller.py to work OOTB, so, sadly, I proceeded to roll my own. Also incidentally, it is not trivial to depend on buildbotcustom using install_requires due to its lack of a setup.py file.) After debugging the gitpoller I pushed a test change and was happy to see that autobot built correctly. Autobot now listens to MozBase changes!
I was unable to finish the (parenthetical) Q4 goal of having autobot report to autolog, so this remains outstanding work. There is a lot that could be done with autolog. The basic idea and TODOs are outlined in the README (which itself could use some work; it is largely up to date except the Projects section, though incomplete). I will endeavor to work on this in my available time or as need escalates, but my priority for 2012 Q1 will be separating Talos Signal From Noise so it is unlikely I will be able to put a lot of time into autobot (sadly). On the other hand, I am more than willing to help and advise if anyone wants any features or to iron out the crinkles. While the architecture is not completely straightforward, it is a decent approximation to a convex hull over the problem space of having simple to write, simple to maintain, simple to debug continuous integration for small(er) projects. As usual, if anyone wants to seek out alternate solutions, that is fine too, but I am essentially happy with my architecture decisions and technology choices.
Regardless of whether the CI solution for MozBase is autobot or (other), it is important to remember that continuous integration is a safety net and not a first line of defense. It is regrettable that autobot has no more notifications (yet) than the waterfall display and the autobot character lurking in #ateam (the default IRC bot isn't very verbal OOTB and I haven't had time to customize it). But I think having some (admittedly smokescreen) automated testing for MozBase is an important step towards the evolution of the software as well as towards development practices in general.
Auto-tools Q4 in reflection: progress on mozbase and talos
Most of my effort this quarter was spent on two related goals:
- developing a sane set of python packages to build test harnesses on top of. We call this MozBase: https://wiki.mozilla.org/Auto-tools/Projects/MozBase
- Making Talos sane and porting it to use the MozBase set of packages.
These are illustrated in our goals page: https://wiki.mozilla.org/Auto-tools/Goals/2011Q4#Mozbase
From one point of view, this isn't exciting work. But I live for this stuff. I think of software as an ecosystem to be cultivated and I live to cultivate it. So while, for the most part, I can't point to any exciting features that I implemented (nor were there planned to be), in retrospect I am proud of the fruits of my efforts and those of my team-mates and comrades. A big shout out to BYK and others who have stepped up to the plate to help the A-Team with these super-important efforts.
When I look back I see:
- Talos wasn't a python package. Now it is!
- MozBase didn't even exist or have a repo. Now it does
- MozBase didn't have documentation or tests worth speaking of. Now it has at least a good start!
- Talos even has a test for installation. We need more tests, but it's a good start!
- There has been a lot of cleanup of Talos towards the end of making it more robust, easier to use, and easier to contribute to.
- The A-Team didn't have any community contributors. Now we do! This one actually makes me the happiest :)
When I look at the progress, I see Talos evolving towards what I would call real software (instead of a one-off that has been extended to do way too much to make it a one-off) that Mozillians can hack on and extend and make useful changes to. This also sets the stage for making Talos easier for developers to use locally to test their changes, for getting more of our test harnesses to use the MozBase suite of utilities, and for making it easier to write new harnesses without reinventing so much of the wheel.
One of our next priorities towards these ends is Bug 713055 - get Talos on Mozharness in production. This is a huge step towards making buildbot more extensible as well as having desktop talos be more accessible to developers in a way that should be identical to the way that it is done in automation. :aki has done a bunch of work to start moving our aging buildbot infrastructure towards something more sane. This is mozharness.
Armen (:armenzg) also updated the way that talos.zip is sought so that it can be decoupled from buildbot. This is another big step forward that he details in his blog post: talos.zip, talos.json and you.
So a huge shout out to :jmaher and :wlach for all the Talos help, and :ahal and :ctalbert as well as all the help from those in release engineering for making all of this possible. I look forward to getting this all better in the coming year.
The state of Talos this week:
This is a rough map of what we want to do. As said, with so many balls in the air, we will want to block on as little as possible and make as few really big changes at a time so that we can ensure that each piece of the puzzle fits together correctly.
I've been developing Talos recently. There are many caveats to working on this test harness that demand a more rigorous process than, say, a webapp. It has a large amount of necessary platform-specific code. It is deployed in a complex infrastructure environment. And it has no tests.
In order to test Talos, the A*Team has an internal staging environment (thanks to the efforts of anode and bhearsum and others) that mirrors the production testing infrastructure environment. Like production, it requires an HTTP-hosted URL structure containing pageloader, a pageset (tp5), and other resources necessary for buildbot plus Talos. (We should probably document the directory structure.)
In order to test Talos, you point the A*Team staging environment configuration to your HTTP-hosted location of your copy of this structure of resources. Then you issue a buildbot sendchange (which can be scripted for ease of use) that corresponds to a set of Talos tests that are run on each platform of interest. We have some simple scripts (e.g. ./chrome.sh or ./dirty.sh) to run sets of tests as we do in production. Each translates to a variety of buildbot sendchange commands appropriate for the tests to be run. Green runs mean good.
In order to test my Talos changes, I needed to set up a system whereby I could translate my changes into a hosted copy of talos, pageloader, etc. So here is what I did.
Steps:
Replicate http://people.mozilla.org/~jmaher/taloszips/tip/
It would be nice to provide a sane base template for this.
Put the talos zips on a web server:
cd mozilla/web/talos # change to a desired hosted directory
wget -r -l0 --no-parent http://people.mozilla.org/~jmaher/taloszips/tip/
mv people.mozilla.org/~jmaher/taloszips/tip . # the piece you need
rm -rf people.mozilla.org # cleanup unneeded directories
find tip -iname 'index.html*' -delete # remove unneeded index pages
[Example: http://k0s.org/mozilla/talos/tip/]
Clone a copy of Talos:
cd ~/mozilla/src/
virtualenv.py talos-staging
cd talos-staging; mkdir src; cd src
hg clone http://k0s.org/mozilla/hg/talos
echo 'default-push = ssh://k0s.org/mozilla/hg/talos' >> talos/.hg/hgrc
Development process:
Based on jmaher's update_talos.sh, I wrote a script to help me turn changes into changes in my hosted copy of talos.zip. Since I work largely in diffs hosted on bugzilla or my mercurial queue of Talos patches, I wanted a script that would apply a series of changes to a checkout of talos. In addition, I wanted to keep the flexibility of being able to edit these files on disk.
The script lives at http://k0s.org/mozilla/update_talos.py . I will endeavor to improve it as testing needs become more apparent. It sadly loses jmaher's update_talos.sh feature to create versioned zips. I thought about hosting a dedicated talos repository for testing (and still may, if that seems better down the line), but I usually want to test a specific change and roll back to a known state.
The script does the following:
- Cleans up and reclones, optionally
- Applies a series of diffs
- Creates a talos.zip and moves it to the appropriate place on disk.
- Fetches a fresh copy of pageloader.xpi
- Syncs the files with the HTTP server
- Cleans up and reclones, optionally
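As an illustration of the packaging step only, here is a minimal sketch that zips a Talos checkout while skipping Mercurial metadata. It is not the code in update_talos.py; the function name, the decision to skip `.hg`, and the choice to store entries under the checkout's directory name are all assumptions made for this example.

```python
import os
import zipfile


def make_talos_zip(checkout_dir, zip_path, skip_dirs=(".hg",)):
    """Zip a checkout, storing paths relative to the checkout's parent directory.

    `zip_path` should live outside `checkout_dir` so the archive does not
    try to include itself.
    """
    checkout_dir = os.path.abspath(checkout_dir)
    parent = os.path.dirname(checkout_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(checkout_dir):
            # Prune skipped directories in place so os.walk does not descend into them.
            dirs[:] = [name for name in dirs if name not in skip_dirs]
            for name in files:
                full_path = os.path.join(root, name)
                archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path
```

Calling `make_talos_zip("src/talos", "talos.zip")` would, under these assumptions, produce an archive whose entries all start with `talos/`; applying diffs, fetching pageloader.xpi, and syncing to the web server would be separate steps.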
After the HTTP copy is updated, I can run (e.g.) xperf.sh to trigger that set of tests in the staging environment and watch the waterfall to assess the viability of the change
It would be nice to have something more generic, but the path to good software is through iteration. Perhaps as more people develop their own scripts to test Talos in the staging environment we will evolve towards a more generic script to update talos, as well as copies or templates of the URL/directory structure of what is needed, as well as the staging software.
Introducing MozBase
Over the years, Mozilla has developed a number of test harnesses for automated testing of Firefox and other applications. Most of the harness code is written in python due to its utility towards this type of development. As one would expect, the harnesses arose from necessity and grew organically. However, as the harnesses grew it became apparent that there were several generic tasks that the harnesses shared:
- creating and manipulating a profile
- installing addons into the profile
- invoking (e.g.) Firefox in a desired manner
- process management
- ...a few other things
These pieces have largely been developed in a vacuum (in the early stages) or copy+pasted from other harnesses (in the later stages). This has led to duplicated functionality, difficult to maintain and inconsistent harness software (since fixing things in one place means that they probably need to be fixed in other places), and a system which was fully understood by no one after it became of sufficient complexity. The harness software could not be reused because it was tightly coupled to the implementation even when the underlying intent was generic.
Meet MozBase!
As software grows, it should be cultivated such that its effectiveness and its knowledge base are maximized. Code should be made reusable and the architecture evolved towards a representation of intent. This is the goal of the MozBase effort by the A-Team: https://wiki.mozilla.org/Auto-tools/Projects/MozBase
- we want to make high quality components to build test harnesses
- ... and other pieces of software
- ... that might be useful on their own
- we want to replace existing code with these pieces
- ... but cultivate their knowledge base
- we want to develop canonical and reusable python tools
- ... and encourage the community to use them
Developing MozBase is one of the A-Team goals this quarter. While cultivating software is an ongoing effort, we're off to a good start. We already have several MozBase python packages:
- mozprocess for reliable process execution and termination
- mozprofile for creating and manipulating profiles
- mozinfo for determining system information
- mozrunner for invoking Firefox in a harness-friendly way
- ManifestDestiny for universal test manifests
- mozhttpd, a webserver for testing purposes
- ... others available at the mozbase repository
Our immediate goals are to cultivate these into high-quality tools, taking lessons from the existing harnesses. Then we will port the harnesses to these tools so that they can be maintained in a unified manner. Right now, we're working on Talos both because this is a good proving ground for these tools and because much of its code can be replaced with MozBase code easily (for some definition of "easy").
While MozBase is about software, it is also about having a sane and maintainable environment to cultivate software in. While modular packages are great, their utility is in how they may be used together (as well as with other code) instead of in the craft of an individual package. So we're tackling these issues too.
Python importing in Mozilla Central: currently (most) python in mozilla-central is not packaged and we manually futz with PYTHONPATH and sys.path in several inconsistent and hard-to-maintain ways. In order to move towards python packages in any reasonable fashion we need to make importing easy and unified, and move towards how the python world typically does importing. There is bug 661908 for creating a unified virtualenv in the $OBJDIR. Work is likely to start on this or a similar effort soon (either this quarter or Q1 2012).
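To make the problem concrete, here is a minimal sketch of the kind of manual path manipulation described above. The module name `harness_helper` and its contents are hypothetical, created in a temporary directory only so the example runs on its own; nothing here is taken from mozilla-central.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for an unpackaged helper script somewhere in a tree.
MODULE_SOURCE = "def greeting():\n    return 'hello from an unpackaged module'\n"


def import_by_path_hack(directory, module_name):
    """Import a module the 'manual' way: push its directory onto sys.path.

    Every caller has to know where the file lives on disk, which is
    what makes this approach fragile when directories move.
    """
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        # Undo the change so later imports are not affected.
        sys.path.remove(directory)


def demo():
    """Write the hypothetical module to disk, import it, and call it."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "harness_helper.py"), "w") as handle:
            handle.write(MODULE_SOURCE)
        module = import_by_path_hack(tmp, "harness_helper")
        return module.greeting()
```

If the same code were installed as a package (for example into a virtualenv), the caller would write a plain `import harness_helper` and need no knowledge of where the file lives.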
Mirroring software to Mozilla Central: we have hampered ourselves -- rewritten software and avoided fixing bugs -- by not using third-party python packages for tools that live in mozilla-central. In addition, since many of the test harnesses already live in m-c, if we are going to move these to consume mozbase we will need a strategy to mirror it and other software to the tree. While nothing has been definitively decided, preliminary discussion has pointed towards having a script to fetch resources from a variety of locations and add them to mozilla-central or elsewhere. We're having a meeting this week to figure out what we really want to do and go from there.
Such is the MozBase effort. I am excited to start moving our code into a solid maintainable structure, and I hope you are too. If you are, please check out our github project or sign in to #ateam and tell us what you think. We'd love contributors!
jhammel now maintains mozregression
So the secret is out!
http://harthur.wordpress.com/2011/11/01/new-mozregression-owner/
I am going to be maintaining mozregression going forward. I released a 0.6 version to pypi today which hopefully fixes a few setup.py issues. You can find me at jhammel __at__ mozilla __dot__ com or as jhammel in #ateam.
http://groups.google.com/group/mozilla.tools/t/b1f12f5127761207
Talos is now a python package
The A-Team is working on creating a set of high-quality python utilities that are consumable, general purpose, and interoperable in an effort called MozBase. A huge part of this quarter's effort is to improve Talos to consume MozBase software and to make it an extensible harness that may also be consumed.
As one of the first steps towards making Talos consume upstream MozBase packages, I have made Talos a python package. This allows Talos to depend on upstream python packages in an automated fashion, permits additional setup- and install-time steps to be automated, and installs Talos in such a way that dotted paths against talos can be resolved by python import. That is, other packages can now usefully import talos without depending on a set directory structure.
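As a rough illustration of what packaging buys, a `setup.py` along the following lines declares dependencies and data files so that installation tools can handle them. This is a sketch, not the actual Talos `setup.py`: the version number, the dependency list, and the data-file patterns are all assumptions chosen for illustration.

```python
# Keyword arguments for setuptools.setup(); all values are illustrative.
SETUP_KWARGS = dict(
    name="talos",
    version="0.0",  # placeholder, not the real version
    packages=["talos"],
    # Ship non-python data alongside the scripts (hypothetical patterns).
    package_data={"talos": ["*.config", "*.js", "*.html"]},
    include_package_data=True,
    # Upstream packages to pull in at install time (assumed list).
    install_requires=["mozinfo", "mozprocess"],
    zip_safe=False,
)


def main():
    """Entry point for a real setup.py; setuptools is imported only when run."""
    from setuptools import setup
    setup(**SETUP_KWARGS)
```

In a real `setup.py`, `main()` would be called at module level. Once the package is installed, `import talos` resolves from any working directory.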
Unfortunately, since the talos repository was arranged such that all the python scripts and other data lived in a fairly disorganized top-level directory, this involved making a talos subdirectory and moving all files (except the README) into that subdirectory and carefully ensuring that all data resources were properly installed alongside the python scripts.
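Once data files are installed next to the code, a script can find them relative to its own module rather than assuming a particular checkout layout. A minimal sketch, where `resource_path` is a hypothetical helper (not a Talos function) and the standard-library `json` package stands in for `talos` so that the example runs anywhere:

```python
import json
import os


def resource_path(module, *parts):
    """Return the path of a file that lives next to `module` on disk."""
    here = os.path.dirname(os.path.abspath(module.__file__))
    return os.path.join(here, *parts)


# In a standard CPython install, `json` is a package directory,
# so decoder.py sits beside its __init__.py.
decoder_file = resource_path(json, "decoder.py")
```

The same call with the talos package and a data file name would give the installed location of that file, wherever the package ended up.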
Even more unfortunately, this change led to some confusion that could have been avoided ahead of time. Talos uses a tests.zip file that contains both the scripts and the data, and though I would have liked to do additional cleanup as part of making Talos a python package, I deliberately held off on changing anything that would invalidate this methodology. However, unbeknownst to me, there were other resources that depended on the talos directory structure, and these got broken with my change. I apologize for that, and will communicate these changes more widely next time. In the meantime, if you have any tools that depend on the talos directory structure, know that they will break next time you update. If you have questions about this, please contact me.
Although the fallout was regrettable, I think this is a necessary and forward-facing change in light of MozBase, Mozharness, and general good python practices. We're now looking at deprecating the tests.zip methodology and moving towards a Mozharness script for running Talos for both desktop testers and production. More on that as things progress.