<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0"><channel><title>k0s's blog</title><link>http://k0s.org/mozilla/blog</link><description>latest blog entries for k0s on mozilla-blog</description><lastBuildDate>Wed, 19 Jun 2013 19:49:21 GMT</lastBuildDate><generator>PyRSS2Gen-1.0.0</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Mozilla Automation and Testing: Signal from Noise, 2012</title><link>http://k0s.org/mozilla/blog/20130108154813</link><description>&lt;div class="blog-body"&gt;&lt;p&gt;Mozilla Automation and Testing: Signal from Noise, 2012&lt;/p&gt;
&lt;p&gt;We've written up what we've been doing as part of the huge effort of the
&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise"&gt;Signal from Noise&lt;/a&gt;
project.&lt;/p&gt;
&lt;p&gt;Look at:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;where we were: &lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise/StatusNovember2011"&gt;https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise/StatusNovember2011&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;our plan: &lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise/Execution2012"&gt;https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise/Execution2012&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;where we are now and where we're going: &lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise/StatusNovember2012"&gt;https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise/StatusNovember2012&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;</description><author>k0s</author><guid isPermaLink="true">http://k0s.org/mozilla/blog/20130108154813</guid><pubDate>Tue, 08 Jan 2013 15:48:13 GMT</pubDate></item><item><title>Perils of Version Pegging in Python Packaging</title><link>http://k0s.org/mozilla/blog/20121113102023</link><description>&lt;div class="blog-body"&gt;&lt;p&gt;Perils of Version Pegging in Python Packaging&lt;/p&gt;
&lt;p&gt;When working on an ecosystem of Python packages where some packages
depend on other packages, the question arises of which versions of the
dependencies to require.  There are three basic choices:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;Unpegged: If &lt;tt class="docutils literal"&gt;foo&lt;/tt&gt; depends on &lt;tt class="docutils literal"&gt;bar&lt;/tt&gt;, allow any version of
&lt;tt class="docutils literal"&gt;bar&lt;/tt&gt; to be used.&lt;/li&gt;
&lt;li&gt;Exactly pegged: If &lt;tt class="docutils literal"&gt;foo&lt;/tt&gt; depends on &lt;tt class="docutils literal"&gt;bar&lt;/tt&gt;, require a specific
version of &lt;tt class="docutils literal"&gt;bar&lt;/tt&gt;.  This is done in Python with the string
&lt;tt class="docutils literal"&gt;bar == 3.14&lt;/tt&gt; to require version 3.14 of bar.&lt;/li&gt;
&lt;li&gt;Forward compatible: If &lt;tt class="docutils literal"&gt;foo&lt;/tt&gt; depends on &lt;tt class="docutils literal"&gt;bar&lt;/tt&gt;, require a
minimum version of &lt;tt class="docutils literal"&gt;bar&lt;/tt&gt;. This is done in Python with the string
&lt;tt class="docutils literal"&gt;bar &amp;gt;= 3.14&lt;/tt&gt; to require at least version 3.14 of bar.&lt;/li&gt;
&lt;/ol&gt;
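&lt;p&gt;The three choices can be made concrete with a small sketch.  This is
not how setuptools evaluates requirement strings: the &lt;tt class="docutils literal"&gt;satisfies&lt;/tt&gt; helper,
the &lt;tt class="docutils literal"&gt;CHECKS&lt;/tt&gt; table, and the tuple representation of versions are all
hypothetical and exist only to show which versions each strategy would accept.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
import operator

# Hypothetical model: a requirement is a list of (comparison, version)
# clauses; an empty list means "unpegged".
CHECKS = {"eq": operator.eq, "ge": operator.ge, "lt": operator.lt}


def satisfies(version, requirement):
    """Return True if the version tuple meets every clause."""
    return all(CHECKS[name](version, bound) for name, bound in requirement)


unpegged = []                               # any version of bar
exact = [("eq", (3, 14))]                   # bar == 3.14
minimum = [("ge", (3, 14))]                 # at least 3.14
bounded = [("ge", (0, 1)), ("lt", (1, 0))]  # at least 0.1, below 1.0

# Made-up versions of bar available for installation.
available = [(0, 1), (3, 14), (4, 0)]
strategies = [("unpegged", unpegged), ("exact", exact),
              ("minimum", minimum), ("bounded", bounded)]
for label, requirement in strategies:
    print(label, [v for v in available if satisfies(v, requirement)])
&lt;/pre&gt;
&lt;p&gt;Of the three made-up versions, the unpegged requirement accepts all of
them, the exact peg accepts only 3.14, and the minimum accepts both 3.14
and 4.0.  Nothing in the minimum-version form can know whether 4.0 changed
the API.&lt;/p&gt;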
&lt;p&gt;There is no magic bullet: all of these strategies have advantages and
disadvantages.  In general, the API of dependencies will change and
a consumer of a particular version will only work with a certain range
of versions of the dependency.  Because it is in general unknown
whether the next version of a dependency will break the API for
consuming software, there is no blanket strategy whereby
compatibility can be guaranteed via a &lt;tt class="docutils literal"&gt;setup.py&lt;/tt&gt; file.&lt;/p&gt;
&lt;p&gt;Considering the cases, case 1 allows for the most flexibility:  if
any version of the dependency (&lt;tt class="docutils literal"&gt;bar&lt;/tt&gt;) is registered, the dependency is
satisfied. (Otherwise, the latest version of the dependency
will be downloaded and installed
from &lt;em&gt;e.g.&lt;/em&gt; &lt;a class="reference external" href="http://pypi.python.org/"&gt;http://pypi.python.org/&lt;/a&gt; .) However, case 1 is very
vulnerable to API changes in the dependency:  it does nothing to
ensure that the dependency is compatible with the consuming software.
Assuming that the latest versions of a set of packages are internally
compatible, a fresh install will give an internally compatible set of
packages.  However, if a package is updated there is nothing to
guarantee that the API is compatible.&lt;/p&gt;
&lt;p&gt;Case 2 is the most strict:  the consuming package demands a particular
version of a dependency.  If this strategy is followed for all
dependencies, a particular version of the consuming software
(&lt;tt class="docutils literal"&gt;foo&lt;/tt&gt;) is assured of using a compatible version of the
dependency (&lt;tt class="docutils literal"&gt;bar&lt;/tt&gt;).  However, this is done at the price of
losing forward compatibility.  If a new version of the dependency
(&lt;tt class="docutils literal"&gt;bar&lt;/tt&gt;) is available, it will not be used regardless of
compatibility.&lt;/p&gt;
&lt;p&gt;Case 3 seeks to balance the alternatives: the consuming package
demands at least a given version of a dependency.  This
protects from using an API that is too old for the package of
interest.  This strategy also allows newer versions of the software to
be installed without complaining.  If the API hasn't changed, then
this is good.  However, this still does not protect from API changes.
If the newest version of &lt;tt class="docutils literal"&gt;bar&lt;/tt&gt; has a different API from the minimum
version specified in &lt;tt class="docutils literal"&gt;foo&lt;/tt&gt;'s &lt;tt class="docutils literal"&gt;setup.py&lt;/tt&gt;, while &lt;tt class="docutils literal"&gt;setup.py&lt;/tt&gt; won't
complain, the software will not work.  Ideally, one would be able to
note post-facto that there was an API-breaking change in the new
version and that all software pegged to &lt;tt class="docutils literal"&gt;bar &amp;gt;= 0.1&lt;/tt&gt; should really
be &lt;tt class="docutils literal"&gt;bar &amp;gt;= 0.1, bar &amp;lt; 1.0&lt;/tt&gt;.  However, once a distribution of (e.g.)
&lt;tt class="docutils literal"&gt;foo&lt;/tt&gt; is released, it cannot meaningfully be re-released.&lt;/p&gt;
&lt;/div&gt;</description><author>k0s</author><guid isPermaLink="true">http://k0s.org/mozilla/blog/20121113102023</guid><pubDate>Tue, 13 Nov 2012 10:20:23 GMT</pubDate></item><item><title>Mozilla Automation and Testing: How Talos Works and Why SfN is Hard</title><link>http://k0s.org/mozilla/blog/20120829151007</link><description>&lt;div class="blog-body"&gt;&lt;p&gt;Mozilla Automation and Testing: How Talos Works and Why SfN is Hard&lt;/p&gt;
&lt;p&gt;I've had several conversations since starting the
&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise"&gt;signal from noise&lt;/a&gt;
project about enhancing the statistical fidelity of
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos"&gt;Talos&lt;/a&gt;
numbers about &amp;quot;Why is this hard?&amp;quot;.  From a developer point of view,
you look at &lt;a class="reference external" href="http://graphs.mozilla.org/"&gt;http://graphs.mozilla.org/&lt;/a&gt; for a particular test, you see
a nice number per changeset.  The numbers might be a little rough (or
very rough), but things are good enough, right?  We just need to make
the numbers a little better and turn TBPL orange on failure.&lt;/p&gt;
&lt;p&gt;The truth of the matter is that those nice series of numbers hide a
whole story behind them.  For complex software like
&lt;a class="reference external" href="http://www.mozilla.org/en-US/firefox/new/"&gt;Firefox&lt;/a&gt; ,
performance testing is not an easy problem.  Talos performance testing
has historically been done by engineers who wanted to have some
numbers to compare.  While this is often how software starts --
throwing things together -- it is not to be mistaken for rigorous or
extensible.&lt;/p&gt;
&lt;div class="section" id="where-we-are-now"&gt;
&lt;h1&gt;Where we are now&lt;/h1&gt;
&lt;p&gt;I debated whether to start with how things currently look or how
things should look.  While starting with how things should look gives
an unfettered view of Firefox performance testing, I've decided to
start with how things currently work for those familiar with the
current system and to emphasize the challenges getting from here to
where we need to go.  I'm not justifying (or contesting, for that
matter) the decisions as to why it was done the way that it was done. I'm
just trying to explain it.&lt;/p&gt;
&lt;p&gt;To start off, we have two kinds of tests:
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos#Startup_Tests"&gt;startup tests&lt;/a&gt;
and
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos#Page_Load_Tests"&gt;page load tests&lt;/a&gt; .
In the interest of time and simplicity, let's pretend that
startup tests start the browser, load a URL, measure the
time at an event (&lt;tt class="docutils literal"&gt;onload&lt;/tt&gt; or &lt;tt class="docutils literal"&gt;mozAfterPaint&lt;/tt&gt;), and then shut down the
browser; and that page load tests start the browser, load a list of
pages from a manifest (each page &lt;em&gt;N&lt;/em&gt; times), and likewise
usually measure at some event.  There are many variations on this
theme:  tests can compute their own metrics, you can load the pages in
different order, etc., but the above is the basic idea.&lt;/p&gt;
&lt;p&gt;From this, we get a series of numbers.  For startup tests, it is just
a list of (e.g.) times.  For page load tests, you get a series (e.g.)
of &lt;em&gt;N&lt;/em&gt; numbers per page.&lt;/p&gt;
&lt;p&gt;Outside of a little streamlining and a lot of details I'm glossing
over, we mostly want to keep the above procedure.  The disparity
begins with what we do with those numbers.&lt;/p&gt;
&lt;p&gt;In order to send data to our
&lt;a class="reference external" href="http://graphs.mozilla.org/"&gt;graphserver&lt;/a&gt; ,
we have to get the data for each test into a
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos/DataFormat"&gt;format&lt;/a&gt;
that graphserver likes.  Since the startup test results are just a
list of (e.g.) page load times, these can be directly translated to
the graphserver format, using &lt;tt class="docutils literal"&gt;NULL&lt;/tt&gt; for page names.  For page load
tests, on the other hand, we have &lt;em&gt;N&lt;/em&gt; numbers for each page.  So Talos
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/6d79047595a4/talos/output.py#l191"&gt;averages&lt;/a&gt;
the values for each page and sends a list of averages.  Note that this
average may not be a straight mean. The
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/6d79047595a4/talos/PerfConfigurator.py#l213"&gt;default&lt;/a&gt;
is to ignore the maximum value (per page) and take the mean of the
remaining iterations.  But this is configurable per-test.&lt;/p&gt;
&lt;p&gt;The list of numbers and page names is uploaded to the graphserver.
When you look at graphserver, you see a single point for each test for
each changeset, not a list.  This is because graphserver does
&lt;a class="reference external" href="http://hg.mozilla.org/graphs/file/cc80cbf6155a/server/pyfomatic/collect.py#l212"&gt;additional averaging across the page set&lt;/a&gt; ,
ignoring the maximum value and taking the mean of the remaining
numbers.&lt;/p&gt;
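&lt;p&gt;The two stages of averaging described above can be sketched as follows.
This is an illustration of the default behaviour as described here, not the
Talos or graphserver code: the function &lt;tt class="docutils literal"&gt;drop_max_mean&lt;/tt&gt;, the page
names, and the load times are all made up.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
def drop_max_mean(values):
    """Mean of the values after discarding the single largest one.

    Needs at least two values, since one of them is thrown away.
    """
    kept = sorted(values)[:-1]
    return sum(kept) / len(kept)


# Made-up load times in milliseconds for three hypothetical pages,
# four iterations each.
runs = {
    "fast.html": [10.0, 11.0, 12.0, 30.0],
    "medium.html": [100.0, 102.0, 104.0, 150.0],
    "slow.html": [1000.0, 1010.0, 1020.0, 1500.0],
}

# Stage 1 (Talos side): one number per page.
per_page = {page: drop_max_mean(times) for page, times in runs.items()}

# Stage 2 (graphserver side): one number for the whole page set.
graph_point = drop_max_mean(list(per_page.values()))

print(per_page)
print(graph_point)
&lt;/pre&gt;
&lt;p&gt;With these made-up numbers the per-page values are 11, 102, and 1010,
and the single graphed point is 56.5: the slowest page contributes nothing
to it.&lt;/p&gt;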
&lt;p&gt;This is the crux of the problem.  Every time an average (of any form)
is taken, you're reducing a spectrum to a single scalar.  While having
a single number makes it easy to read and deal with, what that number
means is obscured.  The graphserver averaging is particularly
hazardous.  Since we average across pages, we are averaging numbers
that may be of very different scales.  So pages that take longer to
load/render have more weight than pages that take less time to
load...EXCEPT we throw away the most expensive page.  Which, if you
think about it, is strange:  the most expensive page is likely to be
consistent from run to run (if it's not, then other strange things
could happen in this averaging).  We run this page many times and
upload it and then ignore it.&lt;/p&gt;
&lt;p&gt;The averaging on the Talos side is also problematic, though more
subtly.  As documented in
&lt;a class="reference external" href="https://wiki.mozilla.org/images/c/c0/Larres-thesis.pdf"&gt;Larres' thesis&lt;/a&gt; ,
load/render times for a particular page do not follow a bell curve
distribution. Multi-modal distributions are often seen in practice and
dropping the maximum value was (probably) done in order to nudge the
data towards the lowest mode.  However, when this doesn't work, the
averaging is just misleading.  While several hypotheses have been
proposed, no one ultimately knows what conditions cause the
multi-modality.  This would be a worthy field of study in its own right.&lt;/p&gt;
&lt;p&gt;So now we've reduced (in the page load test case) a 2d array of
numbers into a single number per test (or page set, depending on your
perspective) per changeset for display on &lt;a class="reference external" href="http://graphs.mozilla.org/"&gt;http://graphs.mozilla.org/&lt;/a&gt; .
Now how do we detect regressions?&lt;/p&gt;
&lt;p&gt;A regression:&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="http://graphs.mozilla.org/graph.html#tests=[[230,131,12]]&amp;amp;sel=1346117321000,1346290121000&amp;amp;displayrange=7&amp;amp;datatype=running"&gt;http://graphs.mozilla.org/graph.html#tests=[[230,131,12]]&amp;amp;sel=1346117321000,1346290121000&amp;amp;displayrange=7&amp;amp;datatype=running&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;According to our
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos#Regressions"&gt;documentation&lt;/a&gt;,
&amp;quot;to determine whether a good point is &amp;quot;good&amp;quot; or &amp;quot;bad&amp;quot;, we take 20-30
points of historical data, and 5 points of future data.&amp;quot; Of course, it
doesn't tell how we use this data. Larres tells us more here:
&lt;a class="reference external" href="https://wiki.mozilla.org/images/c/c0/Larres-thesis.pdf#page=74"&gt;https://wiki.mozilla.org/images/c/c0/Larres-thesis.pdf#page=74&lt;/a&gt;
Essentially, we make two windows: one before the data point, and one
after the data point.  We use a
&lt;a class="reference external" href="http://en.wikipedia.org/wiki/Student%27s_t-test"&gt;t-test&lt;/a&gt;
to see if there is statistical significance between the two series.
If a regression is detected, the
&lt;a class="reference external" href="https://lists.mozilla.org/listinfo/dev-tree-management"&gt;dev-tree-management&lt;/a&gt;
list is emailed.  Then
&lt;a class="reference external" href="http://limpet.net/mbrubeck/"&gt;mbrubeck&lt;/a&gt; actually looks at the data
and tries to figure out if it's an actual regression.  I get fifty or
more of these emails per day.  Most of them don't appear to be actual
regressions, at least to the naked eye given the amount of noise in
the various data sets.&lt;/p&gt;
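&lt;p&gt;A rough sketch of the two-window comparison follows.  It is not the
production analysis code.  Welch's form of the t statistic is an assumption
(the documentation only says that a t-test is used), as are the function names,
the choice to put the point itself in the "after" window, and the made-up
series.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
import math
import statistics


def t_statistic(before, after):
    """Welch's t statistic for the difference in means of two samples."""
    var_before = statistics.variance(before) / len(before)
    var_after = statistics.variance(after) / len(after)
    difference = statistics.mean(after) - statistics.mean(before)
    return difference / math.sqrt(var_before + var_after)


def window_score(series, index, back=20, forward=5):
    """Compare the points before index with the points from index onward."""
    before = series[max(0, index - back):index]
    after = series[index:index + forward]
    return t_statistic(before, after)


# Made-up series: flat around 100, then a step up to around 110.
series = [100.0, 101.0, 99.0, 100.0, 102.0, 98.0, 100.0, 101.0, 99.0, 100.0,
          110.0, 111.0, 109.0, 110.0, 112.0]

print(window_score(series, 10))  # at the step: large magnitude
print(window_score(series, 5))   # inside the flat part: small magnitude
&lt;/pre&gt;
&lt;p&gt;A large magnitude suggests that the two windows differ.  The sketch
treats each window as one flat sample, so a second change landing inside
either window would distort the score.&lt;/p&gt;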
&lt;p&gt;Using the &amp;quot;past&amp;quot; and &amp;quot;future&amp;quot; window is intended as a &amp;quot;before&amp;quot; and
&amp;quot;after&amp;quot; picture.  However, a big implicit assumption is that the
numbers are flat for each segment: that is to say that nothing
pushed in the range before alters performance and nothing
pushed in the range after alters performance.  This is a pretty brazen
assumption and has certainly been wrong often enough in practice.&lt;/p&gt;
&lt;p&gt;While looking at a single number on graphserver is very convenient, it
is also misleading.  The statistics applied to Talos data to determine
if a performance regression or improvement is seen are a good example
of a very engineer-y metric: various tactics are tried until something
is found that is sorta stable, &amp;quot;looks right&amp;quot;, but it isn't clear at
all what it measures or how rigorous it is.&lt;/p&gt;
&lt;p&gt;We can do better.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="where-we-want-to-be"&gt;
&lt;h1&gt;Where we want to be&lt;/h1&gt;
&lt;p&gt;The most important part of solving a problem is to get people to care
about the problem.  A critical part of getting software engineers to
care about a problem is to build a system that is easy (and maybe even
fun) to use. We need to be able to rigorously identify regressions.
This is a hard task. If a regression is seen, whatever UI we build for
it should clearly display it and clearly display why it is a
regression.  One artefact of our current system where we reduce all
the data into a single number per changeset is that it is not at all
clear if the regression is real or noise.  We have no ability to drill
down in the data and see which pages regressed.  We have no particular
clue as to what happened. In fact, regressions aren't marked on the
chart at all.&lt;/p&gt;
&lt;p&gt;Another critical part is sending the right signal to the
right people. This means getting rigorous regression information into
the hands of people that can understand (with the tools given) the
extent of the regression and can hopefully help determine why.
TBPL should go orange if a regression is pushed.  This does not mean
that we can never have regressions -- that is unrealistic, as desired features
may require performance regressions to implement, as well as trade-off
decisions between competing performance metrics (the infamous example
being speed vs memory).&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise"&gt;https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise&lt;/a&gt; and
other
&lt;a class="reference external" href="/mozilla/blog"&gt;blog posts&lt;/a&gt; here have discussed in detail why we
want to keep the full spectrum of numbers that we get.  We don't
measure our noise levels.  We don't know how many samples are required
for convergence or if we've reached it or if that's even possible
(though Larres has done
&lt;a class="reference external" href="https://wiki.mozilla.org/images/c/c0/Larres-thesis.pdf"&gt;some analysis&lt;/a&gt; ).
We need to know this.  Some tests we might run for too many
iterations. Others we run for most assuredly far too few.&lt;/p&gt;
&lt;p&gt;And, as
&lt;a class="reference external" href="https://groups.google.com/d/msg/mozilla.dev.tree-management/QYvG8wIen6Y/vxzglH2ekYwJ"&gt;discussed elsewhere&lt;/a&gt;
,
I think it is extremely important that
people actually look at this data.  Having a system that is easy to
use, rigorous, and that documents how it does its calculations will be
a huge help, as people would actually have a reason to &lt;em&gt;want&lt;/em&gt; to use
it.  But we also need someone that's really ready and willing to drill
down and mine the data for the knowledge it contains.  Why do we get
multi-modal distributions?  What are we testing?  What &lt;em&gt;aren't&lt;/em&gt; we
testing?&lt;/p&gt;
&lt;p&gt;If you think this all sounds hard....it is!  It's a lot of work and
there aren't many appreciable short cuts.  Much of our work thus far
has been ripping out hacks that were made for expediency in the past,
and replacing them with less hacky code.  There are some things worth
doing right.  Going without performance tests for Firefox is pretty
much unthinkable, so we're left with the alternative: actually making
a system that works.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;</description><author>k0s</author><guid isPermaLink="true">http://k0s.org/mozilla/blog/20120829151007</guid><pubDate>Wed, 29 Aug 2012 15:10:07 GMT</pubDate></item><item><title>Mozilla Automation and Testing : The Naming of (Talos) Things</title><link>http://k0s.org/mozilla/blog/20120724135349</link><description>&lt;div class="blog-body"&gt;&lt;p&gt;Mozilla Automation and Testing : The Naming of (Talos) Things&lt;/p&gt;
&lt;p&gt;As a &lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos"&gt;Talos&lt;/a&gt;
developer, I have found it confusing what Talos
tests were named on &lt;a class="reference external" href="http://tbpl.mozilla.org"&gt;TBPL&lt;/a&gt;
and &lt;a class="reference external" href="http://graphs.mozilla.org"&gt;graphserver&lt;/a&gt; .
I am not alone: &lt;a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=770460"&gt;https://bugzilla.mozilla.org/show_bug.cgi?id=770460&lt;/a&gt;
So I sat down to figure out how Talos was run by buildbot and how to
correlate test names across Talos, buildbot, graphserver, and TBPL.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot"&gt;Buildbot&lt;/a&gt; initiates
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/tip/talos/PerfConfigurator.py"&gt;PerfConfigurator&lt;/a&gt;
to generate a YAML configuration file which is then executed by
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/tip/talos/PerfConfigurator.py"&gt;run_tests.py&lt;/a&gt; .
This may invoke any number of
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/tip/talos/test.py"&gt;tests&lt;/a&gt; .
Talos reports this information to the graphserver.  The buildbot suite
name is reported to TBPL, as are the
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/e7adcc4c144a/talos/output.py#l338"&gt;links returned from graphserver&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;See also: &lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos#How_Talos_is_Run_in_Production"&gt;https://wiki.mozilla.org/Buildbot/Talos#How_Talos_is_Run_in_Production&lt;/a&gt;&lt;/p&gt;
&lt;div class="section" id="id1"&gt;
&lt;h1&gt;Buildbot&lt;/h1&gt;
&lt;p&gt;I set out to make a script that gathered this information
and followed the information flow.
The basic buildbot configuration is found in
&lt;a class="reference external" href="http://hg.mozilla.org/build/buildbot-configs/raw-file/tip/mozilla-tests/config.py"&gt;http://hg.mozilla.org/build/buildbot-configs/raw-file/tip/mozilla-tests/config.py&lt;/a&gt; .
While I only needed the
&lt;a class="reference external" href="http://hg.mozilla.org/build/buildbot-configs/file/370cd7aac446/mozilla-tests/config.py#l179"&gt;SUITES&lt;/a&gt;
variable, which contains the name as reported to TBPL as well as the
Talos command line for each suite, the entire file has to be imported
and read by python to work.  So I added
&lt;a class="reference external" href="http://trac.buildbot.net/"&gt;buildbot&lt;/a&gt;
as a package dependency.  In addition, I had to mock the
&lt;tt class="docutils literal"&gt;project_branches.py&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;localconfig.py&lt;/tt&gt; files.
For localconfig, I purely stubbed it, since I didn't need it anyway:
&lt;a class="reference external" href="http://k0s.org/mozilla/hg/talosnames/raw-file/tip/talosnames/localconfig.py"&gt;http://k0s.org/mozilla/hg/talosnames/raw-file/tip/talosnames/localconfig.py&lt;/a&gt;
For &lt;tt class="docutils literal"&gt;project_branches.py&lt;/tt&gt;, I could have pulled this down in real
time, and should for up-to-date information, but for momentary
expedience I just copied it: &lt;a class="reference external" href="http://k0s.org/mozilla/hg/talosnames/file/tip/talosnames/project_branches.py"&gt;http://k0s.org/mozilla/hg/talosnames/file/tip/talosnames/project_branches.py&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="id3"&gt;
&lt;h1&gt;Talos&lt;/h1&gt;
&lt;p&gt;This takes care of the buildbot information.  For desktop talos, it is
then possible to call &lt;tt class="docutils literal"&gt;PerfConfigurator&lt;/tt&gt; with the arguments from
&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;mozilla-tests/config.py&lt;/span&gt;&lt;/tt&gt; and generate a Talos configuration file.
&lt;a class="reference external" href="https://wiki.mozilla.org/Mobile/Fennec/Android#talos"&gt;remotePerfConfigurator&lt;/a&gt;
currently requires a device to be attached in order to work correctly,
so I punted on that problem for the time being.
Having the config file, it can be read to introspect how the tests are
being run.&lt;/p&gt;
&lt;p&gt;Hovering over a talos letter on TBPL, you can see the full name of the
associated (TBPL) suite, e.g. &lt;em&gt;Talos nochrome opt was successful, took 12mins&lt;/em&gt;
when one hovers over &lt;em&gt;T (n)&lt;/em&gt; .  If you click on the &lt;em&gt;n&lt;/em&gt;, you will see
the name of the suite as reported by
buildbot: &lt;em&gt;Rev4 MacOSX Lion 10.7 mozilla-central talos nochromer&lt;/em&gt; .
Note the &lt;tt class="docutils literal"&gt;nochromer&lt;/tt&gt; from
&lt;a class="reference external" href="http://hg.mozilla.org/build/buildbot-configs/file/68c191f31d39/mozilla-tests/config.py#l291"&gt;http://hg.mozilla.org/build/buildbot-configs/file/68c191f31d39/mozilla-tests/config.py#l291&lt;/a&gt;
You can also see the name of the test as reported to graphserver, in
this case:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
tdhtmlr_nochrome_paint: 738.29
&lt;/pre&gt;
&lt;p&gt;Where the &lt;tt class="docutils literal"&gt;738.29&lt;/tt&gt; is
&lt;a class="reference external" href="http://graphs.mozilla.org/graph.html#tests=[[221,94,1]]"&gt;a link to the graphserver data&lt;/a&gt; .
The name, &lt;tt class="docutils literal"&gt;tdhtmlr_nochrome_paint&lt;/tt&gt;, is the name of the
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/de24503258c7/talos/test.py#l302"&gt;talos test&lt;/a&gt;
plus the test name extension for
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos#Page_Load_Tests"&gt;Page Load tests&lt;/a&gt;
but not for
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos#Startup_Tests"&gt;Startup tests&lt;/a&gt; :
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/de24503258c7/talos/output.py#l174"&gt;http://hg.mozilla.org/build/talos/file/de24503258c7/talos/output.py#l174&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/e7adcc4c144a/talos/PerfConfigurator.py#l261"&gt;test_name_extension&lt;/a&gt;
appends &lt;tt class="docutils literal"&gt;_nochrome&lt;/tt&gt; and/or &lt;tt class="docutils literal"&gt;_paint&lt;/tt&gt; depending on if these flags
are set, usually via &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;--noChrome&lt;/span&gt;&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;--mozAfterPaint&lt;/span&gt;&lt;/tt&gt; arguments
to &lt;tt class="docutils literal"&gt;PerfConfigurator&lt;/tt&gt;.
In order to determine the correct test name extension, I used the
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/tip/talos/test.py"&gt;talos.test&lt;/a&gt;
module and inspected if the test was a subclass of
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/de24503258c7/talos/test.py#l50"&gt;TsBase&lt;/a&gt;
or
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/de24503258c7/talos/test.py#l137"&gt;PageloaderTest&lt;/a&gt; :
&lt;a class="reference external" href="http://k0s.org/mozilla/hg/talosnames/file/ef8590b55605/talosnames/web.py#l63"&gt;http://k0s.org/mozilla/hg/talosnames/file/ef8590b55605/talosnames/web.py#l63&lt;/a&gt;&lt;/p&gt;
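&lt;p&gt;The naming rule can be sketched like this.  It is a simplified
illustration rather than the code in &lt;tt class="docutils literal"&gt;output.py&lt;/tt&gt; or
&lt;tt class="docutils literal"&gt;PerfConfigurator.py&lt;/tt&gt;: the function &lt;tt class="docutils literal"&gt;graphserver_name&lt;/tt&gt; and its
parameters are hypothetical.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
def graphserver_name(test_name, is_pageloader,
                     no_chrome=False, moz_after_paint=False):
    """Build the short name reported to graphserver (illustrative only)."""
    if not is_pageloader:
        # Startup tests are reported under the bare test name.
        return test_name
    extension = ""
    if no_chrome:
        extension += "_nochrome"
    if moz_after_paint:
        extension += "_paint"
    return test_name + extension


# A page load test run with both flags set.
print(graphserver_name("tdhtmlr", True, no_chrome=True, moz_after_paint=True))
# A startup test run with the same flags keeps its bare name.
print(graphserver_name("ts", False, no_chrome=True, moz_after_paint=True))
&lt;/pre&gt;
&lt;p&gt;The first call yields &lt;tt class="docutils literal"&gt;tdhtmlr_nochrome_paint&lt;/tt&gt;, matching the name
seen on TBPL above; the second yields just &lt;tt class="docutils literal"&gt;ts&lt;/tt&gt;.&lt;/p&gt;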
&lt;/div&gt;
&lt;div class="section" id="id5"&gt;
&lt;h1&gt;TBPL&lt;/h1&gt;
&lt;p&gt;TBPL determines its suite name and letter from a long &lt;em&gt;if..else&lt;/em&gt; regex
matching chain in &lt;tt class="docutils literal"&gt;Data.js&lt;/tt&gt;:
&lt;a class="reference external" href="http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/bad7f21362be/js/Data.js#l512"&gt;http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/bad7f21362be/js/Data.js#l512&lt;/a&gt;
This takes the buildbot suite name, some magic and glue, and yields
its long name, which is then matched up in
&lt;a class="reference external" href="http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/tip/js/Config.js"&gt;http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/tip/js/Config.js&lt;/a&gt;
to get the TBPL initial. Rather than trying to stub how this is done,
I took advantage of the structure of this file in a horrible hack,
whereby I matched the regexes with a regex and then extracted the
information I wanted from them (&lt;em&gt;Don't try this at home, kids!&lt;/em&gt;):
&lt;a class="reference external" href="http://k0s.org/mozilla/hg/talosnames/file/ef8590b55605/talosnames/api.py#l77"&gt;http://k0s.org/mozilla/hg/talosnames/file/ef8590b55605/talosnames/api.py#l77&lt;/a&gt;
This is highly undesirable, but it does work (for the time being).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="id6"&gt;
&lt;h1&gt;Graphserver&lt;/h1&gt;
&lt;p&gt;So we have buildbot, TBPL, and the talos sides of things figured out,
nicely lining us up to tackle graphserver.  Graphserver details the
test mapping from short name to long name in the rather Kafka-esque
&lt;tt class="docutils literal"&gt;data.sql&lt;/tt&gt; schema:
&lt;a class="reference external" href="http://hg.mozilla.org/graphs/file/da54bac92c1b/sql/data.sql#l2568"&gt;http://hg.mozilla.org/graphs/file/da54bac92c1b/sql/data.sql#l2568&lt;/a&gt;
I wanted to at least get the long graphserver names from the short
names, as these are the only strings displayed in the UI.
So I created an in-memory database using
&lt;a class="reference external" href="http://www.sqlite.org/"&gt;SQLite&lt;/a&gt; as there was no desire to persist
the data, just read it, and SQLite is built into Python and avoids
database-deployment woes.
&lt;a class="reference external" href="http://hg.mozilla.org/graphs/file/tip/sql/schema.sql"&gt;The table definitions&lt;/a&gt;
were not SQLite-compatible, so I
&lt;a class="reference external" href="http://k0s.org/mozilla/hg/talosnames/file/ef8590b55605/talosnames/api.py#l32"&gt;created my own table definitions&lt;/a&gt; .
&lt;tt class="docutils literal"&gt;unix_timestamp()&lt;/tt&gt; is not a SQLite function, so I removed lines
containing a reference to this function.  Fortunately, this does not
affect any of the test lines I care about.&lt;/p&gt;
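&lt;p&gt;The approach looks roughly like the sketch below, which uses Python's
standard &lt;tt class="docutils literal"&gt;sqlite3&lt;/tt&gt; module.  The table layouts, column names, and the
two statements standing in for &lt;tt class="docutils literal"&gt;data.sql&lt;/tt&gt; are invented for the example;
the real definitions are in the links above.&lt;/p&gt;
&lt;pre class="literal-block"&gt;
import sqlite3

# Hypothetical, heavily simplified stand-in for the contents of data.sql.
DATA_SQL = """
insert into tests values (1, 'tdhtmlr_nochrome_paint', 'Example Long Name');
insert into machines values (1, 'example-machine', unix_timestamp());
"""

# Nothing is persisted: the database lives only in memory.
connection = sqlite3.connect(":memory:")
connection.execute(
    "create table tests (id integer, name text, pretty_name text)")
connection.execute(
    "create table machines (id integer, name text, date_added integer)")

# Drop lines that call unix_timestamp(), which SQLite does not provide.
usable = [line for line in DATA_SQL.splitlines()
          if "unix_timestamp" not in line]
connection.executescript("\n".join(usable))

# Map short names to long names.
rows = connection.execute("select name, pretty_name from tests").fetchall()
print(rows)
&lt;/pre&gt;
&lt;p&gt;Filtering line by line only works while each statement sits on a
single line, which is the assumption made in this sketch.&lt;/p&gt;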
&lt;p&gt;Putting it all together you get a table following the information flow:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;buildbot has test suites which contain arguments to &lt;tt class="docutils literal"&gt;PerfConfigurator&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;PerfConfigurator&lt;/tt&gt; generates a YAML file which is used as
configuration to run one or more tests&lt;/li&gt;
&lt;li&gt;the tests report results to graphserver and the resulting links are
displayed on TBPL&lt;/li&gt;
&lt;li&gt;the buildbot suite is reported to TBPL&lt;/li&gt;
&lt;li&gt;graphserver maps the Talos test names, plus an extension for the
page load test case, to a full name displayed in its UI&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I called the script I wrote to parse all of this talosnames:
&lt;a class="reference external" href="http://k0s.org/mozilla/hg/talosnames"&gt;http://k0s.org/mozilla/hg/talosnames&lt;/a&gt; . It's one of the messiest scripts
I've ever written, though I suppose it's partially amazing, given that
no one ever thought about doing this before, that it was possible to
write at all.&lt;/p&gt;
&lt;p&gt;Currently, &lt;tt class="docutils literal"&gt;talosnames&lt;/tt&gt; outputs just a single page which I host here:
&lt;a class="reference external" href="http://k0s.org/mozilla/talos/talosnames.html"&gt;http://k0s.org/mozilla/talos/talosnames.html&lt;/a&gt;
It's not dynamic, currently, but if it needs to be regenerated please
feel free to ping me and I can do this.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="how-this-could-be-easier"&gt;
&lt;h1&gt;How this could be easier&lt;/h1&gt;
&lt;p&gt;In general, this was mostly an exercise in untangling a web that we
ourselves wove. If we had decided on and stuck with conventions up front,
there would be nothing to do here.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;if &lt;tt class="docutils literal"&gt;Data.js&lt;/tt&gt; was a JSON structure, &lt;tt class="docutils literal"&gt;talosnames&lt;/tt&gt; could read this
JSON and do the regex matching itself:
&lt;a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=774942"&gt;https://bugzilla.mozilla.org/show_bug.cgi?id=774942&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;if talos test name extensions didn't depend on
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos#Startup_Tests"&gt;startup test&lt;/a&gt;
vs
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos#Page_Load_Tests"&gt;page load test&lt;/a&gt;
the world would be a better place&lt;/li&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;remotePerfConfigurator&lt;/tt&gt; currently requires a device to be
attached to generate configuration:
&lt;a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=775221"&gt;https://bugzilla.mozilla.org/show_bug.cgi?id=775221&lt;/a&gt; .
If &lt;tt class="docutils literal"&gt;remotePerfConfigurator&lt;/tt&gt; could work sans a device, we could
generate and inspect this test information in &lt;tt class="docutils literal"&gt;talosnames&lt;/tt&gt;.&lt;/li&gt;
&lt;li&gt;I couldn't really figure out what buildbot command lines were for
desktop and which were for mobile.  I probably could have eventually
tracked this down, or done a much easier hack whereby if
&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;--fennecIDs&lt;/span&gt;&lt;/tt&gt; was in the command line then I'd call
&lt;tt class="docutils literal"&gt;remotePerfConfigurator&lt;/tt&gt; though the above prevented action on this anyway&lt;/li&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;data.sql&lt;/tt&gt; should mostly go away&lt;/li&gt;
&lt;li&gt;up to date data structures: &lt;tt class="docutils literal"&gt;talosnames&lt;/tt&gt; grabs the tip of TBPL's
&lt;a class="reference external" href="http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/tip/js/Config.js"&gt;Config.js&lt;/a&gt;
and
&lt;a class="reference external" href="http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/raw-file/tip/js/Data.js"&gt;Data.js&lt;/a&gt; ,
buildbot-configs tip's
&lt;a class="reference external" href="http://hg.mozilla.org/build/buildbot-configs/raw-file/tip/mozilla-tests/config.py"&gt;config.py&lt;/a&gt; ,
and graphserver tip's
&lt;a class="reference external" href="http://hg.mozilla.org/graphs/raw-file/tip/sql/data.sql"&gt;data.sql&lt;/a&gt; .
While this gets the latest information, it is unknown what the
deployment state of any of these files is.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="todo"&gt;
&lt;h1&gt;TODO&lt;/h1&gt;
&lt;p&gt;While I am glad to be able to sort this out a bit, a lot more could be
done given the time.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;a TBPL-like view that displays the TBPL abbreviations and maps to
buildbot suites and tests&lt;/li&gt;
&lt;li&gt;list which buildbot suites are active or inactive&lt;/li&gt;
&lt;li&gt;Talos counters: the talos test config lists &lt;em&gt;some&lt;/em&gt; of the counters
(although not all) in &lt;a class="reference external" href="http://k0s.org/mozilla/talos/talosnames.html"&gt;http://k0s.org/mozilla/talos/talosnames.html&lt;/a&gt; .
Graphserver, on the other hand, has entries for each of these
counters on a per-test basis.  Counters are mostly a mess in Talos.
It would be nice to consolidate them there and display in
&lt;tt class="docutils literal"&gt;talosnames&lt;/tt&gt; all of the counters associated with a Talos test.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;</description><author>k0s</author><guid isPermaLink="true">http://k0s.org/mozilla/blog/20120724135349</guid><pubDate>Tue, 24 Jul 2012 13:53:49 GMT</pubDate></item><item><title>Automation and Testing : Overhaul of Talos Configuration</title><link>http://k0s.org/mozilla/blog/20120514090316</link><description>&lt;div class="blog-body"&gt;&lt;p&gt;Automation and Testing : Overhaul of Talos Configuration&lt;/p&gt;
&lt;p&gt;Last week I pushed a fix to
&lt;a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=704654"&gt;bug 704654&lt;/a&gt;
that fixes a number of issues, conceptual and user-facing, with how
Talos handles configuration.  I've had an idea on how I wanted to do
this for a few months now, but it has always been tabled.  That changed with my
(joking, sorry) pledge to Bob Moss to fix all bugs in
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos"&gt;Talos&lt;/a&gt;
by the end of the quarter.&lt;/p&gt;
&lt;p&gt;I had a free weekend so instead of killing the prerequisite bugs as I
usually do I decided to tackle the problem in one go.
My goals:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;remove the need to edit several different configuration files to change a
configuration basis.  Most .config edits needed to happen in
5 places (formerly 6).  This is not only prone to human error (which
I and others have been guilty of many times), it is
a discouragement to change default configuration.&lt;/li&gt;
&lt;li&gt;consistent and declarative serialization/deserialization. Serialization in
PerfConfigurator was mostly awful, scanning through line by line and
looking for particular strings in (basically) an if-else tree, often
depending on particular whitespace or other subtle (and
undocumented) formatting issues.  While the .config files conform to
YAML, we don't make use of this for de/serialization.  In addition,
while in run_tests.py we allow command line overrides for the YAML
items, we do not post-process them as we would in
PerfConfigurator.&lt;/li&gt;
&lt;li&gt;consistent error checking.  Currently some of our config-checking is
in PerfConfigurator and some is done in run_tests.  This opens the
possibility that either case may miss cases where the other one
would find it. If you call run_tests.py with a .yml file, you will
not get the checking done for the combination of command line items
and the .yml configuration that is done in PerfConfigurator.  Since
we process a lot of command line items into resulting configuration,
this can lead to interesting results (e.g. while --activeTests is a
command line item for run_tests.py, it is not used, anywhere).
In general, configuration should be checked in one place before any
program logic takes place.  While this patch doesn't completely
address this issue, it is a big step forward and should pave the way
for future improvement.&lt;/li&gt;
&lt;li&gt;configuration should be declarative.  You should get what you expect
from configuration, not inconsistent results.  If you edit a (e.g.)
.yml file with the existing Talos, you have no real way to know if
the keys you add or edit are going to be used by run_tests.py (and
what format they should be in, etc.) Having a basis for
configuration gives a single place to denote what is expected (and
thereby what isn't allowed) and the form that it is supposed to be
in. It is also nice to have all configuration in a single place
instead of having to look at a bunch of config files for the basis
as well as all over the code to see what is expected and how it is
processed.&lt;/li&gt;
&lt;li&gt;allow running directly from run_tests.py. For particular
(e.g. production) systems, it may be advisable to use tuned
(.yml) configuration files to have highly customized runs (note that
we &lt;em&gt;don't&lt;/em&gt; do this and use (remote)PerfConfigurator in all cases for
reasons that may be inferred from the above). However, for a typical
developer, there is little reason to run
&lt;tt class="docutils literal"&gt;PerfConfigurator &lt;span class="pre"&gt;-e&lt;/span&gt; `which firefox` &lt;span class="pre"&gt;-a&lt;/span&gt; ts &lt;span class="pre"&gt;--develop&lt;/span&gt; &lt;span class="pre"&gt;-o&lt;/span&gt; ts.yml &amp;amp;&amp;amp; talos &lt;span class="pre"&gt;-n&lt;/span&gt; &lt;span class="pre"&gt;-d&lt;/span&gt; ts.yml&lt;/tt&gt;
for a particular run.  Instead, the entirety of this may be invoked
with this patch as
&lt;tt class="docutils literal"&gt;talos &lt;span class="pre"&gt;-n&lt;/span&gt; &lt;span class="pre"&gt;-d&lt;/span&gt; &lt;span class="pre"&gt;-e&lt;/span&gt; `which firefox` &lt;span class="pre"&gt;-a&lt;/span&gt; ts &lt;span class="pre"&gt;--develop&lt;/span&gt; &lt;span class="pre"&gt;-o&lt;/span&gt; ts.yml&lt;/tt&gt;
in a one-step process.  (Note that we're still dumping to &lt;tt class="docutils literal"&gt;ts.yml&lt;/tt&gt;
though one wouldn't have to if the result is intended as ephemeral).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I hear people prefer blog posts with pictures, so for no reason here
is a bunch of cute foxes:&lt;/p&gt;
&lt;img alt="/mozilla/images/panda_adoption.jpg" src="/mozilla/images/panda_adoption.jpg" /&gt;
&lt;p&gt;I've moved the basis of the Talos configuration to
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/7fd63f3ef011/talos/PerfConfigurator.py"&gt;PerfConfigurator.py&lt;/a&gt;
instead of some combination of .config files, PerfConfigurator.py, and
run_tests.py.
This gets rid of the duplication between the various config files as
well as the command line options.  In fact, there isn't much left of
the configuration files.&lt;/p&gt;
&lt;p&gt;I don't like configuration to live in code, and so empathize with
those who look at this cautiously from that point of view.  However,
PerfConfigurator following my rework isn't so much configuration, but
a configuration basis.  Given the goals above, some piece of code has
to validate a given configuration, has to know what data is in a
configuration, and has to provide whatever command line options are
used to front-end the configuration.  The previous incarnation of
Talos and PerfConfigurator had a significant amount of code to this
end, but it was both spread out and incomplete.  So I don't think
putting it all in one place is a big conceptual change.  Having a
piece of code that knows the allowable form of configuration gives
great power and having the code all in one place just makes it more
human-readable.&lt;/p&gt;
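&lt;p&gt;To make the idea of a configuration basis concrete, here is a minimal
sketch using only &lt;tt class="docutils literal"&gt;argparse&lt;/tt&gt;.  The &lt;tt class="docutils literal"&gt;BASIS&lt;/tt&gt; dictionary, its keys, the long
flag names, and the &lt;tt class="docutils literal"&gt;configure&lt;/tt&gt; function are all hypothetical and
are not the actual PerfConfigurator code; the dictionary passed as
&lt;tt class="docutils literal"&gt;loaded&lt;/tt&gt; stands in for a deserialized .yml file:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
import argparse

# Hypothetical configuration basis: each key is declared once, with its
# command line flags, type, default, and help text.  These entries are
# illustrative and are not the actual Talos options.
BASIS = {
    'browser_path': {'flags': ['-e', '--executablePath'], 'type': str,
                     'default': None, 'help': 'path to the browser'},
    'activeTests': {'flags': ['-a', '--activeTests'], 'type': str,
                    'default': '', 'help': 'colon-separated test names'},
    'develop': {'flags': ['--develop'], 'type': bool,
                'default': False, 'help': 'use developer settings'},
}

def make_parser(basis):
    '''Generate the command line front-end from the basis.'''
    parser = argparse.ArgumentParser()
    for key, spec in basis.items():
        if spec['type'] is bool:
            parser.add_argument(*spec['flags'], dest=key, action='store_true',
                                default=None, help=spec['help'])
        else:
            parser.add_argument(*spec['flags'], dest=key, type=spec['type'],
                                default=None, help=spec['help'])
    return parser

def configure(basis, loaded, argv):
    '''Merge defaults, a loaded dict, and command line overrides, then
    validate everything in one place.'''
    unknown = set(loaded) - set(basis)
    if unknown:
        raise ValueError('unknown configuration keys: %s' % sorted(unknown))
    config = dict((key, spec['default']) for key, spec in basis.items())
    config.update(loaded)
    args = vars(make_parser(basis).parse_args(argv))
    # only options actually given on the command line override the file
    config.update((k, v) for k, v in args.items() if v is not None)
    if not config['browser_path']:
        raise ValueError('browser_path is required')
    return config

config = configure(BASIS, {'activeTests': 'ts'},
                   ['-e', '/usr/bin/firefox', '--develop'])
print(config)
&lt;/pre&gt;
&lt;p&gt;Because the command line parser, the defaults, and the validation are
all derived from the same declaration, a key that is not in the basis
is rejected up front instead of being silently ignored.&lt;/p&gt;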
&lt;p&gt;The unofficial history of Talos configuration, as I understand it,
goes something like this:  Initially, there was one configuration
file.  You copied it, edited it by hand, and ran your tests on it.  At
some point, this became cumbersome, and PerfConfigurator was created
to automatically fill in values from a set of command-line choices,
and in addition allow the values to be marked up a bit.  The road was
already paved for some part of configuration basis living in code
versus in the .config file. Then, as the need to run tests in
different configurations grew, .config files flourished to this
end. I'd like to think of the changes for
&lt;a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=704654"&gt;bug 704654&lt;/a&gt; as
the next logical step in Talos's configuration evolution.&lt;/p&gt;
&lt;p&gt;Longer term, we'd like to remove even more of Talos's configuration and
replace .yml files with command line options. The complexity of
configuration will be managed by
&lt;a class="reference external" href="http://escapewindow.com/mozharness/"&gt;mozharness&lt;/a&gt; .&lt;/p&gt;
&lt;/div&gt;</description><author>k0s</author><guid isPermaLink="true">http://k0s.org/mozilla/blog/20120514090316</guid><pubDate>Mon, 14 May 2012 09:03:16 GMT</pubDate></item><item><title>Automation and Testing : Considering a Page-Centric Talos</title><link>http://k0s.org/mozilla/blog/20120425093346</link><description>&lt;div class="blog-body"&gt;&lt;p&gt;Automation and Testing : Considering a Page-Centric Talos&lt;/p&gt;
&lt;p&gt;Currently, the canonical unit of Talos tests is a page set.
However, a page-centric point of view offers several intrinsic
advantages on top of being, in my opinion, more conceptually coherent.&lt;/p&gt;
&lt;p&gt;A page-centric point of view allows easy adding and updating of
pages. Currently, making a new page set is a big deal.  Since we
average over all pages in a page set to obtain a quality metric,
adding a new page (or removing a page) will change this number and
the entire baseline for comparison has to be recentered.  If we
made the page the canonical unit of testing, then adding or
removing a page doesn't involve a recentering as each page has a
quality metric associated with it.&lt;/p&gt;
&lt;p&gt;Taking an average over all pages to get a quality metric, as we do,
gives a higher weight to pages that take (e.g.) longer to load. For
instance, consider the output for &lt;tt class="docutils literal"&gt;tsvg&lt;/tt&gt;:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
|i|pagename|runs|
|0;gearflowers.svg;79;65;68;68;67
|1;composite-scale.svg;46;35;44;41;42
|2;composite-scale-opacity.svg;21;22;24;22;20
|3;composite-scale-rotate.svg;23;21;21;20;19
|4;composite-scale-rotate-opacity.svg;19;24;19;19;23
|5;hixie-001.xml;45643;14976;17807;14971;17235
|6;hixie-002.xml;51257;15193;21693;14969;14974
|7;hixie-003.xml;5016;37375;5021;5024;5008
|8;hixie-004.xml;5052;5053;5054;5054;5053
|9;hixie-005.xml;4618;4533;4611;4532;4554
|10;hixie-006.xml;5059;5107;9741;5107;5089
|11;hixie-007.xml;1629;1651;1648;1652;1649
&lt;/pre&gt;
&lt;p&gt;A performance loss (or gain) in e.g. &lt;tt class="docutils literal"&gt;gearflowers.svg&lt;/tt&gt; is likely not
to be noticed in this pageset as it is more than two orders of magnitude
lower than (e.g.) &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;hixie-002.xml&lt;/span&gt;&lt;/tt&gt;, so a small percentage-wise noise
in the latter could easily hide a legitimate regression in the former.&lt;/p&gt;
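&lt;p&gt;The weighting problem can be seen with a little arithmetic.  The sketch
below uses four of the pages from the output above and a deliberately
simplified pageset metric (the mean of per-page medians, which is an
assumption made for illustration and not the exact production formula),
then simulates a 20% regression in &lt;tt class="docutils literal"&gt;gearflowers.svg&lt;/tt&gt; alone:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
from statistics import mean, median

# Run data copied from the tsvg output above (a subset of pages).
runs = {
    'gearflowers.svg': [79, 65, 68, 68, 67],
    'composite-scale.svg': [46, 35, 44, 41, 42],
    'hixie-001.xml': [45643, 14976, 17807, 14971, 17235],
    'hixie-002.xml': [51257, 15193, 21693, 14969, 14974],
}

def pageset_metric(runs):
    '''A simplified pageset metric: the mean of per-page medians.'''
    return mean(median(values) for values in runs.values())

baseline = pageset_metric(runs)

# Simulate a 20% regression in gearflowers.svg only.
regressed = dict(runs)
regressed['gearflowers.svg'] = [v * 1.2 for v in runs['gearflowers.svg']]

page_change = 0.2
set_change = (pageset_metric(regressed) - baseline) / baseline
print('per-page change: %.1f%%' % (100 * page_change))
print('pageset change:  %.3f%%' % (100 * set_change))
&lt;/pre&gt;
&lt;p&gt;Under these assumptions the 20% per-page regression moves the pageset
number by well under a tenth of a percent, far less than the run-to-run
variation visible in the hixie pages.&lt;/p&gt;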
&lt;p&gt;Having this additional data of what changes regress which pages allows
us to explore how these particular page modifications affect
performance. If we can isolate patterns, we can fix them.&lt;/p&gt;
&lt;p&gt;One conceptual disadvantage to a page-centric approach is that
deciding whether a changeset is a net regression or not becomes
harder.  Ideally a human (or other expert system) would evaluate all
of the data across pages and decide whether a change is a regression
or not.  However, we have many pages and not enough people, so this is
harder to do than to craft a formula for a quality metric.
To obtain an overall quality metric for a push, some sort of averaging
over pages must be done.  We currently throw away the highest value
and take the mean of remaining page averages.  If we continue with
this approach we throw away the ability to easily add and remove pages
without futzing with the metric.  Instead, a method should be sought
whereby adding a new page does not affect a metric.&lt;/p&gt;
&lt;/div&gt;</description><author>k0s</author><guid isPermaLink="true">http://k0s.org/mozilla/blog/20120425093346</guid><pubDate>Wed, 25 Apr 2012 09:33:46 GMT</pubDate></item><item><title>Talos Signal from Noise: Configurable Talos Data Filters</title><link>http://k0s.org/mozilla/blog/20120215124438</link><description>&lt;div class="blog-body"&gt;&lt;p&gt;Talos Signal from Noise: Configurable Talos Data Filters&lt;/p&gt;
&lt;p&gt;As part of
&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise"&gt;Signal from Noise&lt;/a&gt;
I introduced a patch that changes the way &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;--ignoreFirst&lt;/span&gt;&lt;/tt&gt; works and
adds configurable data filters to
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos"&gt;Talos&lt;/a&gt; :&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=723569"&gt;https://bugzilla.mozilla.org/show_bug.cgi?id=723569&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;landed as &lt;a class="reference external" href="http://hg.mozilla.org/build/talos/rev/5d955efe4678"&gt;http://hg.mozilla.org/build/talos/rev/5d955efe4678&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;While this is a small change in terms of how the code currently works,
it lays the groundwork for a window of possibilities in terms of Talos
statistics.  Currently, pageloader calculates the &amp;quot;median&amp;quot; (ignoring
the high value), the mean, the max, and the min, and outputs these
along with the raw run data.  Pageloader is for loading pages and
taking measurements, not really for doing statistics.  So it would be
nice to move this upstream: first to Talos, then to graphserver proper.&lt;/p&gt;
&lt;p&gt;Being able to specify data filters with &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;--filter&lt;/span&gt;&lt;/tt&gt; from the command
line and &lt;tt class="docutils literal"&gt;filter:&lt;/tt&gt; in the &lt;tt class="docutils literal"&gt;.yml&lt;/tt&gt; configuration file allows the
test-runner to change the &amp;quot;interesting number&amp;quot; by which we measure
performance metrics on the fly.  While there are currently only a few
filters available, it is easy to add more metrics as we need them.&lt;/p&gt;
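&lt;p&gt;As a sketch of how such a chain of filters might be structured (the
registry, function names, and &lt;tt class="docutils literal"&gt;apply_filters&lt;/tt&gt; helper below are
hypothetical and are not claimed to match what Talos actually ships),
each filter is a plain function and a list of names picks the ones to
apply, in order:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
from statistics import mean, median

# Hypothetical filter registry; the names mirror the idea of Talos
# data filters but are not claimed to match the real implementation.
def ignore_first(values):
    '''Drop the first data point.'''
    return values[1:]

def ignore_max(values):
    '''Drop one occurrence of the highest data point.'''
    values = list(values)
    values.remove(max(values))
    return values

FILTERS = {'ignore_first': ignore_first, 'ignore_max': ignore_max,
           'median': median, 'mean': mean}

def apply_filters(names, values):
    '''Apply the named filters in order, e.g. as given on the command
    line or in a configuration file.'''
    for name in names:
        values = FILTERS[name](values)
    return values

# A sample of five run times for one page.
runs = [147.0, 1130.0, 1078.0, 92.0, 100.0]
old = apply_filters(['ignore_max', 'median'], runs)
new = apply_filters(['ignore_first', 'median'], runs)
print(old, new)
&lt;/pre&gt;
&lt;p&gt;With this layout, adding a new metric means adding one function and one
registry entry, and the choice of metric becomes pure configuration.&lt;/p&gt;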
&lt;p&gt;In a parallel effort, the
&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/JetPerf"&gt;JetPerf&lt;/a&gt;
&lt;a class="reference external" href="http://k0s.org/mozilla/hg/jetperf"&gt;software&lt;/a&gt;
consumes Talos
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/tip/talos/filter.py"&gt;filters&lt;/a&gt;
. This is a good example of the expansion of the Talos ecosystem: as Talos is a
critical part of our performance testing infrastructure, we are building
tests and frameworks on top of it.  In general, the
&lt;a class="reference external" href="http://wiki.mozilla.org/Auto-tools"&gt;A Team&lt;/a&gt; is moving towards a
testing ecosystem of reusable parts and sane APIs.&lt;/p&gt;
&lt;p&gt;Data filters were added to talos
as an interim measure to make the &amp;quot;interesting number&amp;quot; calculations
more flexible.  As we play with different types of statistics, we need
the ability to change configuration without having to jump through too
many hoops and this fulfills this immediate need.&lt;/p&gt;
&lt;p&gt;However, in the longer term, Talos and pageloader shouldn't really be
doing statistics at all.  They are in the &amp;quot;statistics gathering&amp;quot; camp
where
&lt;a class="reference external" href="https://wiki.mozilla.org/Perfomatic"&gt;graphserver&lt;/a&gt; is in the
&amp;quot;statistics processing&amp;quot; business.  It would also be nice if there was
a piece of software that let you analyze Talos results locally,
ideally using the same statistics processing package that &lt;tt class="docutils literal"&gt;graphserver&lt;/tt&gt; uses.
This is outlined in
&lt;a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=721902"&gt;https://bugzilla.mozilla.org/show_bug.cgi?id=721902&lt;/a&gt; .&lt;/p&gt;
&lt;img alt="http://k0s.org/mozilla/talos/bug-721902.gv.txt" src="http://k0s.org/mozilla/talos/bug-721902.gv.txt" /&gt;
&lt;/div&gt;</description><author>k0s</author><guid isPermaLink="true">http://k0s.org/mozilla/blog/20120215124438</guid><pubDate>Wed, 15 Feb 2012 12:44:38 GMT</pubDate></item><item><title>Talos Signal from Noise: analyzing the data</title><link>http://k0s.org/mozilla/blog/20120131164249</link><description>&lt;div class="blog-body"&gt;&lt;p&gt;Talos Signal from Noise: analyzing the data&lt;/p&gt;
&lt;p&gt;Recently, a change was pushed as part of the
&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise"&gt;Signal from Noise&lt;/a&gt;
effort in order to make
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos"&gt;Talos&lt;/a&gt;
statistics better: &lt;a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=710484"&gt;https://bugzilla.mozilla.org/show_bug.cgi?id=710484&lt;/a&gt;
The idea being that the way we are doing things is skewing the data
and not helping with noise.&lt;/p&gt;
&lt;p&gt;Currently,
&lt;a class="reference external" href="http://hg.mozilla.org/build/pageloader/"&gt;pageloader&lt;/a&gt;
calculates the median after throwing out the highest point:
&lt;a class="reference external" href="http://hg.mozilla.org/build/pageloader/file/beca399c3a16/chrome/report.js#l114"&gt;http://hg.mozilla.org/build/pageloader/file/beca399c3a16/chrome/report.js#l114&lt;/a&gt;
We introduced &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;--ignoreFirst&lt;/span&gt;&lt;/tt&gt; to instead ignore the first point and
calculate the median of the remaining runs.&lt;/p&gt;
&lt;p&gt;However, after introducing the change we noticed that our distribution
had gone bimodal during side by side staging:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://people.mozilla.org/~jmaher/sxs/sxs.html"&gt;http://people.mozilla.org/~jmaher/sxs/sxs.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://graphs-new.mozilla.org/graph.html#tests=[[170,1,21],[57,1,21]]&amp;amp;sel=1327791635000,1328041307110&amp;amp;displayrange=7&amp;amp;datatype=running"&gt;http://graphs-new.mozilla.org/graph.html#tests=[[170,1,21],[57,1,21]]&amp;amp;sel=1327791635000,1328041307110&amp;amp;displayrange=7&amp;amp;datatype=running&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Were we doing something other than what we thought we were doing? Were
our calculations wrong?  Or was something else going on?&lt;/p&gt;
&lt;p&gt;So
&lt;a class="reference external" href="http://elvis314.wordpress.com/"&gt;jmaher&lt;/a&gt;
and I dove in to take a look at the data.  &lt;em&gt;jmaher&lt;/em&gt; dug up a high-mode
and low-mode case from the TBPL logs corresponding to the push sets
displayed on
&lt;a class="reference external" href="http://graphs-new.mozilla.org/"&gt;graphserver&lt;/a&gt;&lt;/p&gt;
&lt;pre class="literal-block"&gt;
https://tbpl.mozilla.org/php/getParsedLog.php?id=8982519&amp;amp;tree=Firefox&amp;amp;full=1
high point:
NOISE: __start_tp_report
NOISE: _x_x_mozilla_page_load,109,NaN,NaN
NOISE: _x_x_mozilla_page_load_details,avgmedian|109|average|354.25|minimum|NaN|maximum|NaN|stddev|NaN
NOISE: |i|pagename|median|mean|min|max|runs|
NOISE: |0;big-optimizable-group-opacity-2500.svg;123.5;354.25;92;1130;147;1130;1078;92;100
NOISE: |1;small-group-opacity-2500.svg;109;2333.25;103;9247;103;9012;9247;111;107
NOISE: __end_tp_report


https://tbpl.mozilla.org/php/getParsedLog.php?id=8982267&amp;amp;tree=Firefox&amp;amp;full=1
low point:
NOISE: __start_tp_report
NOISE: _x_x_mozilla_page_load,108,NaN,NaN
NOISE: _x_x_mozilla_page_load_details,avgmedian|108|average|113.00|minimum|NaN|maximum|NaN|stddev|NaN
NOISE: |i|pagename|median|mean|min|max|runs|
NOISE: |0;big-optimizable-group-opacity-2500.svg;119;353.75;91;1132;139;1132;1086;91;99
NOISE: |1;small-group-opacity-2500.svg;108;113;103;9116;103;133;9116;108;108
NOISE: __end_tp_report
&lt;/pre&gt;
&lt;p&gt;From &lt;a class="reference external" href="http://pastebin.mozilla.org/1470000"&gt;http://pastebin.mozilla.org/1470000&lt;/a&gt; .&lt;/p&gt;
&lt;p&gt;Since I can't really read this being a mere human being, I modified
&lt;a class="reference external" href="http://hg.mozilla.org/build/talos/file/9c1b3addb9ee/talos/results.py"&gt;results.py&lt;/a&gt;
to parse this data:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
+
+if __name__ == '__main__':
+    import sys
+    string_high = &amp;quot;&amp;quot;&amp;quot;
+|0;big-optimizable-group-opacity-2500.svg;123.5;354.25;92;1130;147;1130;1078;92;100
+|1;small-group-opacity-2500.svg;109;2333.25;103;9247;103;9012;9247;111;107
+&amp;quot;&amp;quot;&amp;quot;
+    string_low = &amp;quot;&amp;quot;&amp;quot;
+|0;big-optimizable-group-opacity-2500.svg;119;353.75;91;1132;139;1132;1086;91;99
+|1;small-group-opacity-2500.svg;108;113;103;9116;103;133;9116;108;108
+&amp;quot;&amp;quot;&amp;quot;
+    big = PageloaderResults(string_high)
+    small = PageloaderResults(string_low)
+    import pdb; pdb.set_trace()
&lt;/pre&gt;
&lt;p&gt;This makes some &lt;tt class="docutils literal"&gt;PageloaderResults&lt;/tt&gt; objects that are explorable with
&lt;a class="reference external" href="http://docs.python.org/library/pdb.html"&gt;pdb&lt;/a&gt; .  While I did this for a
one-off hack, this is something we'll probably generally want as part of
Signal from Noise: &lt;a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=722915"&gt;https://bugzilla.mozilla.org/show_bug.cgi?id=722915&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Then I looked at the data:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
(Pdb) pp(small.results)
[{'index': '|0',
  'max': 1132.0,
  'mean': 353.75,
  'median': 119.0,
  'min': 91.0,
  'page': 'big-optimizable-group-opacity-2500.svg',
  'runs': [139.0, 1132.0, 1086.0, 91.0, 99.0]},
 {'index': '|1',
  'max': 9116.0,
  'mean': 113.0,
  'median': 108.0,
  'min': 103.0,
  'page': 'small-group-opacity-2500.svg',
  'runs': [103.0, 133.0, 9116.0, 108.0, 108.0]}]
(Pdb) pp(big.results)
[{'index': '|0',
  'max': 1130.0,
  'mean': 354.25,
  'median': 123.5,
  'min': 92.0,
  'page': 'big-optimizable-group-opacity-2500.svg',
  'runs': [147.0, 1130.0, 1078.0, 92.0, 100.0]},
 {'index': '|1',
  'max': 9247.0,
  'mean': 2333.25,
  'median': 109.0,
  'min': 103.0,
  'page': 'small-group-opacity-2500.svg',
  'runs': [103.0, 9012.0, 9247.0, 111.0, 107.0]}]
&lt;/pre&gt;
&lt;p&gt;You'll notice a few things from the runs data:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;the runs data is indeed bifurcated.  In all cases there is a low value,
around a hundred, and a high value in the thousands&lt;/li&gt;
&lt;li&gt;contrary to the assumption that the first datapoint may be biased and high,
you can't really see any bias, at least compared to the magnitude of
the bifurcation&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;So how does this compare to the graphserver results?
&lt;a class="reference external" href="http://graphs-new.mozilla.org/graph.html#tests=[[170,1,21],[57,1,21]]&amp;amp;sel=1327791635000,1328041307110&amp;amp;displayrange=7&amp;amp;datatype=running"&gt;http://graphs-new.mozilla.org/graph.html#tests=[[170,1,21],[57,1,21]]&amp;amp;sel=1327791635000,1328041307110&amp;amp;displayrange=7&amp;amp;datatype=running&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For the old data and the low value of the new data, we see times around
110-120ms.  The high value of the new data is around 590ms.  Are these
numbers what we'd expect?&lt;/p&gt;
&lt;p&gt;Throwing away the high value and taking the median for both data sets gives
a number of the order of 100 or so (the old algorithm).  Taking the median
functions as a filter for the bifurcated results towards the majorant
population.  Since the low population is slightly more majorant, dropping
the highest number in the way that pageloader does further biases towards it.
It is not surprising we see no bifurcation in the old data.&lt;/p&gt;
&lt;p&gt;For the new data, we drop the first run.  Coincidentally or not, for the cases
studied the first run was part of the low population, so that tends
towards bifurcation.  Taking the median of the remaining data points gives:&lt;/p&gt;
&lt;p&gt;High case:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;big-optimizable-group-opacity-2500.svg : &lt;tt class="docutils literal"&gt;(1078 + 100) / 2 = 589&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;small-group-opacity-2500.svg : &lt;tt class="docutils literal"&gt;(9012 + 111) / 2 = 4561.5&lt;/tt&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Low case:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;big-optimizable-group-opacity-2500.svg : &lt;tt class="docutils literal"&gt;(99 + 1086) / 2 = 592.5&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;small-group-opacity-2500.svg : &lt;tt class="docutils literal"&gt;(133 + 108) / 2&amp;nbsp; =&amp;nbsp; 120.5&lt;/tt&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
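&lt;p&gt;These four numbers can be reproduced with a few lines of Python.  This
is only a sketch of the calculation described above (the helper name
&lt;tt class="docutils literal"&gt;ignore_first_median&lt;/tt&gt; is made up for illustration), not the
pageloader or Talos code:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
from statistics import median

# Runs copied from the PageloaderResults output above.
runs = {
    ('high', 'big-optimizable-group-opacity-2500.svg'):
        [147.0, 1130.0, 1078.0, 92.0, 100.0],
    ('high', 'small-group-opacity-2500.svg'):
        [103.0, 9012.0, 9247.0, 111.0, 107.0],
    ('low', 'big-optimizable-group-opacity-2500.svg'):
        [139.0, 1132.0, 1086.0, 91.0, 99.0],
    ('low', 'small-group-opacity-2500.svg'):
        [103.0, 133.0, 9116.0, 108.0, 108.0],
}

def ignore_first_median(values):
    '''Drop the first run and take the median of the rest.'''
    return median(values[1:])

medians = dict((key, ignore_first_median(values))
               for key, values in runs.items())
for (case, page), value in sorted(medians.items()):
    print(case, page, value)
&lt;/pre&gt;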
&lt;p&gt;So why does high case come out high and the low case come out low?  So
there is even more magic.  Graphserver reports an average by taking the
mean of all the pages but discarding the high result: &lt;a class="reference external" href="http://hg.mozilla.org/graphs/file/d93235e751c1/server/pyfomatic/collect.py#l208"&gt;http://hg.mozilla.org/graphs/file/d93235e751c1/server/pyfomatic/collect.py#l208&lt;/a&gt;
(from
&lt;a class="reference external" href="http://hg.mozilla.org/graphs/file/d93235e751c1/server/pyfomatic/collect.py#l265"&gt;http://hg.mozilla.org/graphs/file/d93235e751c1/server/pyfomatic/collect.py#l265&lt;/a&gt;
from &lt;a class="reference external" href="http://hg.mozilla.org/graphs/file/d93235e751c1/server/collect.cgi"&gt;http://hg.mozilla.org/graphs/file/d93235e751c1/server/collect.cgi&lt;/a&gt;
). Since both of the runs exhibit the high value of the bifurcation in
the high case, you report the lower of the two bifurcated values: 589,
from &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;big-optimizable-group-opacity-2500.svg&lt;/span&gt;&lt;/tt&gt;. Since in the low case
only one of the values is bifurcated, you get the low value: 120.5,
from small-group-opacity-2500.svg .&lt;/p&gt;
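&lt;p&gt;The effect of that averaging step on a two-page set can be sketched as
follows.  The function below is a stand-in written for this post to
mimic the behavior just described; it is not graphserver's actual code:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
def average_dropping_highest(values):
    '''Mean of the values after discarding the single highest one.
    A sketch of the behavior described above, not graphserver's code.'''
    kept = sorted(values)[:-1]
    return sum(kept) / len(kept)

# Per-page medians (first run ignored) for the two cases above.
high_case = [589.0, 4561.5]
low_case = [592.5, 120.5]

print(average_dropping_highest(high_case))
print(average_dropping_highest(low_case))
&lt;/pre&gt;
&lt;p&gt;With only two pages, discarding the highest value means the reported
number is always just the lower of the two page values.&lt;/p&gt;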
&lt;p&gt;Okay mystery solved.  We know why graphserver is reporting what data
it is reporting and we also know that our algorithm is doing what we
think it is doing.  However, this is the beginning instead of the end
of the problem.&lt;/p&gt;
&lt;p&gt;By taking the average and discarding the high value &lt;em&gt;of two data
points&lt;/em&gt;, we are doing something weird and wrong.  We are effectively
only reporting one of the two pages.  Note that for the high and the low
cases we are actually viewing data from different pages!  This
is misleading and probably outright wrong. We essentially have two
pages just to throw one of them away and then we have no confidence in
what we are looking at.  I'm not sure if the code at
&lt;a class="reference external" href="http://hg.mozilla.org/graphs/file/d93235e751c1/server/pyfomatic/collect.py#l208"&gt;http://hg.mozilla.org/graphs/file/d93235e751c1/server/pyfomatic/collect.py#l208&lt;/a&gt;
would even work for a single page.  Probably not. In general I grow
increasingly skeptical of our amalgamation of results.  We need
increasingly to be able to get to and manipulate the raw data.  We
certainly need a way of digging into the stats, knowing what we're
looking at, and having confidence in it.  In general, talos, pageloader,
and graphserver need to be made such that it is both easier to try new
filters and more transparent as to what is actually happening.&lt;/p&gt;
&lt;p&gt;We have been trying to bias towards the low numbers.  Looking at the data for
the four tests shows that there are 13 low-state numbers and 7 high-state
numbers. While there are more numbers in the low state, it is not an
overwhelming majority.&lt;/p&gt;
&lt;p&gt;This leaves the big elephant in the room: why are these runs
bifurcated?  Are we seeing a code path, or is something else happening
on these builders that leads to bifurcated results?  While this will
be challenging to investigate, IMHO we should know why this happens.
While our method of throwing out the highest data point, getting the
median, throwing the data to graphserver, then getting the average of
the whole pageset back, has a positive effect of minimizing noise
(which is important), it is also sweeping a lot under the rug.  We
need to have confidence that what we're ignoring is okay to ignore.  I
don't have that confidence yet.&lt;/p&gt;
&lt;/div&gt;</description><author>k0s</author><guid isPermaLink="true">http://k0s.org/mozilla/blog/20120131164249</guid><pubDate>Tue, 31 Jan 2012 16:42:49 GMT</pubDate></item><item><title>Mozilla Automation and Testing - Jetpack Performance Testing</title><link>http://k0s.org/mozilla/blog/20120124131058</link><description>&lt;div class="blog-body"&gt;&lt;p&gt;Mozilla Automation and Testing - Jetpack Performance Testing&lt;/p&gt;
&lt;p&gt;I have a working proof of concept for
Jetpack
performance testing
(&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/JetPerf"&gt;JetPerf&lt;/a&gt;):
&lt;a class="reference external" href="http://k0s.org/mozilla/hg/jetperf"&gt;http://k0s.org/mozilla/hg/jetperf&lt;/a&gt; .
&lt;tt class="docutils literal"&gt;JetPerf&lt;/tt&gt; uses
&lt;a class="reference external" href="http://hg.mozilla.org/build/mozharness"&gt;mozharness&lt;/a&gt;
to run
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos#ts"&gt;Talos ts&lt;/a&gt;
tests with an addon built with the
&lt;a class="reference external" href="https://wiki.mozilla.org/Jetpack"&gt;Jetpack&lt;/a&gt;
&lt;a class="reference external" href="https://addons.mozilla.org/en-US/developers/docs/sdk/latest/"&gt;addon-sdk&lt;/a&gt;
to measure differences between performance with and without the addon
installed.&lt;/p&gt;
&lt;p&gt;Playing with Jetpack + Talos performance lets us explore statistics in
a more straightforward manner than the production Talos numbers.
In the
&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise"&gt;Signal from Noise&lt;/a&gt;
project, which I am also part of, there are a lot of steps to staging
even small changes in how we process Talos data, since the system
involved has many moving parts
(
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos"&gt;Talos&lt;/a&gt;,
&lt;a class="reference external" href="http://hg.mozilla.org/build/pageloader/"&gt;pageloader&lt;/a&gt;,
&lt;a class="reference external" href="https://wiki.mozilla.org/Perfomatic"&gt;graphserver&lt;/a&gt;
).  By contrast, since JetPerf is a new project, it gives us much more
flexibility to explore the data in ways that we have not hitherto explored.&lt;/p&gt;
&lt;p&gt;I made a
&lt;a class="reference external" href="http://k0s.org/mozilla/hg/jetperf/file/tip/jetperf/jetperf.py"&gt;mozharness script&lt;/a&gt;
to clone the
&lt;a class="reference external" href="'http://hg.mozilla.org/projects/addon-sdk"&gt;hg mirror of addon-sdk&lt;/a&gt; .
It then builds a
&lt;a class="reference external" href="http://k0s.org/mozilla/hg/jetperf/file/82dde6cc262f/jetperf/jetperf.py#l112"&gt;sample addon&lt;/a&gt;
and runs Talos with it installed.&lt;/p&gt;
&lt;p&gt;Looking at raw numbers wasn't very interesting, so I made a
&lt;a class="reference external" href="http://k0s.org/mozilla/hg/jetperf/file/tip/jetperf/compare.py"&gt;parser&lt;/a&gt;
for Talos's
&lt;a class="reference external" href="https://wiki.mozilla.org/Buildbot/Talos/DataFormat"&gt;data format&lt;/a&gt;
It was pretty quick to get some
&lt;a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=717036#c9"&gt;averages&lt;/a&gt;
out before and after the addon was installed, but I thought it would
be more usefulto display the raw data along with the averages.&lt;/p&gt;
&lt;img alt="https://bug717036.bugzilla.mozilla.org/attachment.cgi?id=591224" src="https://bug717036.bugzilla.mozilla.org/attachment.cgi?id=591224" /&gt;
&lt;p&gt;These really aren't fair numbers, as currently the stub jetpack I use
prints to a file, but it's at least the start of a methodology.&lt;/p&gt;
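&lt;p&gt;The before-and-after comparison can be sketched roughly as follows.  This
is not the JetPerf parser: it assumes the ts values have already been pulled
out of the Talos output into plain lists, and the function name and numbers
are invented for illustration.&lt;/p&gt;

```python
from statistics import mean


def compare(baseline, with_addon):
    """Return both averages and the relative change caused by the addon."""
    base_avg = mean(baseline)
    addon_avg = mean(with_addon)
    return {
        "baseline": base_avg,
        "with_addon": addon_avg,
        "percent_change": 100.0 * (addon_avg - base_avg) / base_avg,
    }


# Made-up ts startup times in milliseconds, not real measurements.
baseline = [500, 510, 490, 500]
with_addon = [550, 560, 540, 550]

result = compare(baseline, with_addon)
# Show the raw data next to each average, not just the summary number.
for label, values in (("baseline", baseline), ("with_addon", with_addon)):
    print(label, values, "average:", result[label])
print("percent change:", result["percent_change"])
```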
&lt;p&gt;The reason I'm sharing this isn't just to make a progress report, but
more to present some ideas about thinking about what to do with Talos data.
While this was done for JetPerf, much of this also applies to Signal
from Noise.  You run Talos and get some results.  What do you do with
them?  Currently we just shove them into &lt;a class="reference external" href="http://graphs.mozilla.org/"&gt;http://graphs.mozilla.org/&lt;/a&gt;
and say that's where you process them, but I think looking at them
locally is not only important but necessary if you're doing
development work.  I think a big part of any statistics-heavy project
is to make it easy for &lt;em&gt;all&lt;/em&gt; of the stakeholders to explore data,
apply different filters and see how things fit together.  While it
takes a statistician to be rigorous about the process, anyone can play
with statistics and it takes a village to really conceptualize what is
being looked at.  I hope, to this end, developers will use my software
so that they can understand what it is doing and provide the valuable
feedback I need.&lt;/p&gt;
&lt;div class="section" id="todo"&gt;
&lt;h1&gt;TODO&lt;/h1&gt;
&lt;p&gt;JetPerf is still very much at a proof of concept stage.  Ignoring the
fact that none of it is in production, there are still many
&lt;a class="reference external" href="https://bugzilla.mozilla.org/show_bug.cgi?id=717036"&gt;outstanding questions&lt;/a&gt;
about basic facts of what we are doing here.  But outside of polishing
rough edges, here are some things in the pipeline.&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;test more variations of addons; currently we just load &lt;tt class="docutils literal"&gt;panel&lt;/tt&gt; and
print something to a file&lt;/li&gt;
&lt;li&gt;test on checkin (&lt;a class="reference external" href="http://en.wikipedia.org/wiki/Continuous_integration"&gt;CI&lt;/a&gt;):
so the main point of JetPerf is to get a better idea of what SDK
changes cause addon performance regressions and hits, to be able to
quantify them.  While as stated this is a very open ended project,
one thing that would turn this from a casual exploration into a developer
tool is running the tests on checkin.  This will give an update in
real time on whether a checkin breaks performance.&lt;/li&gt;
&lt;li&gt;graphserver: in order to assess Jetpack's performance over time, we
will want to send numbers to some sort of
&lt;a class="reference external" href="http://graphs.mozilla.org/"&gt;graphserver&lt;/a&gt; .
This will allow us to keep track of the data,
to view it, and apply various operations to it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;I may also spin off the (ad hoc) graphing portion and the Talos log parser
portions into their own modules, as they may be useful outside of just
JetPerf.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;</description><author>k0s</author><guid isPermaLink="true">http://k0s.org/mozilla/blog/20120124131058</guid><pubDate>Tue, 24 Jan 2012 13:10:58 GMT</pubDate></item><item><title>Mozilla Automation + Testing - MozBase Continuous Integration</title><link>http://k0s.org/mozilla/blog/20120101165149</link><description>&lt;div class="blog-body"&gt;&lt;p&gt;Mozilla Automation + Testing - MozBase Continuous Integration&lt;/p&gt;
&lt;p&gt;As part of the
&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools"&gt;A-Team&lt;/a&gt;
&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Goals/2011Q4#Mozbase"&gt;2011 Q4 goals&lt;/a&gt;
I was able to devote a few days to setting up
continuous integration (CI) for
&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/MozBase"&gt;MozBase&lt;/a&gt; .
I revived and extended
&lt;a class="reference external" href="http://k0s.org/mozilla/hg/autobot"&gt;autobot&lt;/a&gt; to support
&lt;a class="reference external" href="http://trac.buildbot.net/"&gt;buildbot&lt;/a&gt; 0.8.5, set up
&lt;a class="reference external" href="https://github.com/mozilla/mozbase/blob/master/test-manifest.ini"&gt;tests&lt;/a&gt;
and a simple
&lt;a class="reference external" href="https://github.com/mozilla/mozbase/blob/master/test.py"&gt;test runner&lt;/a&gt;
for mozbase, and deployed a test instance to &lt;tt class="docutils literal"&gt;k0s.org&lt;/tt&gt;.  You can see
the waterfall here: &lt;a class="reference external" href="http://k0s.org:8010"&gt;http://k0s.org:8010&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;While buildbot comes with a
&lt;a class="reference external" href="https://github.com/buildbot/buildbot/blob/master/master/buildbot/changes/gitpoller.py"&gt;gitpoller&lt;/a&gt;,
the version in
&lt;a class="reference external" href="http://pypi.python.org/pypi/buildbot/0.8.5"&gt;buildbot 0.8.5&lt;/a&gt;
(the current release on &lt;a class="reference external" href="http://pypi.python.org/"&gt;http://pypi.python.org/&lt;/a&gt; ) did not work with
git 1.6.3, the version on &lt;tt class="docutils literal"&gt;k0s.org&lt;/tt&gt;.  Since my box is on an ancient
version of Ubuntu (and is remote and not trivially upgradable), I
brought the generic
&lt;a class="reference external" href="http://k0s.org/mozilla/hg/autobot/file/tip/autobot/changes/poller.py"&gt;autobot poller&lt;/a&gt;
from being buildbot 0.8.3 compatible to 0.8.5 compatible
(which, it is worth noting, is not trivial).
Also, while there has been
&lt;a class="reference external" href="http://trac.buildbot.net/ticket/76"&gt;a patch for an hgpoller&lt;/a&gt;
submitted by
&lt;a class="reference external" href="http://www.mozilla.org/"&gt;Mozilla&lt;/a&gt;
developers some four years ago, it has been &lt;tt class="docutils literal"&gt;WONTFIX&lt;/tt&gt;ed, so I
went ahead with a generic polling architecture which (IMHO) seems a
wiser architectural choice.  While I sympathize with the architectural
ideology of using a push-based architecture, and believe this is
closer to ideal, polling will always work and does not require access
to the repository servers, which is a huge factor when using
&lt;a class="reference external" href="https://github.com"&gt;https://github.com&lt;/a&gt; or even Mozilla &lt;tt class="docutils literal"&gt;hg&lt;/tt&gt; repositories. (Incidentally,
I found neither this patch nor
&lt;a class="reference external" href="http://hg.mozilla.org/build/buildbotcustom/file/tip/changes/hgpoller.py"&gt;http://hg.mozilla.org/build/buildbotcustom/file/tip/changes/hgpoller.py&lt;/a&gt;
to work OOTB, so, sadly, I proceeded to roll my own.  Also
incidentally, it is not trivial to depend on &lt;tt class="docutils literal"&gt;buildbotcustom&lt;/tt&gt; using
&lt;tt class="docutils literal"&gt;install_requires&lt;/tt&gt; due to its lack of a &lt;tt class="docutils literal"&gt;setup.py&lt;/tt&gt; file.)
After debugging the &lt;tt class="docutils literal"&gt;gitpoller&lt;/tt&gt; I pushed
&lt;a class="reference external" href="http://k0s.org:8010/changes/1"&gt;a test change&lt;/a&gt; and was happy to see
that autobot built correctly. Autobot now listens to MozBase changes!&lt;/p&gt;
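&lt;p&gt;The core of a generic poller is small.  The sketch below is not the
autobot or buildbot code; the class and the revision-fetching callable are
hypothetical stand-ins for whatever command a given version control system
needs.  It only shows why polling requires nothing on the repository server:
the poller just asks for the latest revision and compares it to the last one
it saw.  In this sketch the first poll always counts as a change.&lt;/p&gt;

```python
class Poller:
    """Report a change whenever the latest revision differs from the last one seen."""

    def __init__(self, get_latest_revision):
        # Hypothetical callable returning the current revision id of the repository.
        self.get_latest_revision = get_latest_revision
        self.last_seen = None

    def poll(self):
        """Return the new revision id, or None if nothing changed."""
        revision = self.get_latest_revision()
        if revision == self.last_seen:
            return None
        self.last_seen = revision
        return revision


# Fake repository history standing in for a real hg or git query.
revisions = iter(["abc", "abc", "def"])
poller = Poller(lambda: next(revisions))
results = [poller.poll() for _ in range(3)]
print(results)
```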
&lt;p&gt;I was unable to finish the (parenthetical)
&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Goals/2011Q4#Mozbase"&gt;Q4 goal&lt;/a&gt;
of having autobot report to
&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/Autolog"&gt;autolog&lt;/a&gt; , so
this remains outstanding work.  There is a lot that could be done with
autolog.  The basic idea and TODOs are outlined in the
&lt;a class="reference external" href="http://k0s.org/mozilla/hg/autobot/file/tip/README.txt"&gt;README&lt;/a&gt;
(which itself could use some work; it is largely up to date
except the &lt;em&gt;Projects&lt;/em&gt; section, though incomplete).  I will endeavor to
work on this in my available time or as need escalates, but my
priority for
&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Goals/2012Q1#GOAL:_Complete_Phase_I_of_Talos_Signal_From_Noise"&gt;2012 Q1&lt;/a&gt;
will be separating
&lt;a class="reference external" href="https://wiki.mozilla.org/Auto-tools/Projects/Signal_From_Noise"&gt;Talos Signal From Noise&lt;/a&gt;
so it is unlikely I will be able to put a lot of time into autobot
(sadly).  On the other hand, I am more than willing to help
and advise if anyone
wants any features or to iron out the wrinkles.  While the
architecture is not completely straightforward, it is a decent
approximation to a
&lt;a class="reference external" href="http://en.wikipedia.org/wiki/Convex_hull"&gt;convex hull&lt;/a&gt;
over the
&lt;a class="reference external" href="http://en.wikipedia.org/wiki/Problem_domain"&gt;problem space&lt;/a&gt;
of having simple to write, simple to maintain, simple to debug
continuous integration for small(er) projects.  As usual, if anyone
wants to seek out alternate solutions, that is fine too, but I am
essentially happy with my architecture decisions and technology
choices.&lt;/p&gt;
&lt;p&gt;Regardless of whether the CI solution for MozBase is autobot or
(other), it is important to remember that continuous integration is a
safety net and not a first line of defense.  It is regrettable that
autobot has no more notifications (yet) than the
&lt;a class="reference external" href="http://k0s.org:8010/waterfall"&gt;waterfall display&lt;/a&gt;
and the &lt;tt class="docutils literal"&gt;autobot&lt;/tt&gt; character lurking in
&lt;a class="reference external" href="irc://irc.mozilla.org/#ateam"&gt;#ateam&lt;/a&gt; (the default
&lt;a class="reference external" href="http://buildbot.net/buildbot/docs/0.8.5/manual/cfg-statustargets.html#irc-bot"&gt;IRC bot&lt;/a&gt;
isn't very verbal OOTB and I haven't had time to customize
it).  But I think having some (admittedly smoke-test-level) automated testing
for MozBase is an important step towards the evolution of the software
as well as towards development practices in general.&lt;/p&gt;
&lt;/div&gt;</description><author>k0s</author><guid isPermaLink="true">http://k0s.org/mozilla/blog/20120101165149</guid><pubDate>Sun, 01 Jan 2012 16:51:49 GMT</pubDate></item></channel></rss>