BuildBot Restructuring

Original Problem

When designing the new buildbot at the last MacPorts Meeting in 2016, we tried to achieve the following picture for each commit:

 +---------------+
 |    commit     |
 +---------------+
 |    prepare    |
 |   resources   |
 +---------------+
 |   port dep1   |
 +---------------+
 |   port dep2   |
 +---------------+
 |      ...      |
 +---------------+

Here, port dep1, port dep2, etc. are the dependencies of the port actually changed in the commit, scheduled in dependency order. We did not end up doing this because it requires the ability to add steps dynamically, which buildbot 0.8 does not support. We would have had to switch to buildbot >= 0.9 for this to work.
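
As a rough illustration of what "dependency order" means here, the following standalone sketch sorts a small dependency map so that every port comes after the ports it depends on. It is not buildbot configuration; the port names and the build_order helper are made up for the example, and it assumes Python 3.9 or newer for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter


def build_order(deps):
    """Return ports ordered so that each port follows its dependencies.

    deps maps a port name to the set of ports it depends on.
    """
    return list(TopologicalSorter(deps).static_order())


# Hypothetical example: the committed port needs dep2, which needs dep1.
deps = {"port": {"dep2"}, "dep2": {"dep1"}, "dep1": set()}
order = build_order(deps)
print(order)
```

With dynamic steps, each entry of such a list would become one step of the build for the commit.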

We then attempted to emulate this picture by introducing the portwatcher, which triggers builds on a separate portbuilder builder:

 +--------------+
 |   commit     |
 +--------------+
 |   scheduler  |
 +--------------+
 |   prepare    |         +---------------+
 |  resources   |  ---->  |   port dep1   |
 +--------------+         +---------------+
                          |   port dep2   |
                          +---------------+
                          |   port dep3   |
                          +---------------+

Unfortunately this did not work either, because we could not get buildbot to schedule builds in the order we triggered them (#52766). To work around this, we introduced the mpbb install-dependencies step:

 +--------------+
 |   commit     |
 +--------------+
 |   prepare    |
 |  resources   |
 +--------------+         +---------------+
 |   scheduler  |  ---->  |  port1 deps   |
 +--------------+         +---------------+
                          |  port1 build  |
                          +---------------+
                          |  port2 deps   |
                          +---------------+
                          |  port2 build  |
                          +---------------+

Ryan's proposal

Ryan asked us to get rid of the separate portwatcher & portbuilder jobs and re-configure the remaining job to interleave the two types of actions. As we understood it, this was to solve the following problem:

  • A commit for portA comes in
  • portwatcher for this commit schedules a portbuilder job for portA
  • This build takes a long time
  • While the build is still running, a new commit for the same port arrives, which queues a portwatcher job
  • Another commit for the same port arrives, which queues another portwatcher job
  • When the portbuilder job finishes, a useless build is scheduled for portA.

He proposed the following to solve this:

 +---------------+
 |    commit1    |
 +---------------+
 |    prepare    |
 |   resources   |
 +---------------+
 |   scheduler1  |
 +---------------+
 +---------------+
 |   port1 dep1  |
 +---------------+
 +---------------+
 |   port1 dep2  |
 +---------------+
 +---------------+
 |    commit2    |
 +---------------+
 |    prepare    |
 |   resources   |
 +---------------+
 |   scheduler2  |
 +---------------+
 +---------------+
 |   port1 dep2  |
 +---------------+
 +---------------+
 |     port1     |
 +---------------+
 +---------------+
 |   port2 dep1  |
 +---------------+
 +---------------+
 |   port2 dep2  |
 +---------------+
 +---------------+
 |     port2     |
 +---------------+

Unfortunately, that introduces a problem with the prepared shared resources (such as the portindex, the mpbb checkout, and the ports tree): we cannot change the ports tree while builds are still scheduled in a dependency order that was computed from the old ports tree. This would cause seemingly random and hard-to-debug problems if a follow-up commit changes the dependencies of ports.

If we were to wait for the steps planned by the first scheduler to finish, we would end up with our initial approach as outlined at the beginning, which only works with buildbot >= 0.9.

To avoid the problem outlined by Ryan, we could also just enable build merging on the portwatcher, which would avoid the spurious builds.
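
To make the effect of build merging concrete, here is a small standalone sketch of the idea: queued requests for the same port are collapsed into one, and the newest commit wins. This only models the behaviour we want; it is not how buildbot implements merging, and the merge_requests helper and the commit names are hypothetical.

```python
from collections import OrderedDict


def merge_requests(queue):
    """Collapse queued (commit, port) requests so each port appears once.

    The newest commit wins; the port keeps the queue position of its
    first request.
    """
    merged = OrderedDict()
    for commit, port in queue:
        merged[port] = commit  # a later commit overwrites an earlier one
    return [(commit, port) for port, commit in merged.items()]


# Three commits touch portA while the builder is busy; one touches portB.
queue = [("c1", "portA"), ("c2", "portA"), ("c3", "portB"), ("c4", "portA")]
merged = merge_requests(queue)
print(merged)
```

In this example portA is built once, at the newest commit, instead of three times.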

Alternative Solutions

We could also try setting up buildbot 1.0 and just fix the UI, rather than working around 0.8's limitations with hacks.

Also, #52766 is solved now, which enables us to schedule builds in order, so we could implement this approach again:

 +---------------+
 |    commit     |
 +---------------+
 |   scheduler   |
 +---------------+
 |    prepare    |         +---------------+
 |   resources   |  ---->  |   port dep1   |
 +---------------+         +---------------+
                           |   port dep2   |
                           +---------------+
                           |   port dep3   |
                           +---------------+

However, any approach with two separate builders is affected by #53587 (a restart of the buildmaster will trigger pending portwatcher and portbuilder jobs at the same time).

Suggestions

Features to implement

  • We currently lack a job which runs portindex (mprsyncup).
  • We should run an independent one-off job to mirror the distfiles of all ports, rather than relying on individual builds to eventually trigger mirroring of everything, even if that job takes a week to finish. After that, we could probably simplify the mirroring and potentially skip fetching the files of all dependencies.
  • We should merge the "port watcher" jobs on individual builders since there is absolutely no need to build them individually. That way we can save some CPU cycles in cases where multiple commits touched the same ports (while the buildbot was busy building all other ports).

Emails

While thinking about this, we realized that buildbot was probably not designed for the kind of builds we are running on it. It turns out that many of the ideas we try to implement get rejected solely because they would result in hard-to-follow emails, and our mailing script is already overly complex.

We concluded that it would probably make a lot more sense to implement our own mailer rather than to use ugly workarounds in buildbot. This gives us more freedom to design a better workflow.

Layout

We propose the following layout, either using individual builds as we have done until now (but keeping just one builder per OS) or using dynamic steps from buildbot 1 and fitting all the 1000 touched ports into n*1000 steps. The jobs on the global watcher would not wait for the individual builders to finish. However, when a commit touches so many ports that it takes one day to build them all, we do not want to update to a newer commit in the meantime. Instead, we would later merge all the hundred or so commits that accumulated during that day.

 +--------------+
 |-  WATCHER   -|
 +--------------+
 |  (commit)    |
 +--------------+
 | 1-mprsyncup  |
 +--------------+
 | 2-mirror     |
 +--------------+         +----------------+  +----------------+     +-----------------+
 | 3-scheduler  |  ---->  |- 10.6 builder -|  |- 10.7 builder -| ... |- 10.13 builder -|
 | (no waiting) |         +----------------+  +----------------+     +-----------------+
 +--------------+         | svn up         |  | svn up         |     | svn up          |
                          +----------------+  +----------------+     +-----------------+
                          | clean          |  | clean          |     | clean           |
                          +----------------+  +----------------+     +-----------------+
                          | selfupdate     |  | selfupdate     |     | selfupdate      |
                          +----------------+  +----------------+     +-----------------+
                          | port list      |  | port list      |     | port list       |
                          +----------------+  +----------------+     +-----------------+
                          +----------------+  +----------------+     +-----------------+
                          | dep1 install   |  | dep1 install   |     | dep1 install    |
                          +----------------+  +----------------+     +-----------------+
                          | dep1 archive   |  | dep1 archive   |     | dep1 archive    |
                          +----------------+  +----------------+     +-----------------+
                          | dep1 upload    |  | dep1 upload    |     | dep1 upload     |
                          +----------------+  +----------------+     +-----------------+
                          | dep1 deploy    |  | dep1 deploy    |     | dep1 deploy     |
                          +----------------+  +----------------+     +-----------------+
                          | clean          |  | clean          |     | clean           |
                          +----------------+  +----------------+     +-----------------+
                          +----------------+  +----------------+     +-----------------+
                          | dep2 install   |  | dep2 install   |     | port3 install   |
                          +----------------+  +----------------+     +-----------------+
                          | dep2 archive   |  | dep2 archive   |     | port3 archive   |
                          +----------------+  +----------------+     +-----------------+
                          | dep2 upload    |  | dep2 upload    |     | port3 upload    |
                          +----------------+  +----------------+     +-----------------+
                          | dep2 deploy    |  | dep2 deploy    |     | port3 deploy    |
                          +----------------+  +----------------+     +-----------------+
                          +----------------+  +----------------+
                          | port1 install  |  | port1 install  |
                          +----------------+  +----------------+
                          | port1 archive  |  | port1 archive  |
                          +----------------+  +----------------+
                          | port1 upload   |  | port1 upload   |
                          +----------------+  +----------------+
                          | port1 deploy   |  | port1 deploy   |
                          +----------------+  +----------------+
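
To show how the dynamic-steps variant of this layout grows to n*1000 steps, here is a minimal sketch that expands an ordered port list into a flat list of step names, with n = 4 actions per port as in the diagram. The plan_steps helper is hypothetical and only produces names; it does not create buildbot steps, and the intermediate clean step shown after dep1 is left out for brevity.

```python
# The four per-port actions taken from the diagram above.
PER_PORT_ACTIONS = ("install", "archive", "upload", "deploy")

# Steps that run once per build, before any port is handled.
SETUP_STEPS = ("svn up", "clean", "selfupdate", "port list")


def plan_steps(ports_in_order):
    """Expand ports (already in dependency order) into step names."""
    steps = list(SETUP_STEPS)
    for port in ports_in_order:
        steps.extend(f"{port} {action}" for action in PER_PORT_ACTIONS)
    return steps


steps = plan_steps(["dep1", "dep2", "port1"])
print(len(steps))
```

A commit touching 1000 ports would yield 4 setup steps plus 4000 per-port steps on each builder under these assumptions.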

Caveats

Special attention needs to be paid to cases where unrelated ports are updated in the same commit. Here is an example of such a case: the ports A, B, C, and D were all updated in the same commit. One possible topological build order is D A B C.

A -> D
B ---´
C

Imagine that the build for D failed: we would still want to attempt to build C. Only the builds for A and B may be cancelled/skipped.

If using builds, A and B would try to run, but then fail as D is not available. If using dynamic steps, the build would fail on the first failing step (by default).

Possible solutions

For this to work correctly with buildbot, this dependency hierarchy has to be available in Python data structures to be able to:

  • cancel pending builds on build failure, or
  • add dependencies to builds, or
  • execute as steps with doStepIf (checking for execution status of dependencies)
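
The skip logic for the A B C D example above could look roughly like the following standalone sketch. It keeps the dependency hierarchy in a plain dict and skips a port only if one of its dependencies failed or was itself skipped. The run_builds helper and the build callable are hypothetical; in buildbot the same check would have to live in whatever mechanism is chosen from the list above.

```python
def run_builds(order, deps, build):
    """Build ports in order, skipping those whose dependencies did not build.

    deps maps each port to the set of ports it depends on; build is a
    callable that returns True on success.
    """
    status = {}
    for port in order:
        blocked = any(
            status.get(dep) in ("failed", "skipped")
            for dep in deps.get(port, ())
        )
        if blocked:
            status[port] = "skipped"
        else:
            status[port] = "ok" if build(port) else "failed"
    return status


# A and B depend on D; C is unrelated. Pretend that only D fails to build.
deps = {"A": {"D"}, "B": {"D"}, "C": set(), "D": set()}
status = run_builds(["D", "A", "B", "C"], deps, build=lambda port: port != "D")
print(status)
```

Dependencies that are not part of the commit have no recorded status and therefore never block a build in this sketch.
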
Last modified on Mar 15, 2018, 5:43:38 PM