Opened 8 years ago

Closed 6 years ago

Last modified 6 years ago

#49812 closed defect (duplicate)

buildbots: compile failures due to stalled bzr fetches

Reported by: dbevans (David B. Evans) Owned by: admin@…
Priority: Normal Milestone:
Component: server/hosting Version:
Keywords: Cc:
Port:

Description

Recent attempts to build inkscape-devel are causing the buildbots to abort with a compile failure. Examination of the stdio log file shows that builds stall waiting for the bzr fetch (which is admittedly slow) to complete and eventually throw the compile failure.

This is new behavior which has only begun recently. The first svn commit to fail was r142782 dated 2015-11-22 23:34:24 -0800. Subsequent svn commit r142868 also failed. Last successful commit was r142636 dated 2015-11-17 18:36:45 -0800. Builds on my local machine fetch successfully within 2 to 3 minutes in all cases.

Log from the latest failed attempt is https://build.macports.org/builders/buildports-yosemite-x86_64/builds/6367/steps/compile/logs/stdio/text which ends abruptly with

CC_PRINT_OPTIONS='YES'
CC_PRINT_OPTIONS_FILE='/opt/local/var/macports/build/_opt_mports_dports_graphics_inkscape-devel/inkscape-devel/work/.CC_PRINT_OPTIONS'
CPATH='/opt/local/include'
LIBRARY_PATH='/opt/local/lib'
MACOSX_DEPLOYMENT_TARGET='10.10'
DEBUG: Assembled command: 'cd "/opt/local/var/macports/build/_opt_mports_dports_graphics_inkscape-devel/inkscape-devel/work" && /opt/local/bin/bzr --builtin --no-aliases checkout --lightweight -r 14489 lp:inkscape inkscape-devel-0.91.99.14489'
DEBUG: Executing command line:  cd "/opt/local/var/macports/build/_opt_mports_dports_graphics_inkscape-devel/inkscape-devel/work" && /opt/local/bin/bzr --builtin --no-aliases checkout --lightweight -r 14489 lp:inkscape inkscape-devel-0.91.99.14489 2>&1
Not checking SSL certificate for xmlrpc.launchpad.net.
You have not informed bzr of your Launchpad ID, and you must do this to
write to Launchpad or access private data.  See "bzr help launchpad-login".

as do the others. Initial warnings from bzr are normal and innocuous.

I'm wondering if you have access to more detailed network logs that would shed some light on what's happening (or not). Thanks.

Change History (24)

comment:1 Changed 8 years ago by dbevans (David B. Evans)

BTW, I've tried doing an http tarball fetch from Launchpad but for a given bzr revision each fetch yields a different checksum.

comment:2 Changed 8 years ago by ryandesign (Ryan Carsten Schmidt)

I'm not aware of the existence of any network logs.

See also #49759 for a similar inexplicable git clone on the buildbot builders.

I propose we fix the problem as explained there: slow network fetches should be permitted. MacPorts should print periodic output during slow fetches so that it is apparent something is happening and things are not stalled.

comment:3 Changed 8 years ago by dbevans (David B. Evans)

Agreed. In the bzr case, it should be possible to enable a more verbose transfer mode. Currently, it is completely silent until the transfer is complete so it's hard to tell if something is happening or not. Do the buildbots have a set timeout period when waiting for output on stdio and if so what is it? Are there specific network protocols that are blocked at Apple?

comment:4 Changed 8 years ago by jmroot (Joshua Root)

The timeout is 20 minutes.

comment:5 in reply to:  4 ; Changed 8 years ago by dbevans (David B. Evans)

Replying to jmr@…:

The timeout is 20 minutes.

This should be sufficient time for the bzr fetches to complete. Maximum fetch time on local machines is 5 minutes or less.

comment:6 in reply to:  3 ; Changed 8 years ago by ryandesign (Ryan Carsten Schmidt)

Replying to devans@…:

Are there specific network protocols that are blocked at Apple?

All ports that are not explicitly allowed are blocked. I understand bzr uses port 4155. That is not on the list of ports that are documented to be allowed, but I don't know if the documentation is up to date. If you give me the hostname or IP address of a bzr server I can check if it can be accessed from the servers.

comment:7 Changed 8 years ago by dbevans (David B. Evans)

Based on using tcpdump to watch the network traffic during the fetch, the Launchpad bzr host name is taotie.canonical.com. I saw no traffic on port 4155. All transfers were strictly http (port 80).

Last edited 8 years ago by dbevans (David B. Evans) (previous) (diff)

comment:8 Changed 8 years ago by ryandesign (Ryan Carsten Schmidt)

Port 80 is definitely open...

comment:9 Changed 8 years ago by ryandesign (Ryan Carsten Schmidt)

Owner: changed from ryandesign@… to admin@…

comment:10 in reply to:  6 ; Changed 8 years ago by ryandesign (Ryan Carsten Schmidt)

Replying to ryandesign@…:

All ports that are not explicitly allowed are blocked. I understand bzr uses port 4155. That is not on the list of ports that are documented to be allowed, but I don't know if the documentation is up to date.

I think the documentation was out of date. Based on prior internal tickets filed, all buildbot builders should have ports 80 (http), 443 (https), 2401 (cvs), 3690 (svn), 4155 (bzr), 9418 (git) open.

Replying to devans@…:

BTW, I've tried doing an http tarball fetch from Launchpad but for a given bzr revision each fetch yields a different checksum.

This sounds like a launchpad bug. Can you report it to them?

comment:11 in reply to:  10 Changed 8 years ago by dbevans (David B. Evans)

Replying to devans@…:

BTW, I've tried doing an http tarball fetch from Launchpad but for a given bzr revision each fetch yields a different checksum.

This sounds like a launchpad bug. Can you report it to them?

Will do.

comment:12 in reply to:  5 Changed 8 years ago by dbevans (David B. Evans)

Replying to devans@…:

Replying to jmr@…:

The timeout is 20 minutes.

This should be sufficient time for the bzr fetches to complete. Maximum fetch time on local machines is 5 minutes or less.

Actually, I need to update this statement as well. I recently ran a sequence of 10 consecutive inkscape-devel bzr fetches and the time required varied all over the map. Minimum was about 3 minutes with a maximum over 19 minutes. In addition, when running with tcpdump as mentioned above the fetch time was also very near 20 minutes.

Given these results, it's quite possible that a fetch on a buildbot VM might well exceed the 20 minute limit. Is it possible to run one or more inkscape-devel fetch by hand on one of the builbots and see how much time it really takes?

comment:13 Changed 8 years ago by ryandesign (Ryan Carsten Schmidt)

I tested on a gigabit Internet connection in Austin and it fetched 198MB in under 2 minutes. I've taken the Snow Leopard builder offline to test fetching there with the --verbose flag, and so far it's fetching at around 50KB/s; at that rate, it should take around an hour to fetch it all.

So as I said in #49759 for git, we should modify MacPorts base so that bzr fetches print something while fetching. It looks like the --verbose flag only works in a terminal, so I'm not sure how to make bzr print progress information when called by MacPorts.

comment:14 Changed 8 years ago by ryandesign (Ryan Carsten Schmidt)

If we could hook into the MacPorts progress bar API for all revision control system fetches (bzr, cvs, git, hg, svn) that would be super.

comment:15 in reply to:  13 Changed 8 years ago by ryandesign (Ryan Carsten Schmidt)

Replying to ryandesign@…:

I've taken the Snow Leopard builder offline to test fetching there with the --verbose flag, and so far it's fetching at around 50KB/s; at that rate, it should take around an hour to fetch it all.

It finished in about 32 minutes.

comment:16 in reply to:  13 ; Changed 8 years ago by dbevans (David B. Evans)

Replying to ryandesign@…:

I tested on a gigabit Internet connection in Austin and it fetched 198MB in under 2 minutes. I've taken the Snow Leopard builder offline to test fetching there with the --verbose flag, and so far it's fetching at around 50KB/s; at that rate, it should take around an hour to fetch it all.

I just did another fetch here and the download rate varied from a low of 25KB/s to a max around 256KB/s -- nothing more. So I'm assuming Launchpad has a fairly slow connection or their host is really loaded down. In addition, it looks like the bzr protocol is fairly inefficent -- lots of small packets with handshakes in between which doesn't help either.

So as I said in #49759 for git, we should modify MacPorts base so that bzr fetches print something while fetching. It looks like the --verbose flag only works in a terminal, so I'm not sure how to make bzr print progress information when called by MacPorts.

Worst yet the bzr checkout command doesn't accept --verbose according to the docs. Although it does show a progress bar when connected to a terminal.

comment:17 in reply to:  14 Changed 8 years ago by dbevans (David B. Evans)

Replying to ryandesign@…:

If we could hook into the MacPorts progress bar API for all revision control system fetches (bzr, cvs, git, hg, svn) that would be super.

Agreed.

comment:18 in reply to:  16 Changed 8 years ago by ryandesign (Ryan Carsten Schmidt)

Replying to devans@…:

Worst yet the bzr checkout command doesn't accept --verbose according to the docs.

The manpage says it does:

   bzr checkout [BRANCH_LOCATION] [TO_LOCATION]
       Options:
           --files-from ARG          Get file contents from this tree.
           --hardlink                Hard-link working tree files where
                                     possible.
           --help, -h                Show help message.
           --lightweight             Perform a lightweight checkout.
                                     Lightweight checkouts depend on access
                                     to the branch for every operation.
                                     Normal checkouts can perform common
                                     operations like diff and status without
                                     such access, and also support local
                                     commits.
           --quiet, -q               Only display errors and warnings.
           --revision ARG, -r        See "help revisionspec" for details.
           --usage                   Show usage message and options.
           --verbose, -v             Display more information.

Although it does show a progress bar when connected to a terminal.

Can we trick it into thinking it has a terminal?

comment:19 Changed 8 years ago by dbevans (David B. Evans)

Well, I don't know what documentation I was looking at but you're right. However, running the checkout command with --verbose doesn't give any more than the normal.

Perhaps we could patch bzr to output not the running progress bar but a periodic status message if its not connected to a terminal. Depends on how the code works, I guess.

comment:20 Changed 8 years ago by dbevans (David B. Evans)

And now inkscape-devel r143503 just fetched successfully on all active buildbots.

comment:21 Changed 8 years ago by dbevans (David B. Evans)

At this point inkscape-devel bzr fetches seems to be working without problems and have been for several months. So perhaps this is a moot issue? Shall we close this ticket or do you want to keep it open.

comment:22 Changed 8 years ago by dbevans (David B. Evans)

Resolution: worksforme
Status: newclosed

No response. End to end network performance (root cause of this issue) seems to have improved significantly and is much less variable since this ticket was filed. Bzr fetches of imkscape-devel are no longer timing out and complete within a reasonable period of time. Closing as works for me.

Version 0, edited 8 years ago by dbevans (David B. Evans) (next)

comment:23 Changed 6 years ago by ryandesign (Ryan Carsten Schmidt)

Resolution: worksforme
Status: closedreopened

Since this ticket was closed, we've moved the buildbot to new hosting that has a slower network connection than what we enjoyed at macOS forge. inkscape-devel has switched from bzr to git now, but the problem can happen with git too:

https://build.macports.org/builders/ports-10.13_x86_64-builder/builds/12075/steps/install-port/logs/stdio

I've reopened #49759 to address it.

comment:24 Changed 6 years ago by ryandesign (Ryan Carsten Schmidt)

Resolution: duplicate
Status: reopenedclosed
Note: See TracTickets for help on using tickets.