[tahoe-dev] buildslave operators: need to reconfigure your slaves

Brian Warner warner at lothar.com
Mon Nov 7 23:54:29 UTC 2011


On 11/4/11 6:26 PM, Kyle Markley wrote:

> The URL actually points to a path that doesn't exist. The zfec source
> isn't on the web server.
> 
> When I try to build tahoe-lafs instead of zfec, I get a similar error,
> but for this path:
> https://tahoe-lafs.org/source/tahoe-lafs/trunk/_darcs/inventory
> 
> My web browser gives me a 404 on that.  The _darcs directory exists, but
> there's no entry 'inventory'.

As far as we can tell, that error is misleading, and the root cause is
Darcs' SSL library (probably libcurl, linked against OpenSSL, using an
OS-provided list of Certificate Authorities) not recognizing the CA
which accepted our protection money, erm, signed the tahoe-lafs.org SSL
certificate :).

The tahoe-lafs.org server redirects all HTTP (port 80) requests to their
equivalent HTTPS (port 443) URLs (and sets a Strict-Transport-Security
header to keep them there).

Buildslaves from midnightmagic, Kyle, and sickness all fail with:

  darcs failed: Not a repository:
  https://tahoe-lafs.org/source/tahoe-lafs/trunk (Failed to download URL
  https://tahoe-lafs.org/source/tahoe-lafs/trunk/_darcs/inventory: Peer
  certificate cannot be authenticated with known CA certificates)

And the buildslaves from Eugen and Freestorm fail with:

  darcs failed: Not a repository:
  https://tahoe-lafs.org/source/tahoe-lafs/trunk (Failed to download URL
  https://tahoe-lafs.org/source/tahoe-lafs/trunk/_darcs/inventory: HTTP
  301 error getting
  https://tahoe-lafs.org/source/tahoe-lafs/trunk/_darcs/inventory)

When a client does 'darcs get', the first thing it does is an
exploratory HTTP GET of URL/_darcs/inventory, which was used by very old
versions of darcs and is not present in modern repositories (like the
Tahoe repo). When that fails, it proceeds to look for /_darcs/format,
from which it learns the proper repo layout and can fetch the modern
things like /_darcs/hashed_inventory. So it's ok and expected that the
/_darcs/inventory fetch will fail (with a 404), and darcs will tolerate
it and proceed on to the other markers. But if some other error occurs,
darcs may just stop immediately. It'd be less confusing if it
were to ignore *all* errors on /_darcs/inventory and then report
whatever error it gets on /_darcs/format (which *is* supposed to exist).
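
To make that suggestion concrete, here is a rough Python sketch of the
more forgiving probe order. It is not darcs' actual code (darcs is
written in Haskell); FetchError, detect_repo_format, and the injected
fetch callable are all hypothetical names made up for illustration.

```python
from typing import Callable


class FetchError(Exception):
    """Hypothetical error type: any failed download (404, SSL, 301, ...)."""


def detect_repo_format(base_url: str, fetch: Callable[[str], bytes]) -> str:
    """Probe a repository the way the paragraph above proposes.

    `fetch` is an assumed helper that returns the body of a URL or
    raises FetchError. Returns "old-style" if the legacy inventory
    exists, otherwise the contents of _darcs/format.
    """
    try:
        fetch(base_url + "/_darcs/inventory")
        return "old-style"
    except FetchError:
        # Ignore *all* errors on the legacy marker, not just 404s.
        pass
    # Any error raised here is the one worth reporting to the user,
    # because _darcs/format is supposed to exist in a modern repo.
    return fetch(base_url + "/_darcs/format").decode().strip()
```

With this ordering, an SSL failure on the first probe would surface as
an SSL failure on /_darcs/format rather than as "Not a repository".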

What we pieced together last Thursday is that certain versions of
libcurl don't do SSL. In fact, they're so enthusiastic about not doing
SSL that every time you give them an HTTPS URL, they'll just silently
ignore the "S" and do a regular unencrypted port 80 connection. I think
we managed to catch it with tcpdump (but if dvanduzer or midnightmagic
could confirm that'd be great). It's easy to imagine a code path that
would cause this error (a switch statement on the "scheme" field that
lets https fall through to the same call as http, probably guarded with
a fragile little "TODO: figure out SSL" comment).

We think that Eugen and Freestorm's slaves are using this old version.
They tell libcurl to hit
https://tahoe-lafs.org/source/tahoe-lafs/trunk/_darcs/inventory, libcurl
de-s-ifies the URL to http:.., the webserver says "hey, what part of
'SSL' do you not understand?", redirects them to https:.., libcurl
de-s-ifies the redirect target too, gets ready to hit http:.. again,
notices the new URL is the same as the old URL, stops (to avoid looping
forever), and then reports the most recent "error" (the 301 Redirect)
instead of something useful like "I don't believe in SSL and so I
stripped the S from your URL and so it looked like a loop and so I
stopped".
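
The suspected sequence can be reproduced with a toy model. The Python
sketch below is only a simulation of the hypothesis above, not libcurl
code: strip_tls, server, and broken_client_get are hypothetical names,
and the "server" just mimics the redirect-everything-to-HTTPS policy.

```python
def strip_tls(url):
    """Model the suspected bug: silently turn https:// into http://."""
    if url.startswith("https://"):
        return "http://" + url[len("https://"):]
    return url


def server(url):
    """Toy server: 301-redirect any http:// URL to its https:// twin."""
    if url.startswith("http://"):
        return 301, "https://" + url[len("http://"):]
    return 200, None


def broken_client_get(url, max_hops=10):
    """Follow redirects the way the hypothesized old client would.

    Returns (last_status, reason).
    """
    seen = set()
    current = strip_tls(url)
    status = None
    for _ in range(max_hops):
        if current in seen:
            # Same URL as before: stop to avoid looping forever and
            # report the most recent status, which is the 301.
            return status, "loop detected"
        seen.add(current)
        status, location = server(current)
        if status != 301:
            return status, "ok"
        current = strip_tls(location)
    return status, "too many redirects"
```

In this model, broken_client_get() on any https:// URL gives up after a
single request with the 301 as its last status, which matches the
"HTTP 301 error getting ..." message from those two slaves.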

Meanwhile, we think the other slaves (midnightmagic, Kyle, sickness)
have versions of libcurl that use OS-provided CA lists (in /etc/ssl/ or
/etc/ssl/certs/) which don't include GeoTrust (our CA). Almost all
browsers include the GeoTrust CA cert, but it seems that for non-browser
purposes, the OS vendors provide a much smaller set of roots. The
/_darcs/inventory fetch fails with an SSL error, and darcs stops right
away (even though if it had failed with a 404, it would have continued
on to /_darcs/format).


So we're still trying to figure things out. We'd like to find out what
is in /etc/ssl/ on the failing machines (openbsd, netbsd, centos5,
debian/lenny), and also on the succeeding machines (ubuntu, freebsd),
and confirm our hypothesis that the difference is a missing GeoTrust
cert. The fix isn't clear either: if we can identify a CA which is
present in all the OSes' stock /etc/ssl/ lists, then maybe we could buy a
certificate from them instead of the GeoTrust one. It's technically
possible to have our developers/users/buildslave-admins install the
tahoe-lafs.org cert directly, or even the GeoTrust CA cert (the libcurl
equivalent of clicking through a browser's self-signed-cert warning),
but that'd be a big barrier to new folks who just want to grab the
source. Or maybe we'll find out that something else is wrong, maybe a
misconfiguration of our nginx server, and it'll prove possible to make
it work without modifying the client systems at all.
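
One way to test the missing-CA hypothesis on a given machine is to load
a CA bundle and look for GeoTrust among the issuers. The following
Python sketch uses only the standard ssl module; the helper names are
hypothetical, and the bundle path is something each operator would have
to supply, since it varies by OS. Note that it only inspects a single
bundle file, not a hashed directory of individual certificates.

```python
import ssl


def ca_organizations(ca_certs):
    """Collect organizationName values from get_ca_certs()-style dicts."""
    orgs = set()
    for cert in ca_certs:
        # "subject" is a tuple of RDNs, each a tuple of (key, value) pairs.
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "organizationName":
                    orgs.add(value)
    return orgs


def has_ca(ca_certs, needle="GeoTrust"):
    """True if any CA's organization name contains `needle`."""
    needle = needle.lower()
    return any(needle in org.lower() for org in ca_organizations(ca_certs))


def bundle_has_ca(cafile, needle="GeoTrust"):
    """Load one PEM bundle file and check it for the named CA."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_verify_locations(cafile=cafile)
    return has_ca(ctx.get_ca_certs(), needle)
```

Running bundle_has_ca() against the bundle on a failing machine and on
a succeeding one would show whether the GeoTrust root is the difference.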

If anyone has any experience with this, please let us know!

cheers,
 -Brian

