#2055 new defect

Building tahoe safely is non-trivial

Reported by: leif
Owned by: daira
Priority: major
Milestone: soon
Component: packaging
Version: 1.10.0
Keywords: install security eggs pip setuptools packaging
Cc: killyourtv@…
Launchpad Bug:

Description

Summary: to safely build Tahoe on an untrustworthy (read: any) network it currently seems necessary to take an unintuitive step such as setting up a restrictive firewall or simply disconnecting from the internet in order to prevent setup.py from downloading and running arbitrary code via http.

In this ticket I describe the two approaches I've tried: virtualenv v1.9.1 (w/ pip v1.3), and the "Desert Island" build. If appropriate precautions are taken, both methods can yield what I believe are relatively "safe" builds (that is to say, they at least use HTTPS (and require CA-signed certificates) to ensure the integrity of the downloaded dependencies).

The former requires blocking pip's port 80 connections and the latter requires disconnecting from the internet during the build.

virtualenv+pip

Ideally, pip install allmydata-tahoe would be an easy and safe command to run!

Version 1.3 of pip finally added certificate verification when making https connections, but when installing allmydata-tahoe v1.10 it still attempts to fetch foolscap and pycrypto via HTTP first. If that fails, perhaps because you've configured a firewall to not allow port 80 connections, it will fall back to downloading them from PyPI via HTTPS.

Note that using virtualenv 1.9 and pip 1.3, pip install allmydata-tahoe fails unless pip install twisted is run first. This might be because the former installs Twisted 11.0 while the latter installs Twisted 13.0.

The "Desert Island" Build

On the AdvancedInstall wiki page there are instructions for a "Desert Island" build, which consists of downloading and extracting https://tahoe-lafs.org/source/tahoe-lafs/deps/tahoe-deps.tar.gz in the tahoe-lafs source directory and running "python setup.py build".

While this does work fine without an internet connection, it still tries repeatedly to connect to the internet. These are the lines of "python setup.py build" output which contain "Reading http":

Reading http://pypi.python.org/simple/zope.interface/
Reading http://pypi.python.org/simple/
Reading https://tahoe-lafs.org/source/tahoe-lafs/deps/tahoe-lafs-dep-sdists/
Reading https://tahoe-lafs.org/source/tahoe-lafs/deps/tahoe-lafs-dep-eggs/
Reading http://pypi.python.org/simple/mock/
Reading http://pypi.python.org/simple/
Reading http://pypi.python.org/simple/pyasn1/
Reading http://pypi.python.org/simple/pycrypto/
Reading http://pypi.python.org/simple/Nevow/
Reading http://pypi.python.org/simple/pyOpenSSL/
Reading http://pypi.python.org/simple/foolscap/
Reading http://pypi.python.org/simple/simplejson/
Reading http://pypi.python.org/simple/zfec/
Reading http://pypi.python.org/simple/pyutil/
Reading http://pypi.python.org/simple/zbase32/

Here is the context around one of them on my offline system (the others are similar):

Reading http://pypi.python.org/simple/foolscap/
Download error: [Errno -2] Name or service not known -- Some packages may not be found!
Couldn't retrieve index page for 'foolscap'
Best match: foolscap 0.6.4
Processing foolscap-0.6.4.tar.gz
Running foolscap-0.6.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-vIEtM6/foolscap-0.6.4/egg-dist-tmp-Ipvbv_
zip_safe flag not set; analyzing archive contents...
foolscap.test.test_appserver: module references __file__
Adding foolscap 0.6.4 to easy-install.pth file
Installing flappserver script to support/bin
Installing flappclient script to support/bin
Installing flogtool script to support/bin

Installed /fake-path-to-my-source-checkout/tahoe-lafs/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg

I'm assuming (but have not confirmed) from the "Best match" part of this output that if any of these attempted requests were successful and the response indicated that there is a newer version of one of the dependencies than the corresponding egg in tahoe-deps, it would actually download and execute that code.
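For anyone who wants to repeat the "Reading http" filtering on their own build logs, here is a minimal sketch. The function names are hypothetical, and the sample text is just a few of the lines quoted above; it only reports which index URLs were attempted over plain HTTP and does nothing to prevent them.

```python
import re

# Matches the "Reading <url>" lines printed during "python setup.py build".
READING_LINE = re.compile(r"^Reading (\S+)\s*$")


def index_urls(build_output):
    """Return every URL found on a 'Reading ...' line, in order of appearance."""
    urls = []
    for line in build_output.splitlines():
        match = READING_LINE.match(line.strip())
        if match:
            urls.append(match.group(1))
    return urls


def plaintext_urls(build_output):
    """Return only the URLs that were attempted over unencrypted HTTP."""
    return [url for url in index_urls(build_output) if url.startswith("http://")]


# A few lines copied from the build output quoted in this ticket.
SAMPLE = """\
Reading https://tahoe-lafs.org/source/tahoe-lafs/deps/tahoe-lafs-dep-eggs/
Reading http://pypi.python.org/simple/foolscap/
Download error: [Errno -2] Name or service not known -- Some packages may not be found!
Couldn't retrieve index page for 'foolscap'
"""

if __name__ == "__main__":
    for url in plaintext_urls(SAMPLE):
        print("plain HTTP index lookup:", url)
```

Note that, as mentioned later in this ticket, URLs may only be printed when a request fails, so an empty result is not proof that no HTTP request was made.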

Change History (24)

comment:1 Changed at 2013-08-08T20:09:05Z by leif

I've updated the desert island build instructions on AdvancedInstall to indicate that it is currently necessary to disconnect from the internet to have a truly offline build.

comment:2 Changed at 2013-08-09T00:56:12Z by ioerror

This is a relevant thread on the tahoe-dev mailing list:

https://tahoe-lafs.org/pipermail/tahoe-dev/2013-August/008643.html

comment:3 Changed at 2013-08-09T01:17:28Z by killyourtv

FWIW, I set the http_proxy and https_proxy environment variables to bogus values when I want to perform an offline build. The installation will try to (and will be unable to) go out to the internet to fetch newer dependencies.

I'm assuming (but have not confirmed) from the "Best match" part of this output that if any of these attempted requests were successful and the response indicated that there is a newer version of one of the dependencies than the corresponding egg in tahoe-deps, it would actually download and execute that code.

Based on my experiences your assumption is correct.
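The bogus-proxy approach from this comment could be sketched roughly as follows. The proxy address and the helper names are assumptions chosen for illustration (any address where nothing is listening should do), and this only stops tools that honour the proxy environment variables.

```python
import os
import subprocess
import sys

# Assumption: nothing listens on this loopback port, so proxied requests fail.
BOGUS_PROXY = "http://127.0.0.1:9"


def offline_env(base=None):
    """Copy an environment and point both HTTP and HTTPS proxies at a dead end."""
    env = dict(os.environ if base is None else base)
    # Set both variables: leaving https_proxy unset still allows HTTPS traffic.
    for name in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        env[name] = BOGUS_PROXY
    # Remove proxy exemptions that would let some hosts bypass the bogus proxy.
    for name in ("no_proxy", "NO_PROXY"):
        env.pop(name, None)
    return env


def run_build(cmd=None):
    """Run the build command (default: python setup.py build) with bogus proxies."""
    if cmd is None:
        cmd = [sys.executable, "setup.py", "build"]
    return subprocess.call(cmd, env=offline_env())
```

Calling `run_build()` from the tahoe-lafs source directory would then run the build with both variables set. Code that opens sockets directly, without consulting the proxy variables, is not affected by this.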

comment:4 Changed at 2013-08-09T01:17:38Z by killyourtv

  • Cc killyourtv@… added

comment:5 Changed at 2013-08-09T02:41:05Z by leif

Thanks, killyourtv. I feel kind of terrible now, as your comment made me realize that even after my careful research writing this ticket I actually just published a script that was still unsafely installing tahoe. :(

I did much of the testing in an environment with Tor configured to refuse all connections on port 80, but in the first version of my tails bootstrap script which I published a couple hours ago I was foolishly operating under the assumption that setup.py on Tails wasn't able to connect to the internet because I saw some "Connection refused" lines. It turns out, Tails 0.19 sets the http_proxy environment variable but NOT https_proxy, so the errors I was seeing were only about the https connections. And, tahoe's setup.py only prints URLs when they fail. :(

To anyone who ran that first version of the script, I apologize. Hopefully there aren't malicious Tor exits serving higher-numbered versions of Tahoe dependencies than tahoe-deps.tar.gz has. :(

comment:6 Changed at 2013-08-09T12:56:28Z by markberger

Maybe something like peep would solve this? Peep is just a wrapper around pip that will verify tarballs against a hash you give it. If any of the hashes mismatch, peep will abort the installation.
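The general idea of checking a downloaded tarball against a known hash before installing it can be illustrated like this. This is not peep's actual code or hash format, just a sketch using SHA-256 hex digests, and the function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file without loading it all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(path, expected_hex):
    """Raise ValueError if the file's digest differs from the pinned digest."""
    actual = sha256_of(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(
            "hash mismatch for %s: expected %s, got %s" % (path, expected_hex, actual)
        )
    return True
```

An installer built on this would call `verify_tarball()` on every downloaded archive and abort before running any of its code if the check raises.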

comment:7 follow-up: Changed at 2013-08-09T13:01:51Z by zooko

Is there a sufficiently convenient way to ask your operating system to deny networking to a given subprocess while still allowing it for your other processes? That would be useful not only for building Tahoe-LAFS, but any other package that you wanted to build. It is important to note that this would not be an attempt to prevent a malicious process from communicating, it would only be preventing an honest but imprudent process from downloading packages.

comment:8 in reply to: ↑ 7 ; follow-up: Changed at 2013-08-09T17:17:18Z by leif

Replying to markberger:

Maybe something like peep would solve this? Peep is just a wrapper around pip that will verify tarballs against a hash you give it. If any of the hashes mismatch, peep will abort the installation.

I had not heard of peep before, but after skimming over its mere 224 lines of code just now I think I like it! One thing that isn't clear to me though is the process by which a user or developer is supposed to become aware of new versions of libraries and decide to use them.

Replying to zooko:

Is there a sufficiently convenient way to ask your operating system to deny networking to a given subprocess while still allowing it for your other processes? That would be useful not only for building Tahoe-LAFS, but any other package that you wanted to build. It is important to note that this would not be an attempt to prevent a malicious process from communicating, it would only be preventing an honest but imprudent process from downloading packages.

That depends on your definition of sufficiently convenient :)

There are LD_PRELOAD tools (such as usewithtor/torsocks/tsocks) which catch most things and redirect them to a socks proxy but they aren't 100% reliable. Some programs (even non-malicious ones) might make connections in ways those tools don't catch. Also, they're another binary dependency.

Linux's netfilter firewall can do everything, but we obviously don't want to require Linux or root access to build tahoe. But if anyone is interested, you can have per-user firewall rules which I believe are as reliable as the rest of Linux's privilege separation. On modern Debian or Ubuntu systems, you can use the iptables frontend ufw. It is as easy as "sudo apt-get install ufw", adding a line like "-A ufw-before-output -m owner --uid-owner offline-user -j REJECT" somewhere before the last line in "/etc/ufw/before.rules", running "sudo ufw disable; sudo ufw enable", and su'ing to the "offline-user" user. Another way to use ufw is to edit /etc/default/ufw and change DEFAULT_OUTPUT_POLICY from ACCEPT to REJECT, and then add rules to before.rules allowing a certain user to connect, and then running tor or another proxy or VPN as that user. Then you can use your proxy or VPN to restrict what kinds of connections are allowed.

I'm not really in favor of making the official build process use firewall or LD_PRELOAD tricks, though, as there are of course much better ways to do an offline build.

Today I learned that pip has a --no-download option! So, the short-term thing I'd like to see, and which I might try to do myself on my branch in the near future, is to migrate the build process to use virtualenv (which includes pip, and weighs in at 2MB compressed) and include that in the repository instead of the zetuptoolz fork of setuptools from 2010 which is in there now. The next step is to either use peep or make sure that, when the deps are not already present, pip can only ever learn about HTTPS URLs to download them.

The longer-term thing I'd like to see is deterministic builds (#2057)! In my ideal world everyone would be able to build identical debs, tarballs, exes, and dmgs. Of course, part of that involves specifying precise versions of all dependencies. Another part is building in a VM (gitian automates that) which would certainly make it easier to be confident that the build process can't get online. I haven't looked very closely at gitian yet, but I'm under the impression that it will be quite a bit of work to get to that point.

comment:9 in reply to: ↑ 8 Changed at 2013-08-09T17:20:30Z by zooko

Leif: your comment covers enough different (related) topics that I think it should be a post to the tahoe-dev thread instead of just a comment. (Then maybe some parts of it should be some comments on a few different tickets…)

comment:10 Changed at 2014-03-17T13:43:35Z by dstufft

Oldish ticket, but it was linked to me today!

So here's some information about various versions of packaging tools and what they support wrt HTTPS.

  • pip < 1.3 - YOLO with HTTP all around.
  • pip 1.3 - Hits PyPI using HTTPS (does not fall back to HTTP). However, it automatically scrapes things located on a package's /simple/foo/ page on PyPI, which may be hosted over HTTP. Additionally, anything listed in a setup_requires is downloaded+installed by setuptools, not pip. Additionally, if a package has dependency_links then pip will also scrape those, which may be hosted via HTTP. Uses an outdated copy of root certificates that were incorrectly taken from Mozilla's trust root.
  • pip 1.4 - Mostly the same as 1.3, however it adds the ability to disable scraping sites external to PyPI. Uses an outdated copy of root certificates that were incorrectly taken from Mozilla's trust root.
  • pip 1.5 - Switches the options in 1.4 to on by default: pip no longer scrapes sites other than PyPI by default, and additionally disables processing dependency links by default. With the default configuration the only non-HTTPS network access can come from setup_requires. Uses an up-to-date (at time of release) bundled CA bundle that was properly taken from Mozilla (via a tool agl wrote).
  • pip 1.6 (future/proposed) - Removes the ability to enable dependency links at all, and takes control of setup_requires so that setuptools no longer has any control over it. pip install <something> is then by default only over verified HTTPS, unless the user invoking pip explicitly uses an HTTP URL somewhere.

  • setuptools < 0.7 - YOLO with HTTP all around.
  • setuptools >= 0.7 - Will use HTTPS to hit PyPI, but this may or may not actually be active because it attempts to discover certificates and I believe it fails open. Installing depends on an old version of certifi, which incorrectly uses the Mozilla cert bundle and is outdated. Can still use HTTP if listed on a project's /simple/foo/ page or inside of a dependency link. There is no way to specify that it must be loaded over HTTPS, but you can restrict which hosts are used.

FWIW pip --no-download is bad and you shouldn't use it. If you want to do that you should download the packages to a directory (you can use pip install --download <directory> [package [package ...]] for that) and then use pip install --no-index --find-links <directory> [package [package ...]].
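The two-step flow (fetch into a directory, then install only from that directory) might look like the sketch below. The helper names and the "deps" directory are hypothetical. Note one deliberate substitution: newer pip releases provide a separate `pip download` command in place of `pip install --download`, and that newer form is what the sketch builds.

```python
import subprocess
import sys


def download_command(dest, packages):
    """Build the command for the online step: fetch archives into `dest`."""
    # `pip download --dest` is the newer spelling of `pip install --download`.
    return [sys.executable, "-m", "pip", "download", "--dest", dest] + list(packages)


def offline_install_command(wheelhouse, packages):
    """Build the command for the offline step: install using only `wheelhouse`."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                # never consult PyPI or any other index
        "--find-links", wheelhouse,  # resolve requirements from this directory only
    ] + list(packages)


def run(command):
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.check_call(command)


if __name__ == "__main__":
    # "deps" is a placeholder directory name used only for this example.
    print(" ".join(download_command("deps", ["allmydata-tahoe"])))
    print(" ".join(offline_install_command("deps", ["allmydata-tahoe"])))
```

Keeping the two steps separate means the downloaded archives can be inspected or hash-checked between them, and the install step fails outright if anything is missing instead of reaching for the network.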

You can tell easy_install/setuptools not to hit the network by telling it the allowed hosts are 'None' (http://pythonhosted.org/setuptools/easy_install.html#restricting-downloads-with-allow-hosts).
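One way to make that setting stick for a source tree is to put it in a config file. The sketch below renders such a fragment; it assumes the `[easy_install]` section and the `allow_hosts` key, which is how the --allow-hosts option would be spelled in a distutils-style config file, and the function name is hypothetical.

```python
import configparser
import io


def render_no_network_setup_cfg():
    """Render a setup.cfg fragment telling easy_install that no hosts are allowed."""
    config = configparser.ConfigParser()
    # Assumption: "None" matches no hosts, as described in the linked documentation.
    config["easy_install"] = {"allow_hosts": "None"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_no_network_setup_cfg())
```

This only restricts setuptools/easy_install; pip has its own options for the same purpose.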

comment:11 Changed at 2014-03-17T18:22:19Z by daira

  • Keywords pip setuptools added

comment:12 Changed at 2014-03-17T18:23:07Z by daira

See also #2077.

comment:13 Changed at 2014-06-30T20:46:25Z by zooko

Here's a useful summary of the situation from dstufft:

pipermail/tahoe-dev/2014-June/009106.html

Sounds like a good next step is to visit the transitive closure of tahoe-lafs and its dependencies and see if we can remove all the setup_requires dependencies.

comment:14 follow-up: Changed at 2014-06-30T21:54:20Z by zooko

Removing the setup_requires would also fix #2066.

comment:15 in reply to: ↑ 14 Changed at 2014-07-02T03:58:04Z by nejucomo

Replying to zooko:

Removing the setup_requires would also fix #2066.

Fixing #2066 would be easier if we require a newer Nevow which is now available: ticket:2032#comment:18

comment:16 Changed at 2014-09-11T22:23:51Z by warner

  • Component changed from unknown to packaging

comment:17 Changed at 2014-11-20T23:51:11Z by cipherpunks

  • Keywords packaging added
  • Priority changed from normal to major

comment:18 Changed at 2015-01-29T19:47:52Z by daira

  • Milestone changed from undecided to 1.12.0

comment:19 Changed at 2015-07-21T19:05:38Z by zooko

I think a good next-step on this is #2473 (stop using setup_requires).

comment:20 Changed at 2016-03-22T05:02:25Z by warner

  • Milestone changed from 1.12.0 to 1.13.0

Milestone renamed

comment:21 Changed at 2016-06-28T18:17:14Z by warner

  • Milestone changed from 1.13.0 to 1.14.0

renaming milestone

comment:22 Changed at 2020-06-30T14:45:13Z by exarkun

  • Milestone changed from 1.14.0 to 1.15.0

Moving open issues out of closed milestones.

comment:23 Changed at 2020-11-06T17:19:03Z by exarkun

I wonder what's left to do on this.

I naively believe that pip install --no-index --find-links path-to-wheelhouse/ will install Tahoe-LAFS and all its dependencies from path-to-wheelhouse or fail to install if there are missing dependencies - and not hit the network.

Just now I tried just this with my network disconnected and the installation completed successfully!

This is no proof that it will succeed tomorrow, of course. But maybe the desired behavior is provided now and what remains is to automatically verify it as part of continuous integration?

comment:24 Changed at 2021-03-30T18:40:19Z by meejah

  • Milestone changed from 1.15.0 to soon

Ticket retargeted after milestone closed
