#2357 closed task (fixed)

document in what ways Tahoe-LAFS builds are not currently verifiable

Reported by: daira
Owned by: daira
Priority: normal
Milestone: soon (release n/a)
Component: documentation
Version: 1.10.0
Keywords: openitp-packaging build docs
Cc:
Launchpad Bug:

Description (last modified by daira)

A long-term goal, ticketed as #2057, is to enable end-users to *verify* that the package of Tahoe-LAFS that they are using was generated from the exact same source code that a security auditor examined.

In order to explain the verifiable build concept, consider this simple diagram:

    distributor: source code ➾ binary package → user

Here we use “➾” to mean “build” — the process that produces usable packages out of source code.

Now consider a security auditor who does a source-code-based examination (as opposed to binary-based, which is called “reverse engineering”). This security auditor will start with the source code, and examine it for vulnerabilities or backdoors.

    auditor: source code → security audit

How can the user who receives a binary package know whether that package was built from the source that the auditor examined?

The “verifiable build” approach attempts to answer that question by having the security auditor perform the “source code ➾ binary package” on their own trusted system, and then taking a fingerprint (secure hash) of the resulting binary package:

    auditor: source code ➾ binary package
    auditor: binary package → generate fingerprint

The auditor then publishes that fingerprint along with their report about their security audit. Users who receive the binary package can take a fingerprint of that package and compare it to the fingerprint in the published report.

    distributor: source code ➾ binary package → user
    user: binary package → check fingerprint

This approach can work only if the ➾ operation performed by the distributor produces a binary that is bytewise-identical to the one produced by the ➾ operation performed by the security auditor.
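A minimal sketch of the user-side "check fingerprint" step, assuming SHA-256 as the secure hash (the ticket does not name one). The helper names `fingerprint` and `matches_published` are hypothetical, chosen for illustration only:

```python
import hashlib
import hmac


def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that large packages need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """True if the package's fingerprint equals the published one.

    `published_hex` is assumed to be the hex string copied from the
    auditor's report; surrounding whitespace and case are ignored.
    """
    actual = fingerprint(path)
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

Because the comparison is over the hash of every byte, a single differing byte between the distributor's and the auditor's builds makes the check fail, which is why the build itself has to be deterministic.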

Here is a news article from LWN.net about the concept of verifiable builds (prompted in part by an open letter that we wrote): “Security software verifiability”. Here is a post on the tahoe-dev mailing list about our desire to have verifiable builds for Tahoe-LAFS.

The goal of this ticket is to have documentation of the ways in which Tahoe-LAFS builds are not currently verifiable. Its scope includes:

  • Tahoe-LAFS as built via setup.py (using setuptools and/or pip), and
  • the Mac OS X (#182) and Windows (#195) packages

but does not include Tahoe-LAFS as packaged by an operating system distribution or package management system.

It may be useful to consider how existing projects have approached this problem: Debian, Tor, Bitcoin, and the recent ad-hoc reproduction of the TrueCrypt Windows binaries.

Change History (4)

comment:1 Changed at 2014-12-29T16:17:29Z by daira

  • Milestone changed from undecided to soon
  • Owner changed from marlowe to daira
  • Status changed from new to assigned

comment:2 Changed at 2014-12-29T16:20:07Z by daira

  • Description modified (diff)

comment:3 Changed at 2015-01-09T02:21:32Z by daira

OpenITP meeting 5 January 2014

note: nondeterminism that results in obvious build failures is ok
different build targets can have different fingerprints
what counts as a build target?

[NONDET: operating system versions, patches, variants, distribution if counted as the same target]

quickstart build flow:
install Python if necessary
download the allmydata-tahoe-*.zip file (for a given build target)
unzip it

[NONDET: unzip programs might vary in e.g. permissions of unzipped files]
[NONDET: file timestamps may depend on the clock of the build system]
[NONDET: order of files/subdirs in directories, if filesystem does not sort them]
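To illustrate how these three sources (permissions, timestamps, directory order) could be neutralised when an archive is produced, here is a rough sketch using Python's standard `zipfile` module. It is not how Tahoe-LAFS packages are built; the function name `write_deterministic_zip`, the fixed date, and the fixed file mode are all assumptions made for the example:

```python
import os
import zipfile

FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the zip format allows


def write_deterministic_zip(src_dir, zip_path):
    """Archive src_dir so that identical contents give identical bytes."""
    paths = []
    for root, dirs, files in os.walk(src_dir):
        dirs.sort()  # make traversal order independent of the filesystem
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
            paths.append((rel, full))
    paths.sort()  # fixed member order inside the archive
    with zipfile.ZipFile(zip_path, "w") as zf:
        for rel, full in paths:
            # Fixed timestamp instead of the file's mtime.
            info = zipfile.ZipInfo(rel, date_time=FIXED_DATE)
            # Store uncompressed to avoid zlib-version differences.
            info.compress_type = zipfile.ZIP_STORED
            # Record "Unix" as the creating system regardless of build host.
            info.create_system = 3
            # Fixed regular-file mode (0644) instead of the on-disk mode.
            info.external_attr = 0o100644 << 16
            with open(full, "rb") as f:
                zf.writestr(info, f.read())
```

Two trees with the same file contents but different mtimes and creation order should then yield byte-identical archives. `zip_path` is assumed to lie outside `src_dir`.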

run setup.py build in a command prompt

[NONDET: which Python version runs setup.py?]
[NONDET: other installed Python versions might affect the build?]
[NONDET: which setuptools/pkg_resources/virtualenv version?]
[NONDET: system or virtualenv?]
[NONDET: which other Python packages installed on system and in virtualenv?]
[NONDET: PYTHONPATH]
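One way to make these interpreter-level inputs visible is to record them next to the build output, so that two builds with different fingerprints can be compared. The sketch below assumes a modern Python 3 (`importlib.metadata`, available from 3.8), not the Python 2 toolchain in use when these notes were taken; `build_environment` and the dictionary keys are hypothetical names:

```python
import os
import platform
import sys
from importlib import metadata


def build_environment():
    """Collect facts about the interpreter that would run setup.py."""

    def version_of(dist_name):
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None  # not installed in this environment

    installed = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            installed.add(f"{name}=={dist.version}")

    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "executable": sys.executable,
        # sys.prefix differs from sys.base_prefix inside a venv.
        "in_virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
        "setuptools": version_of("setuptools"),
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        "installed": sorted(installed),
    }
```

Dumping the result with `json.dumps(build_environment(), indent=2, sort_keys=True)` gives a text record that can be diffed between the distributor's and the auditor's systems. It only reports the inputs; it does not make the build deterministic by itself.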

it has some set of URLs where it looks for package distributions ("dists")

[NONDET: using the net at all is hopeless wrt determinism]
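A possible mitigation, sketched here as an assumption rather than anything the notes propose, is to fetch the dists ahead of time and check them against a pinned list of hashes before building offline. The function `check_dists` and the shape of the `pinned` mapping (filename to SHA-256 hex digest) are hypothetical:

```python
import hashlib
import os


def check_dists(dist_dir, pinned):
    """Compare files in dist_dir against `pinned` (filename -> SHA-256 hex).

    Returns a list of (filename, problem) pairs; an empty list means every
    pinned dist is present, unmodified, and nothing unexpected was found.
    """
    problems = []
    present = sorted(os.listdir(dist_dir))
    for name in present:
        if name not in pinned:
            problems.append((name, "not pinned"))
            continue
        with open(os.path.join(dist_dir, name), "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != pinned[name]:
            problems.append((name, "hash mismatch"))
    for name in sorted(set(pinned) - set(present)):
        problems.append((name, "missing"))
    return problems
```

This removes the choice of dist from the build's runtime behaviour, but it says nothing about whether building each pinned dist is itself deterministic.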

which dists it chooses can influence further choices of dist for other dependencies
try to build each dist

[NONDET: order of builds? not sure what algorithm is used]

dists are either pure Python or have C/C++ code

[NONDET: buildchain for C/C++ code (includes many non-obvious dependencies)]
[NONDET: build process for C/C++ code]
[NONDET: distutils properties that affect compilation]
[NONDET: environment vars that affect compilation]
[NONDET: execution of Python code for building a dist (e.g. dict order etc.)]
[NONDET: do any dependencies rely on entropy sources (e.g. os.urandom)?]
[NONDET: can operations like running tests affect the built copy of Tahoe?]
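The "dict order" item can be illustrated with a small experiment. On current Python 3, dicts keep insertion order, so this sketch uses a set of strings as a stand-in: its iteration order depends on the per-process string hash seed, which `PYTHONHASHSEED` controls. The helper name `iteration_order` and the example strings are arbitrary choices for the demonstration:

```python
import os
import subprocess
import sys

# Code run in a child interpreter: print a set of strings in iteration order.
SNIPPET = "print(','.join({'zfec', 'pycryptopp', 'twisted', 'foolscap', 'nevow'}))"


def iteration_order(hash_seed):
    """Run SNIPPET in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `iteration_order(1)` twice gives the same string, while different seeds may give different orders. Build code that writes such a collection into a generated file would need to fix the seed or sort the items to stay deterministic.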

sources of nondeterminism from builds of dependencies

Last edited at 2015-01-09T02:23:30Z by daira

comment:4 Changed at 2015-02-03T16:21:45Z by daira

  • Milestone changed from soon to soon (release n/a)
  • Resolution set to fixed
  • Status changed from assigned to closed