Tahoe-LAFS Weekly News, issue number 72, February 20, 2017

Welcome to the Tahoe-LAFS Weekly News (TWN). Tahoe-LAFS is a secure, distributed storage system. View TWN on the web or subscribe to TWN. If you would like to view the "new and improved" TWN, complete with pictures, please take a look.

ANNOUNCING Tahoe, the Least-Authority File Store, v1.12.1

On behalf of the entire team, I'm pleased to announce the 1.12.1 release of Tahoe-LAFS.

Tahoe-LAFS is a reliable encrypted decentralized storage system, with "provider independent security", meaning that not even the operators of your storage servers can read or alter your data without your consent. See http://Tahoe-LAFS.readthedocs.org/en/latest/about.html for a one-page explanation of its unique security and fault-tolerance properties.

With Tahoe-LAFS, you distribute your data across multiple servers. Even if some of the servers fail or are taken over by an attacker, the entire file store continues to function correctly, preserving your privacy and security. You can easily share specific files and directories with other people.

The 1.12.1 code is available from the usual places:

All tarballs, and the Git release tag, are signed by the Tahoe-LAFS Release Signing Key (fingerprint E34E 62D0 6D0E 69CF CA41 79FF BDE0 D31D 6866 6A7A), available for download from https://Tahoe-LAFS.org/downloads/tahoe-release-signing-gpg-key.asc

Full installation instructions are available at:

http://Tahoe-LAFS.readthedocs.io/en/Tahoe-LAFS-1.12.1/INSTALL.html

1.12.1 fixes a few small problems from the 1.12.0 release: the multiple-introducers feature ("introducers.yaml") was completely broken; creating nodes with --hide-ip on I2P-only systems no longer writes "tcp = tor" into the config; and at least one --listen=I2P problem was fixed. Please see the NEWS file for details:

https://github.com/Tahoe-LAFS/Tahoe-LAFS/blob/ce47f6aaee952bfc9872458355533af6afefa481/NEWS.rst

Many thanks to Least Authority Enterprises for sponsoring developer time and contributing the new Magic Folders feature.

This is the seventeenth release of Tahoe-LAFS to be created solely as a labor of love by volunteers. Thank you very much to the team of "hackers in the public interest" who make Tahoe-LAFS possible. Contributors are always welcome to join us at https://Tahoe-LAFS.org/ and https://github.com/Tahoe-LAFS/Tahoe-LAFS .

Brian Warner on behalf of the Tahoe-LAFS team

January 18, 2017 San Francisco, California, USA

Mailing List

Devchat

Tuesday, 10 January 2017

Attendees: warner, meejah, liz, jp, str4d, daira, cypher, dawuud

  • Debian freeze is happening soon, trying to fix critical bugs and make a new release in the next week or so
  • I2P bugs found at CCC
    • #2861: negotiation failure when I2P client connects to I2P server
      • str4d and warner to pair on this at this time tomorrow
    • #2858: I2P provider/handler
      • PR33 on foolscap
    • #2859 can be closed
  • Tor blog post has a comment about #2862 (introducers.yaml docs syntax failure)
  • extras_require on win32 (#2763) may require newer pip, check Debian to see if it's in place
    • potential problem is that "pip install Tahoe-LAFS" on Debian won't work
    • but Debian packaging of Tahoe-LAFS would probably be ok
    • pip docs say platform= is supported since pip-6
  • what version of twisted will go into the Debian freeze (#2857)
  • cloud-backend: warner should look at 2237.cloud-backend-merge.0
    • that is cloud-backend, with master merged in, with some additional meejah commits on top
    • still a few unit tests that fail
    • daira/meejah will do some rebase/rewriting work, merge master into it again
    • warner will treat that branch as a resource to mine diffs from, will review some diffs and apply to master
    • then new master will be merged into the branch again
    • over time the cloud-backend branch diff will shrink
    • big changes/refactoring
      • lots of APIs went from sync to async
      • so tests got harder
    • exarkun cautions against Mock and inlineCallbacks (see the testing sketch after these notes)
      • see txaws tests: txaws.supplied.s3fake, which returns pre-resolved Deferreds
      • use TestCase.successResultOf/failureResultOf instead of inlineCallbacks
      • mock leads you to tests that know too much about the internals of the implementation, and are fragile
      • instead, write second simple implementation of your real interface, which only operates on local memory
        • e.g. named "MemoryXYZ" instead of "XYZ"
      • write tests against that interface
      • tests can run against either implementation
      • one test runs against both, and uses external dependencies, etc, with async and inlineCallbacks
        • exercises XYZ specifically
      • other tests only use the simple implementation, and are synchronous
        • merely uses XYZ
  • what people are interested in
    • meejah: servers of happiness
    • dawuud: generic accounting plugin api
  • should tahoe be all "plumbing"?
    • command plugin mechanism like git/hg/twisted.plugins
    • should we add more stuff to tahoe itself, or to apps on top?
    • tahoe as library
    • application targets
      • things that need a key-value store
      • can we shape tahoe into a way that is suitable for those applications?
      • could we plug into Slack? LibreOffice "save to my tahoe grid" option?
      • Spideroak/Semaphor storage backend?
      • Signal? Wire? file transfer backend
  • when should the next summit be?
    • maybe before/after PyCon in May? Portland
    • tor-dev in Montreal in September?
    • IFF in Spain in March?
    • After CCC before RWC in Jan 2018
  • revocable immutable files? meejah
    • ask server to re-encrypt ciphertext with a stream cipher, append new key to an encrypted list, issue new readcaps
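
A minimal sketch of the testing pattern exarkun describes, under made-up names (MemoryBlobStore and its put/get methods are illustrative, not Tahoe or txaws code): the in-memory fake implements the same interface as the real cloud-backed store, returns already-fired Deferreds, and its tests stay synchronous, using TestCase.successResultOf/failureResultOf instead of inlineCallbacks.

    # Sketch only: interface and class names here are hypothetical.
    from twisted.internet.defer import fail, succeed
    from twisted.trial.unittest import TestCase

    class MemoryBlobStore(object):
        """Simple second implementation of the same storage interface as the
        real cloud-backed one; it only touches local memory and returns
        pre-resolved Deferreds (compare txaws's fake S3 client)."""
        def __init__(self):
            self._blobs = {}

        def put(self, key, data):
            self._blobs[key] = data
            return succeed(None)

        def get(self, key):
            if key not in self._blobs:
                return fail(KeyError(key))
            return succeed(self._blobs[key])

    class MemoryBlobStoreTests(TestCase):
        """Synchronous tests: no reactor, no inlineCallbacks. We inspect the
        already-fired Deferreds with successResultOf/failureResultOf."""
        def test_round_trip(self):
            store = MemoryBlobStore()
            self.successResultOf(store.put("aa/0", b"share data"))
            self.assertEqual(self.successResultOf(store.get("aa/0")),
                             b"share data")

        def test_missing(self):
            self.failureResultOf(MemoryBlobStore().get("missing"), KeyError)

A separate suite would run the same assertions against the real implementation, with external dependencies, async, and inlineCallbacks, to exercise it specifically.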

Tuesday, 24 January 2017

Attendees: ramki, warner, meejah, exarkun, dawuud, cypher

  • 1.12.1 landed in Debian unstable, will migrate to testing on 29-Jan (along with Foolscap)
  • #1382 (servers-of-happiness): dawuud and meejah have been refactoring the branch, improving the tests (see the matching sketch after these notes)
    • made small change to spec, server with no space is treated as read-only
    • the actual spec-change is in 17c562129d464e000a3a7e0d14b4d751bf3be0e6
    • In step 6, "Let an edge exist between server S and share T if and only if S already has T, or could hold T (i.e. S has enough available space to hold a share of at least T's size)." becomes "Construct a bipartite graph G3 of (only readwrite) servers to shares (some shares may already exist on a server)."
    • there's some code duplication that could be refactored, maybe in the future, in util/happinessutil.py
  • #2861 (I2P vs TLS) is still broken, warner needs some time with str4d and wireshark to debug
  • JSON welcome page (#2476): dawuud is getting ready to land
    • provides several pieces: introducer info, other-servers info, my-storage-server status info, versions
    • should we make separate child URL paths for the individual pieces?
    • naw, just one big GET /?t=json
    • future new-WAPI can provide something more civilized
  • cloud-backend: why does it need accounting?
    • warner: backgrounder on starter-leases, transition to leasedb-capable version, bootstrap after loss of leasedb
    • exarkun: could we move the leases-database out to the cloud backend, remove local mutable state from server
    • Amazon RDS?
  • direct-to-S3 mode: maybe give up accounting/GC in that case
    • would still be useful for some personal use cases
    • super hard to do strong accounting (with adversarial clients) without a real server
    • hacky IAM roles? eww
  • leases in S3 as files with one-line-per-SI? (plus accounting identifier, expiration time)
    • occasional fetch, populate sqlite, run query, forget sqlite
    • sometimes delete the S3 files when they've expired
    • server writes these files on behalf of identified clients
  • or leases as files in tahoe itself?
    • one account per directory, so only one writer, so no conflicts
    • written by storage server, not by clients
    • S3 holds (tahoe) SERVERNAME/leases/CLIENTID/FILES
  • spectrum of options
    • 1: store lease info in non-durable efficiently-queryable location (not on backend): sqlite
    • 2: keep two copies, try to keep in sync
      • maybe just write whole .sqlite file into S3 after each change
    • 3: fetch and build ephemeral DB when you want to make queries, then throw it away
      • canonical lease table is stored as loose files in S3, occasionally pruned
    • 4: store info in clever loose backend way, but queries will probably be expensive
    • would look a lot like the old local-crawler approach, but with files in S3, async reads
  • meejah's cloud-backend branch is still the right one to mine patches from (2237.cloud-backend-merge.0)
  • make a new 'lafs' CLI command? with cleaner subcommand tree?
    • leave 'tahoe' as plumbing, use 'lafs' as porcelain?
    • plugins?
  • adding an attenuate/diminish CLI command (to get from writecap to readcap)?
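
As a toy illustration of the bipartite-graph step in the #1382 spec change above: servers-of-happiness is the size of a maximum matching between servers and shares, and the classic augmenting-path algorithm below computes it on a tiny example. This is not the branch's code, just the underlying idea.

    # Toy example only: happiness = size of a maximum matching in the
    # bipartite graph of servers to the shares they hold or could hold.

    def max_matching(edges):
        """edges maps each server to the set of shares it can hold; returns
        a maximum matching as a {share: server} dict."""
        matched = {}  # share -> server

        def augment(server, seen):
            for share in edges.get(server, ()):
                if share in seen:
                    continue
                seen.add(share)
                # Take the share if it is free, or if its current holder
                # can be re-routed to a different share.
                if share not in matched or augment(matched[share], seen):
                    matched[share] = server
                    return True
            return False

        for server in edges:
            augment(server, set())
        return matched

    if __name__ == "__main__":
        # Three servers; A and B compete for share 0, C can take 2 or 3.
        graph = {"A": {0}, "B": {0, 1}, "C": {2, 3}}
        print("happiness = %d" % len(max_matching(graph)))  # happiness = 3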

Tuesday, 31 January 2017

Attendees: warner, str4d, meejah, daira

We spent the whole time investigating #2861 (a TLS handshake failure when using I2P on 1.12).

The root cause was found to be txI2P's unusual approach to server connections, coupled with Twisted's TLS handling.

Most protocols (TCP, Tor) receive inbound connections by listening on a TCP socket, and then accepting connections (either from the real client, for TCP, or from the local Tor daemon). I2P is an exception, because the Tahoe server makes an outbound connection to the I2P daemon, then asks the daemon to use that TCP link for inbound I2P connections.

Twisted uses the type of the underlying connection (outbound client, or inbound server) to decide which kind of TLS handshake it should emit: a ClientHello, or a ServerHello. TLS requires exactly one side to send a ClientHello, after which the other side sends the matching ServerHello. When both sides are using client-like connections, both sides send a ClientHello, and the TLS negotiation fails.

We're trying to figure out the cleanest way to fix this. It might be to patch Twisted to add a new argument to the startTLS() call (probably "side=", so you could explicitly request either client or server, and ignore the underlying connection type). We'd then make a corresponding change to Foolscap, wait for the next Twisted release, and bump the dependencies.

Or it might be easier to change Foolscap's TLS handling to switch to TLS in a different way, one that gives us more control over which handshake side it uses (in short, switch from startTLS to direct use of TLSMemoryBIOProtocol). That wouldn't require any changes to Twisted, just a new version of Foolscap, but it would probably be more work.
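
As a rough sketch of that second option (the helper name and parameters below are illustrative, not the actual Foolscap change): Twisted's TLSMemoryBIOFactory takes an explicit isClient flag, so the wrapping code, rather than the direction of the underlying connection, chooses which side of the handshake to play.

    # Sketch only: force the TLS *server* role even though the underlying
    # TCP connection to the I2P daemon was dialed outbound as a client.
    from twisted.internet import ssl
    from twisted.protocols.tls import TLSMemoryBIOFactory

    def wrap_as_tls_server(server_factory, private_key, certificate):
        """Wrap an ordinary server factory so each connection it handles
        speaks TLS with this side sending the ServerHello."""
        options = ssl.CertificateOptions(privateKey=private_key,
                                         certificate=certificate)
        # isClient=False picks the handshake side explicitly, regardless
        # of whether the transport underneath was accepted or dialed.
        return TLSMemoryBIOFactory(options, False, server_factory)

Foolscap would still have to arrange for the connection it receives over the outbound link to the I2P daemon to be handled by a factory wrapped this way, which is where most of the extra work would be.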

Feel free to follow along on https://Tahoe-LAFS.org/trac/Tahoe-LAFS/ticket/2861 for the details.

Other work that's ongoing:

  • ramki got tahoe 1.12.1 into Debian (sid) in plenty of time to make the Stretch freeze. 1.12.1 is now in "testing", and everything in "testing" will be frozen for the Stretch release on 05-Feb. If you use Debian (sid or testing), please "apt install tahoe-lafs" and make sure everything works as expected. We know of two problems right now: I2P doesn't work (see above, but it doesn't matter quite so much because I2P isn't packaged in Debian yet), and "tahoe --version" emits a scary-looking but benign warning about dependency versions.
  • meejah and dawuud have been working hard at bringing #1382 (servers-of-happiness, server-selection cleanups) up to date, so hopefully we can land it soon
  • meejah has also been working on #2237 (cloud-backend), and I think the next step will be to incrementally land changes from that branch on trunk, then merge master back into the branch until it shrinks away into nothing. Basically we'll be mining the branch for patches in an order that makes review and merging easier to manage.
  • there are a couple of other PRs on github that should be landable without too much work

Tuesday, 07 February 2017

Attendees: warner, liz, dawuud, meejah, exarkun

  • dawuud and meejah are rewriting the #1382 servers-of-happiness branch
    • markberger did a clever set-rearrangement thing, which makes it run much faster than the algorithm we wrote up at the summit
    • they're rewriting his code as functions, bringing it up to date with our coding standards
  • accounting for S4-like services (shares on S3, tahoe server on EC2)
    • need to reconstruct lease data without EC2 state
    • could store lease data in small files next to shares, keep sqlite cache on EC2 box, rebuild when necessary (see the lease-rebuild sketch after these notes)
    • make it part of the pluggable storage backend
    • should it be an external command, or built in so it runs automatically at startup if the DB is missing?
      • it will take a while: must fetch all lease records from S3
    • not just leases: also accounts and account attributes
    • just dump whole .sqlite file into S3?
    • backend should be responsible for this: could choose to use a cloud DB service
      • maybe add an exception type for backends to raise during setup that means "please tell the operator to run a recovery command"
      • for upgrades and recovery
      • backend also has the option to do recovery automatically
    • Accountant is shared, but its state is stored in a backend-specific way
    • S3 has "immediate consistency" for reading new objects that were not read before being created
      • and eventual consistency for everything else
      • so try to avoid modifying shares
  • goal is to allow a copy of .tahoe to serve as a backup
    • node can modify the contents for a few seconds after startup, but should then stop
    • should not require continuous backup of .tahoe
    • exarkun points out that it'd be better to be able to have a "tahoe init" command
      • All the state that is part of the node identity is created by this step
      • If you backup .tahoe after running this, you can always reconstitute the same node from that backup
  • "node state storage subsystem"
    • not just shares
    • accounting info, runtime-discovered config data
  • exarkun thinks about storing this in a DB for analytics
  • could we use tahoe to store its own config/state?
    • worried about performance
    • would introduce extra dependencies: server A would depend upon server B for its own state
  • maybe just use storage-backend for it
    • SI=accounting-thing-1
    • account manager asks share storage backend to write data to a known-SI
    • needs to be encrypted
      • general principle: protect the server against its own storage backend (see the encryption sketch after these notes)
      • part of .tahoe/private/ is a key that encrypts that data
      • refactor file encoding/decoding code to be able to use it locally
        • "please encrypt this (state thing), one share only"
        • then turns around to write the ciphertext into the storage backend
    • goal is for .tahoe/private to be snapshotted once, right after startup, and that should be sufficient as a backup
  • storing things in different ways depending upon how fast they happen
    • "low rate": node init
    • "medium rate": accounting changes: Alice is given permission to write, etc
    • "high rate": shares being modified
    • we're willing to make the operator do a backup of .tahoe/ for low-rate changes
    • willing to make S3 writes of databases/etc for medium-rate, but not high-rate
    • must be willing to make S3 writes (of flat data) for high-rate (share changes)
  • maybe deployment makes a decision
    • speed of local sqlite, but not persistent
    • security of local sqlite: not exposed to other cloud users
    • persistence (but low-performance) of S3-stashed .sqlite files
    • persistence (but lower security) of a real AWS cloud DB
  • starting point: add API to storage server for "local data" (config/private/etc)
    • must be async
    • code in Client or Node that uses self.write_private_config could be changed to use Client.write_something, which delegates to new StorageServer API
  • related to replacing tahoe.cfg with tahoe.sqlite
    • must write "tahoe config" CLI command
    • lose ability to edit config with text editor
    • most users don't want to use a text editor
    • you can instruct someone to copy+paste a CLI command, but not instructions for a text editor
  • talking about Petmail, Vuvuzela, rerandomizable tokens
  • I2P problem
    • both sides were being TLS clients. TLS requires one client and one server.
    • could change Twisted's .startTLS() api to let you specify the side
    • or could change Foolscap to wrap the underlying protocol itself
      • this would enable Foolscap-over-X, where X is an ITransport but not real TCP
      • maybe some new protocol that's implemented in pure Twisted, rather than TCP to a local daemon
  • http://www.lothar.com/blog/55-Git-over-Tahoe-LAFS/
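
A rough lease-rebuild sketch for the "small files next to shares" idea above (the per-share file layout, one "account-id expiration-time" line per lease, and the sqlite schema are invented for illustration): the queryable database is only a cache, so losing it, or the whole EC2 box, just costs a rescan of the lease files held by the backend.

    # Sketch only: file layout and schema are hypothetical.
    import os
    import sqlite3
    import time

    def rebuild_lease_db(lease_dir, db_path=":memory:"):
        """Scan per-share lease files (fetched from S3 or read locally) and
        populate an ephemeral sqlite cache that is cheap to throw away."""
        db = sqlite3.connect(db_path)
        db.execute("CREATE TABLE IF NOT EXISTS leases"
                   " (storage_index TEXT, account_id TEXT, expires REAL)")
        db.execute("DELETE FROM leases")
        for storage_index in os.listdir(lease_dir):
            with open(os.path.join(lease_dir, storage_index)) as f:
                for line in f:
                    account_id, expires = line.split()
                    db.execute("INSERT INTO leases VALUES (?, ?, ?)",
                               (storage_index, account_id, float(expires)))
        db.commit()
        return db

    def expired_storage_indexes(db, now=None):
        """Storage indexes whose leases have all expired (GC candidates)."""
        now = time.time() if now is None else now
        rows = db.execute("SELECT storage_index FROM leases"
                          " GROUP BY storage_index HAVING MAX(expires) < ?",
                          (now,))
        return [si for (si,) in rows]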
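
An encryption sketch for the "protect the server against its own storage backend" principle above: node-local state is encrypted with a symmetric key kept under .tahoe/private/ before the ciphertext is handed to the backend. PyNaCl's SecretBox stands in here for whatever Tahoe's own encoding layer would eventually provide, and the key filename and the backend's put/get methods are hypothetical.

    # Sketch only: key location and backend API are hypothetical.
    import os

    from nacl.secret import SecretBox
    from nacl.utils import random as random_bytes

    def load_or_create_state_key(basedir):
        path = os.path.join(basedir, "private", "local-state.key")
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(random_bytes(SecretBox.KEY_SIZE))
        with open(path, "rb") as f:
            return f.read()

    def write_local_state(backend, basedir, name, plaintext):
        """Encrypt, then hand only ciphertext to the (untrusted) backend."""
        box = SecretBox(load_or_create_state_key(basedir))
        backend.put(name, box.encrypt(plaintext))

    def read_local_state(backend, basedir, name):
        box = SecretBox(load_or_create_state_key(basedir))
        return box.decrypt(backend.get(name))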

The Tahoe-LAFS Weekly News is published once a week by The Tahoe-LAFS Software Foundation, President and Treasurer: Peter Secor (peter). Scribes: Patrick "marlowe" McDonald (marlowe), Zooko Wilcox-O'Hearn, Editor Emeritus: zooko.

Send your news stories to marlowe@antagonism.org - submission deadline: Monday night.