[tahoe-dev] [tahoe-lafs] #869: Allow Tahoe filesystem to be run over a different key-value-store / DHT implementation

tahoe-lafs trac at tahoe-lafs.org
Thu Jan 6 19:48:54 UTC 2011


#869: Allow Tahoe filesystem to be run over a different key-value-store / DHT
implementation
-----------------------------+----------------------------------------------
     Reporter:  davidsarah   |       Owner:  nobody                                                                                                         
         Type:  enhancement  |      Status:  new                                                                                                            
     Priority:  major        |   Milestone:  undecided                                                                                                      
    Component:  unknown      |     Version:  1.5.0                                                                                                          
   Resolution:               |    Keywords:  scalability performance  forward-compatibility backward-compatibility availability newcaps docs anti-censorship
Launchpad Bug:               |  
-----------------------------+----------------------------------------------

Comment (by warner):

 Replying to [comment:6 davidsarah]:
 >
 > As far as performance is concerned, signature ''verification'' is fast
 > with RSA, ECDSA or hash-based signatures (and the hashing can be done
 > incrementally as the share is received, so no significant increase in
 > latency). I don't think this is likely to be a performance bottleneck.

 I'd want to test this with the lowliest of our potential storage servers:
 embedded NAS devices like Pogo-Plugs and !OpenWRT boxes with USB drives
 attached (like Francois' super-slow ARM buildslave). Moving from Foolscap
 to HTTP would help these boxes (which find SSL challenging), and doing
 less work per share would help. Ideally, we'd be able to saturate the
 disk bandwidth without maxing out the CPU.
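
 For a rough sense of where the hashing ceiling sits on such a box, a
 stdlib-only micro-benchmark along these lines (chunk and total sizes are
 arbitrary, not anything from the share format) could be run on the ARM
 buildslave and the reported MB/s compared against the drive's sequential
 read rate:

 {{{
 #!python
 # Rough micro-benchmark: how fast can this box SHA-256 share-sized data?
 # If the MB/s here is well above the disk's read rate, hashing isn't the
 # bottleneck on this hardware.
 import hashlib, os, time

 CHUNK = 128 * 1024          # arbitrary segment size for this sketch
 TOTAL = 64 * 1024 * 1024    # feed 64 MiB through the hash

 def hash_throughput():
     data = os.urandom(CHUNK)        # one buffer, hashed repeatedly
     h = hashlib.sha256()
     start = time.perf_counter()
     done = 0
     while done < TOTAL:
         h.update(data)
         done += CHUNK
     elapsed = time.perf_counter() - start
     return done / elapsed / 1e6     # MB/s

 if __name__ == "__main__":
     print("SHA-256 throughput: %.1f MB/s" % hash_throughput())
 }}}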

 Also, one of our selling points is that the storage server is low-impact:
 we want to encourage folks on desktops to share their disk space without
 worrying about their other applications running slowly. I agree that it
 might not be a big bottleneck, but let's just keep in mind that our
 target is lower than 100% CPU consumption.

 Incremental hashing will require forethought in the CHK share-layout and
 in the write protocol (the order in which we send out share bits): there
 are plenty of ways to screw it up. Mutable files are harder (you're
 updating an existing Merkle tree, reading in modified segments, applying
 deltas, rehashing, testing, then committing to disk). The simplest
 approach would involve writing a whole new proposed share, doing
 integrity checks, then replacing the old one.
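
 To make that last approach concrete, here is a minimal sketch (not
 Tahoe's actual share code) of receiving a proposed replacement share,
 hashing it incrementally as the bytes arrive, and only swapping it in
 once the digest checks out. It uses a flat SHA-256 where the real format
 uses SHA-256d and Merkle trees, and the function and argument names are
 made up:

 {{{
 #!python
 # Sketch only: write the proposed share to a temp file, hashing as we go,
 # verify, then atomically replace the old share. A failed check leaves
 # the existing share untouched.
 import hashlib, os, tempfile

 def receive_and_commit(chunks, expected_digest, final_path):
     """chunks: iterable of bytes; expected_digest: hex digest the writer
     claims; final_path: where the existing share (if any) lives."""
     h = hashlib.sha256()
     dirname = os.path.dirname(final_path) or "."
     fd, tmp_path = tempfile.mkstemp(dir=dirname)
     try:
         with os.fdopen(fd, "wb") as f:
             for chunk in chunks:
                 h.update(chunk)     # hashing overlaps with the write,
                 f.write(chunk)      # so no second pass over the data
         if h.hexdigest() != expected_digest:
             raise ValueError("integrity check failed, old share kept")
         os.replace(tmp_path, final_path)   # atomic rename
     except Exception:
         if os.path.exists(tmp_path):
             os.remove(tmp_path)
         raise
 }}}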

 > The compatibility impact of changes in the mutable share format would
 > be that an older server is not able to accept mutable shares of the
 > newer version from a newer client. The newer client can still store
 > shares of the older version on that server. Grids with a mixture of
 > server and client versions (and old shares) will still work, subject
 > to that limitation.

 Hm, I think I'm assuming that a new share format really means a new
 encoding protocol, so everything about the share is different, and the
 filecaps necessarily change. It wouldn't be possible to produce both
 "old" and "new" shares for a single file. In that case, clients faced
 with older servers either have to reencode the file (and change the
 filecap, and find everywhere the old cap was used and replace it), or
 reduce diversity (you can only store shares on new servers).

 Migrating existing files to the new format can't be done in a simple
 rebalancing pass (in which you'd only see ciphertext); you'd need
 something closer to a {{{cp -r}}}.

 My big concern is that this would slow adoption of new formats like MDMF.
 Since servers should advertise the formats they can understand, I can
 imagine a control panel that shows me grid/server-status on a per-format
 basis: "if you upload an SDMF file, you can use servers A/B/C/D, but if
 you upload MDMF, you can only use servers B/C". Clients would need to
 watch the control panel and not update their config to start using e.g.
 MDMF until enough servers were capable of providing reasonable diversity:
 not exactly a flag day, but not a painless upgrade either.
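
 The client-side check implied above could be as simple as the sketch
 below. The per-format advertisement dict and the threshold are
 hypothetical (servers don't publish per-format capabilities like this
 today); the threshold would presumably be something like the "servers of
 happiness" setting:

 {{{
 #!python
 # Sketch: only treat an upload format as usable once enough connected
 # servers advertise it to give reasonable share diversity.
 def usable_formats(server_formats, needed_servers):
     """server_formats: {server_id: set of format names it accepts}
     needed_servers: minimum server count for acceptable diversity."""
     counts = {}
     for formats in server_formats.values():
         for fmt in formats:
             counts[fmt] = counts.get(fmt, 0) + 1
     return {fmt for fmt, n in counts.items() if n >= needed_servers}

 # Example: four servers, only two speaking MDMF, happiness of 3 ->
 # the client keeps uploading SDMF for now.
 servers = {"A": {"SDMF"}, "B": {"SDMF", "MDMF"},
            "C": {"SDMF", "MDMF"}, "D": {"SDMF"}}
 print(usable_formats(servers, 3))   # -> {'SDMF'}
 }}}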

 > On the other hand, suppose that the reason for the change is migration
 > to a new signing algorithm to fix a security flaw. In that case, a
 > given client can't expect any improvements in security until all
 > servers have upgraded,

 Incidentally, the security vulnerability induced by such a flaw would be
 limited to availability (and possibly rollback), since that's all the
 server can threaten anyways. In this scenario, a non-writecap-holding
 attacker might be able to convince the server to modify a share in some
 invalid way, which will either result in a (detected) integrity failure
 or worst-case a rollback. Anyways, it probably wouldn't be a fire-drill.

-- 
Ticket URL: <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/869#comment:8>
tahoe-lafs <http://tahoe-lafs.org>
secure decentralized storage

