[tahoe-dev] [tahoe-lafs] #946: upload should succeed as soon as the servers-of-happiness criterion is met
tahoe-lafs
trac at allmydata.org
Thu Feb 11 23:14:22 PST 2010
#946: upload should succeed as soon as the servers-of-happiness criterion is met
--------------------------------+-------------------------------------------
Reporter: davidsarah | Owner: nobody
Type: enhancement | Status: new
Priority: major | Milestone: 1.7.0
Component: code-encoding | Version: 1.6.0
Keywords: performance upload | Launchpad_bug:
--------------------------------+-------------------------------------------
Changes (by warner):
* component: code-network => code-encoding
Comment:
hrm, I don't think this is going to improve upload "performance"
significantly, at least not if the grid is free of pathologically bad
servers. We upload one segment at a time, and don't proceed to the next
segment until we're completely done with the previous one (meaning all the
erasure-coded blocks have been delivered). To do otherwise would blow our
memory-footprint budget.
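To illustrate the segment-at-a-time pipeline described above, here's a toy
sketch (hypothetical names, not the actual Tahoe-LAFS code): each segment is
fully encoded and delivered to every server before the next segment is read,
which is what keeps peak memory at roughly one segment regardless of file
size.

```python
def encode(segment, num_shares):
    # Stand-in for erasure coding: here we just replicate the segment.
    # Real encoding would produce num_shares distinct coded blocks.
    return [segment] * num_shares

def upload(segments, servers):
    for seg_num, segment in enumerate(segments):
        blocks = encode(segment, len(servers))   # one segment in memory
        for server, block in zip(servers, blocks):
            server.append((seg_num, block))      # "deliver" the block
        # Only after every block of this segment is delivered do we
        # move on; this is what keeps the memory footprint bounded.

servers = [[] for _ in range(3)]
upload([b"seg0", b"seg1"], servers)
# each server now holds one block per segment
```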
(scare quotes on "performance" because I think of upload-performance as
how long it takes to finish the job, and this proposal is basically
changing the definition of the job rather than making it occur more
efficiently. Plus there's some desirable tail of work that can occur
after servers-of-happiness is met, and I'd only call it a performance
improvement if you actually skip doing that extra tail of work).
So there isn't likely to be a big difference in outbound bandwidth between
the all-shares-done event and this (numshares*(NUMSEGS-1) blocks + servers-
of-happiness blocks) event: just a few blocks, probably less than a megabyte
of data, independent of file size.
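A back-of-the-envelope check of that claim: the gap between the two events
is at most (numshares - happiness) blocks of the final segment, independent
of how many segments the file has. The parameters below are illustrative
(Tahoe's default 3-of-10 encoding with a 7-server happiness threshold), not
taken from any particular upload.

```python
num_shares = 10                # total shares (N)
happy = 7                      # servers-of-happiness threshold (H)
num_segs = 1000                # segments in the file
block_size = 128 * 1024 // 3   # ~43 KiB blocks: 128 KiB segments, k=3

all_done_blocks = num_shares * num_segs
good_enough_blocks = num_shares * (num_segs - 1) + happy
extra_blocks = all_done_blocks - good_enough_blocks  # = num_shares - happy
extra_bytes = extra_blocks * block_size              # well under a megabyte
```

Note that num_segs cancels out of extra_blocks entirely, which is the
file-size-independence claimed above.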
If the source file exists on a stable and cheap-to-access storage medium
(like perhaps on local disk, as opposed to a streaming upload), then we
could make multiple passes over it, trading off local disk IO and total
upload time against time-to-good-enough. The only way I'd really want to
use this would be if we lose one or more servers during the upload: it'd
be nice to immediately re-upload the leftover shares to new servers.
I think a useful take-home point from this ticket is that "upload is
complete" may be a fuzzily-defined event. For download it's a bit more
concrete (you either have the data to deliver or not), but for upload it's
fair to say that the "quality" of the upload (perhaps measured as the
number of servers who hold a share) remains at zero for quite a while,
then starts jumping upwards, then crosses some threshold of "good enough",
and eventually hits a plateau of "we're not going to bother making it any
better". If we have to pick a single point to fire a single return
Deferred, I'd stick with the plateau. But if we could usefully fire at
multiple points, I'd make "good enough" one of them.
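The multiple-firing-points idea might look something like the sketch below
(a hypothetical monitor, not Tahoe-LAFS's actual upload code or its Twisted
Deferred API): one callback fires when the servers-of-happiness "good
enough" threshold is crossed, and another at the plateau when we stop
trying to improve placement.

```python
class UploadStatus:
    """Track upload quality as the count of servers holding a share,
    and fire callbacks at two points on the quality curve."""

    def __init__(self, happy_threshold, on_good_enough, on_plateau):
        self.happy = happy_threshold
        self.on_good_enough = on_good_enough
        self.on_plateau = on_plateau
        self.servers_with_share = set()
        self._good_enough_fired = False

    def share_delivered(self, server_id):
        self.servers_with_share.add(server_id)
        if (not self._good_enough_fired
                and len(self.servers_with_share) >= self.happy):
            self._good_enough_fired = True  # fire "good enough" once
            self.on_good_enough(len(self.servers_with_share))

    def finished(self):
        # The plateau: we're not going to make the upload any better.
        self.on_plateau(len(self.servers_with_share))

events = []
st = UploadStatus(2,
                  lambda n: events.append(("good-enough", n)),
                  lambda n: events.append(("plateau", n)))
for sid in ["s1", "s2", "s3"]:
    st.share_delivered(sid)
st.finished()
# events == [("good-enough", 2), ("plateau", 3)]
```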
I'd want to think carefully about the use case, though. How does the
uploader react to this event (whenever it occurs)? Do they start the next
upload? (then we must think about memory footprints caused by encouraging
multiple simultaneous uploads). Do they shut down their node because they
think it is idle? (then we might want to provide a flush() call). Do they
try to download the file, or hand someone else the filecap so *they* can
download it? (probably the best use case I can currently think of).
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/946#comment:3>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid