[tahoe-dev] [tahoe-lafs] #946: upload should succeed as soon as the servers-of-happiness criterion is met
tahoe-lafs
trac at allmydata.org
Thu Feb 11 23:14:22 PST 2010
#946: upload should succeed as soon as the servers-of-happiness criterion is met
--------------------------------+-------------------------------------------
Reporter: davidsarah | Owner: nobody
Type: enhancement | Status: new
Priority: major | Milestone: 1.7.0
Component: code-encoding | Version: 1.6.0
Keywords: performance upload | Launchpad_bug:
--------------------------------+-------------------------------------------
Changes (by warner):
* component: code-network => code-encoding
Comment:
hrm, I don't think this is going to improve upload "performance"
significantly, at least not if the grid is free of pathologically bad
servers. We upload one segment at a time, and don't proceed to the next
segment until we're completely done with the previous one (meaning all the
erasure-coded blocks have been delivered). To do otherwise would blow our
memory-footprint budget.
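To illustrate the segment-at-a-time pipeline described above, here's a toy
sketch (hypothetical names, not the actual Tahoe-LAFS code): each segment is
fully encoded and delivered to every server before the next segment is read,
which is what keeps peak memory at roughly one segment regardless of file
size.

```python
def encode(segment, num_shares):
    # Stand-in for erasure coding: here we just replicate the segment.
    # Real encoding would produce num_shares distinct coded blocks.
    return [segment] * num_shares

def upload(segments, servers):
    for seg_num, segment in enumerate(segments):
        blocks = encode(segment, len(servers))   # one segment in memory
        for server, block in zip(servers, blocks):
            server.append((seg_num, block))      # "deliver" the block
        # Only after every block of this segment is delivered do we
        # move on; this is what keeps the memory footprint bounded.

servers = [[] for _ in range(3)]
upload([b"seg0", b"seg1"], servers)
# each server now holds one block per segment
```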
(scare quotes on "performance" because I think of upload-performance as
how long it takes to finish the job, and this proposal is basically
changing the definition of the job rather than making it occur more
efficiently. Plus there's some desirable tail of work that can occur
after servers-of-happiness is met, and I'd only call it a performance
improvement if you actually skip doing that extra tail of work).
So there isn't likely to be a big difference in outbound bandwidth between
the all-shares-done event and this (numshares*(NUMSEGS-1) blocks + servers-
of-happiness blocks) event: just a few blocks, probably less than a megabyte
of data, independent of file size.
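A back-of-the-envelope check of that claim: the gap between the two events
is at most (numshares - happiness) blocks of the final segment, independent
of how many segments the file has. The parameters below are illustrative
(Tahoe's default 3-of-10 encoding with a 7-server happiness threshold), not
taken from any particular upload.

```python
num_shares = 10                # total shares (N)
happy = 7                      # servers-of-happiness threshold (H)
num_segs = 1000                # segments in the file
block_size = 128 * 1024 // 3   # ~43 KiB blocks: 128 KiB segments, k=3

all_done_blocks = num_shares * num_segs
good_enough_blocks = num_shares * (num_segs - 1) + happy
extra_blocks = all_done_blocks - good_enough_blocks  # = num_shares - happy
extra_bytes = extra_blocks * block_size              # well under a megabyte
```

Note that num_segs cancels out of extra_blocks entirely, which is the
file-size-independence claimed above.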
If the source file exists on a stable and cheap-to-access storage medium
(like perhaps on local disk, as opposed to a streaming upload), then we
could make multiple passes over it, trading off local disk IO and total
upload time against time-to-good-enough. The only way I'd really want to
use this would be if we lose one or more servers during the upload: it'd
be nice to immediately re-upload the leftover shares to new servers.
I think a useful take-home point from this ticket is that "upload is
complete" may be a fuzzily-defined event. For download it's a bit more
concrete (you either have the data to deliver or not), but for upload it's
fair to say that the "quality" of the upload (perhaps measured as the
number of servers who hold a share) remains at zero for quite a while,
then starts jumping upwards, then crosses some threshold of "good enough",
and eventually hits a plateau of "we're not going to bother making it any
better". If we have to pick a single point to fire a single return
Deferred, I'd stick with the plateau. But if we could usefully fire at
multiple points, I'd make "good enough" one of them.
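The multiple-firing-points idea might look something like the sketch below
(a hypothetical monitor, not Tahoe-LAFS's actual upload code or its Twisted
Deferred API): one callback fires when the servers-of-happiness "good
enough" threshold is crossed, and another at the plateau when we stop
trying to improve placement.

```python
class UploadStatus:
    """Track upload quality as the count of servers holding a share,
    and fire callbacks at two points on the quality curve."""

    def __init__(self, happy_threshold, on_good_enough, on_plateau):
        self.happy = happy_threshold
        self.on_good_enough = on_good_enough
        self.on_plateau = on_plateau
        self.servers_with_share = set()
        self._good_enough_fired = False

    def share_delivered(self, server_id):
        self.servers_with_share.add(server_id)
        if (not self._good_enough_fired
                and len(self.servers_with_share) >= self.happy):
            self._good_enough_fired = True  # fire "good enough" once
            self.on_good_enough(len(self.servers_with_share))

    def finished(self):
        # The plateau: we're not going to make the upload any better.
        self.on_plateau(len(self.servers_with_share))

events = []
st = UploadStatus(2,
                  lambda n: events.append(("good-enough", n)),
                  lambda n: events.append(("plateau", n)))
for sid in ["s1", "s2", "s3"]:
    st.share_delivered(sid)
st.finished()
# events == [("good-enough", 2), ("plateau", 3)]
```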
I'd want to think carefully about the use case, though. How does the
uploader react to this event (whenever it occurs)? Do they start the next
upload? (then we must think about memory footprints caused by encouraging
multiple simultaneous uploads). Do they shut down their node because they
think it is idle? (then we might want to provide a flush() call). Do they
try to download the file, or hand someone else the filecap so *they* can
download it? (probably the best use case I can currently think of).
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/946#comment:3>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid