#1340 new enhancement

consider share-at-a-time uploader

Reported by: warner
Owned by:
Priority: normal
Milestone: undecided
Component: code-encoding
Version: 1.8.1
Keywords: upload usability performance memory availability preservation
Cc:
Launchpad Bug:

Description

As discussed in this thread:

http://tahoe-lafs.org/pipermail/tahoe-dev/2011-January/005963.html

(in response to discussion about UIs for #467 static-server-selection), Chris Palmer described Octavia's is-it-uploaded-yet UI. I liked it, and I'd like to explore how we might achieve something similar in Tahoe. This ticket is to make sure we don't forget the idea.

The basic goal is to give each upload a sort of "red/yellow/green-light" status indicator, showing how healthy/robust the file is. When the upload first starts, the file stays in the "red" state until enough shares have been pushed to recover it even if it were deleted from the original computer; it then transitions into the "yellow" state. In that state, more shares are uploaded until we've achieved the desired diversity, at which point it goes "green".
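As a concrete sketch (the helper name and parameters here are hypothetical, not existing Tahoe code), the light could be computed purely from how many distinct shares have been confirmed on servers:

    # hypothetical sketch: classify an in-progress upload as red/yellow/green
    # k   = shares needed to recover the file
    # nup = shares we want placed before we call the upload finished
    def upload_status(shares_placed, k, nup):
        if shares_placed < k:
            return "red"     # file is lost if the local copy disappears now
        elif shares_placed < nup:
            return "yellow"  # recoverable, but not yet diverse enough
        else:
            return "green"   # desired number of shares are in place

    # e.g. with k=3, Nup=10:
    #   upload_status(2, 3, 10)  -> "red"
    #   upload_status(5, 3, 10)  -> "yellow"
    #   upload_status(10, 3, 10) -> "green"

A real implementation would probably also want to count distinct servers rather than raw shares, so that ten shares sitting on one server don't show up as "green".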

The key differences from our current uploader:

  • we set N higher than usual, maybe 25 or even 50
  • we don't upload all N shares. Instead, we define some lower threshold "Nup" (closer to our old N value) which indicates how many shares we want to upload. The idea is that we could create more shares if we wanted to, without changing the encoding parameters or the filecaps.
  • we store the whole file locally for longer, and do multiple passes over it, instead of the almost-streaming approach we have now (see the sketch after this list)
    • the first pass generates all shares, builds all the hash trees, but only uploads 'k' shares (discarding the rest)
    • subsequent passes re-use the cached hash trees, generate all shares again, but only upload some other subset of the shares, maybe 'k' at a time, or maybe more.
    • retain the whole file and the cached hash trees until we've uploaded Nup shares
  • if the network is partitioned before Nup is achieved, or the client is shut down, resume generating and uploading shares once the network or client comes back
  • have a realtime status icon which shows the aggregate red/yellow/green state of all current uploads. This lets users know when it's safe to close their laptops.
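Roughly, the multi-pass loop referred to above could look like this (everything here is a hypothetical sketch, not the existing uploader; encode() stands in for zfec, and push_share()/load_state()/save_state() are assumed helpers for share placement and resumable progress):

    import hashlib

    def multipass_upload(segments, encode, k, n, nup, push_share,
                         load_state, save_state):
        # segments(): yields lists of k input blocks, one list per segment
        # encode(blocks): returns all n share blocks for one segment
        # push_share(sharenum, blocklist): True if the share was stored
        state = load_state() or {"uploaded": set(), "block_hashes": None}

        while len(state["uploaded"]) < nup:
            remaining = [i for i in range(n) if i not in state["uploaded"]]
            this_pass = remaining[:k]          # push roughly k shares per pass

            kept = dict((i, []) for i in this_pass)
            block_hashes = [[] for _ in range(n)]

            for blocks in segments():
                share_blocks = encode(blocks)  # generate all n blocks...
                for i, blk in enumerate(share_blocks):
                    block_hashes[i].append(hashlib.sha256(blk).digest())
                    if i in kept:
                        kept[i].append(blk)    # ...but keep only this pass's

            if state["block_hashes"] is None:
                state["block_hashes"] = block_hashes  # cache hash-tree leaves

            for i in this_pass:
                if push_share(i, kept[i]):
                    state["uploaded"].add(i)
            save_state(state)                  # survive shutdown/partition

The status light above then falls straight out of len(state["uploaded"]) compared against k and Nup.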

The general notion is to sacrifice some efficiency to reduce the time needed to get to a mostly-safe upload. The original uploader generates and uploads all shares in parallel, which avoids wasted CPU cycles and barely touches the disk (and #320 plus some new CHK format could make it fully streaming). But if the process is interrupted before the very last close() is sent to each share, the whole upload fails and must be started again from scratch. The new uploader would need more local disk storage (to support multiple passes) and would waste some CPU (encoding shares that are then discarded, unless they too were stored on local disk, and we learned from Tahoe's predecessor that disks don't do matrix transpositions well), but it would get the file to at least a recoverable state in about the same time a normal non-encoded FTP upload would take, and then improve from there.

(caching the hashtrees would save some of the hashing time on the second pass, which may or may not be a win, since hashing is generally pretty quick too)

Some assumptions that must be tested before this scheme is at all realistic:

  • generating extra shares (and then discarding them without pushing) is cheap
    • generating N=25 or N=50 shares (and never uploading them, just keeping the encoding space in reserve for the future) is cheap
  • building the hash trees on each pass, or caching them, is cheap
  • we can tolerate the extra disk IO (multiple passes over each file); a quick timing sketch follows this list
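For the last assumption, a throwaway timing sketch (pure stdlib; the path, pass count, and chunk size are placeholders, and later passes may be served from the OS page cache, so a file larger than RAM gives a more honest number):

    import os, time

    def time_passes(path, passes=3, chunk=1024 * 1024):
        # time several sequential read passes over one file, to estimate the
        # extra IO cost of a multi-pass uploader vs. a single streaming pass
        size = os.path.getsize(path)
        for p in range(passes):
            start = time.time()
            with open(path, "rb") as f:
                while f.read(chunk):
                    pass
            elapsed = time.time() - start
            print("pass %d: %.1f MB/s" % (p + 1, size / elapsed / 1e6))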

At present, zfec doesn't have an API to create fewer than N shares: you have to make all of them and then throw some away. It might be possible to enhance zfec to allow this (and of course do it faster than create-and-discard: ideally, the CPU time would be proportional to the number of shares we retain), but I haven't looked at the Reed-Solomon encoding scheme enough to tell. If so, then the second and subsequent passes could avoid the encoding waste (we'd still generate each share twice: once on the first pass to build the hash trees, and a second time on some subsequent pass when we actually push that share).
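If zfec did grow such an API, the second and later passes might call something like this (purely a hypothetical interface, not the current zfec one):

    # hypothetical: ask the encoder for only the shares this pass will push,
    # with CPU cost proportional to len(wanted) rather than to N
    wanted = [12, 13, 14]
    blocks = encoder.encode(input_blocks, desired_share_nums=wanted)
    # 'blocks' would contain just the requested shares, in 'wanted' order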

Octavia uses pure replication (k=1), which removes the question of encoding overhead, so it's a bit easier to use this scheme in Octavia than in Tahoe.

The big big win of this approach is the UI. The k-of-N encoding parameters don't matter quite so much when you can keep creating more shares until you're happy with their distribution, so there might be less need for users to choose k/N instead of sticking with the defaults. And the "red/yellow/green" status light is a dead-simple UI indicator, like the way Dropbox tells you when it's done moving data around.

The next step is probably to do some zfec performance tests, to find out what setting N=25 (and then discarding the extra shares) would do to our upload speed.
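Something along these lines would give a first answer (this assumes the zfec Python package's Encoder(k, m)/encode() entry points; the segment size, share counts, and all-zeros test data are arbitrary choices):

    import time
    import zfec

    def bench(k, n, segsize=128 * 1024, segments=64):
        # time zfec encoding at k-of-n, to see what raising N to 25 or 50
        # (and then discarding the extra shares) costs in upload CPU time
        enc = zfec.Encoder(k, n)
        blocks = [b"\x00" * (segsize // k) for _ in range(k)]
        start = time.time()
        for _ in range(segments):
            enc.encode(blocks)        # generate the shares, then drop them
        elapsed = time.time() - start
        print("k=%d n=%d: %.1f MB/s" % (k, n, segsize * segments / 1e6 / elapsed))

    for n in (10, 25, 50):
        bench(3, n)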

Change History (3)

comment:1 Changed at 2011-01-27T19:57:53Z by zooko

I like it! See related tickets #678 (converge same file, same K, different M) and #711 (repair to different levels of M).

comment:2 Changed at 2011-05-21T00:12:55Z by davidsarah

  • Keywords upload usability performance memory availability preservation added

comment:3 Changed at 2012-11-02T01:03:20Z by davidsarah

  • Priority changed from major to normal