#218 closed enhancement (fixed)

resumption of incomplete transfers

Reported by: zooko
Owned by: warner
Priority: major
Milestone: 0.8.0 (Allmydata 3.0 Beta)
Component: code-network
Version: 0.7.0
Keywords: upload download partial
Cc:
Launchpad Bug:

Description

Peter mentioned to me that an important operational issue is resumption of large file transfers that are interrupted by network flapping.

To do this, we change the storage servers so that they no longer delete incomplete "incoming" data when they detect a connection break. Then we extend the upload protocol so that uploaders learn which blocks of a share are already present on the server and avoid re-uploading them (see the sketch below).

Likewise on download.
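
For illustration only, here is a minimal Python sketch of the server-side half. The IncomingShareStore class, its method names, and the one-file-per-(storage index, share number) layout are all invented for this sketch; they are not the actual storage-server code.

{{{#!python
import os

class IncomingShareStore:
    """Hypothetical server-side store that retains partially written
    shares across connection breaks instead of deleting them."""

    def __init__(self, incoming_dir):
        self.incoming_dir = incoming_dir
        os.makedirs(incoming_dir, exist_ok=True)

    def _path(self, storage_index, shnum):
        return os.path.join(self.incoming_dir, "%s.%d" % (storage_index, shnum))

    def bytes_already_present(self, storage_index, shnum):
        # Tell a reconnecting uploader how much of this share survived
        # the interrupted upload, so it can skip re-sending those bytes.
        try:
            return os.path.getsize(self._path(storage_index, shnum))
        except OSError:
            return 0

    def append(self, storage_index, shnum, data):
        # Append-only writes: a resumed upload continues exactly where
        # the previous connection left off.
        with open(self._path(storage_index, shnum), "ab") as f:
            f.write(data)

    def on_connection_lost(self, storage_index, shnum):
        # Deliberately a no-op: the partial share stays on disk for a
        # later resumption, rather than being deleted.
        pass
}}}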

Change History (6)

comment:1 Changed at 2007-12-31T21:25:58Z by warner

That's an important user-facing feature. There are a couple of different places where it might be implemented, some more appropriate than others. What matters to the user is that their short-lived network link be usable to upload or download large files; they don't really care how exactly this takes place.

The three places where I can see this happening (on upload) are:

  1. RIStorageServer: one client node allocates a bucket and starts writing shares to it, then stops. At some point later, another node (possibly the same one) does the same thing. It might be nice to have the second node learn about the partial share and avoid re-uploading that data (see the sketch after this list).

  2. HTTP PUT: this is hard, since our very RESTful protocol just accepts data and then returns a URI. We could say that PUTs to a child name (PUT /uri/$DIRURI/foo.jpg) respond to early termination by uploading the partial data anyway (and adding the resulting URI to the directory). A later PUT with some Content-Range header, signalling that we want to modify (append to) the existing data, would then mean the client node downloads that data, appends the new data to it, re-uploads the whole thing, and finally replaces the partial child URI with the whole one. Ick.

  3. Something above HTTP PUT: perhaps an operation to allocate a handle of some sort, then do PUTs to that handle, then close it, similar to the xmlrpc-based webfront API we use on MV right now.
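
As a rough sketch of how option 1 might look from the uploader's side, assuming the hypothetical bytes_already_present/append operations from the description's sketch (none of this is the real RIStorageServer interface):

{{{#!python
CHUNK = 64 * 1024  # illustrative write size

def resume_share_upload(server, storage_index, shnum, share_bytes):
    """Upload one share, skipping whatever prefix the server already
    holds from an earlier, interrupted attempt."""
    offset = server.bytes_already_present(storage_index, shnum)
    while offset < len(share_bytes):
        chunk = share_bytes[offset:offset + CHUNK]
        server.append(storage_index, shnum, chunk)
        offset += len(chunk)
    return offset  # total bytes now held by the server
}}}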

For download, things are a bit easier, since we can basically do random-access reads from CHK files, and the HTTP GET syntax can pass a Range header that tells us which part of the file the client wants to read. We just have to implement support for that.
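
For illustration, serving such a request is mostly a matter of parsing the Range header and reading the requested span; a minimal sketch (single-range form only, not the actual web-frontend code):

{{{#!python
import re

def parse_range(header_value, filesize):
    """Parse a single-range header like 'bytes=100-199' or 'bytes=100-'
    into (first, last) inclusive byte offsets. The multi-range and
    suffix ('bytes=-500') forms are omitted for brevity."""
    m = re.match(r"bytes=(\d+)-(\d*)$", header_value.strip())
    if not m:
        raise ValueError("unsupported Range header: %r" % header_value)
    first = int(m.group(1))
    last = int(m.group(2)) if m.group(2) else filesize - 1
    if first > last or last >= filesize:
        raise ValueError("requested range is out of bounds")
    return first, last
}}}

The response would then carry a 206 (Partial Content) status and a Content-Range: bytes first-last/filesize header describing the span actually returned.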

I'm probably leaning towards the third option (something above PUT), but it depends a lot upon what sort of deployment options we're looking at and which clients are stuck behind the flapping network link.

comment:2 Changed at 2008-01-05T03:53:11Z by warner

  • Milestone changed from undecided to 0.9.0

comment:3 Changed at 2008-01-08T23:26:30Z by warner

I believe (correct me if I'm wrong) the current thinking is that this feature will be provided through the Offloaded Uploader (#116), operating in a spool-to-disk-before-encode mode.

The idea is that the client (who has a full copy of the file and has done one read pass to compute the encryption key and storage index) sends the SI to the helper, which checks the appropriate storage servers and either says "it's there, don't send me anything", "it isn't there, send me all your crypttext", or "some of it is here on my local disk, send me the rest of the crypttext". In the last case, the helper requests the byte range that it still needs, repeating as necessary until it has the whole (encrypted) file on its disk. Then the helper encodes and pushes the shares.

We assume that the helper runs in a well-managed environment and neither gets shut down frequently nor loses network connectivity to the storage servers frequently. The helper is also much closer to the storage servers, network-wise, so it is OK if an upload must be restarted, as long as the file doesn't have to be transferred over the home user's (slow) DSL line multiple times.
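
A sketch of that negotiation, with the class, method name, and spool layout invented for illustration (the real logic belongs to the #116 helper code):

{{{#!python
import os

class HelperNegotiation:
    """Hypothetical helper-side check: given a storage index, decide
    whether the client must send all, some, or none of the crypttext."""

    def __init__(self, spool_dir):
        self.spool_dir = spool_dir

    def negotiate(self, storage_index, crypttext_size, grid_has_file):
        # grid_has_file: result of asking the storage servers whether
        # shares already exist under this storage index.
        if grid_has_file:
            return ("done", None)  # "it's there, don't send me anything"
        spool = os.path.join(self.spool_dir, storage_index)
        have = os.path.getsize(spool) if os.path.exists(spool) else 0
        if have == 0:
            return ("send-all", (0, crypttext_size))
        # "some of it is here on my local disk, send me the rest"
        return ("send-range", (have, crypttext_size))
}}}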

This provides the resume-interrupted-upload behavior for home users who are running their own node (when using the Offloaded Uploader helper). It does not help users who are running a plain web browser (and thus uploading files with HTTP POSTs to an external web server); to help a web browser, we'd need an ActiveX application or perhaps Flash or something. It also doesn't help friendnet installations that do not have a helper node running closer to the storage servers than the client. This seems like an acceptable tradeoff.

comment:4 Changed at 2008-01-09T01:06:11Z by warner

  • Milestone changed from 0.9.0 (Allmydata 3.0 final) to 0.8.0 (Allmydata 3.0 Beta)

As I read the milestones, this belongs in 0.8.0.

comment:5 Changed at 2008-01-24T00:21:37Z by warner

  • Owner set to warner
  • Status changed from new to assigned

comment:6 Changed at 2008-01-28T19:06:04Z by warner

  • Resolution set to fixed
  • Status changed from assigned to closed

Ok, this is now complete in the CHK upload helper. Clients that use the helper send their ciphertext to the helper, where it is stored in a holding directory (BASEDIR/helper/CHK_incoming/) until it is complete. If the client is lost, the partial data is retained for later resumption. When the incoming data is complete, it is moved to a different directory (CHK_encoding/) and then the encode+push process begins.
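
A sketch of that lifecycle, using the directory names from this comment but with the function itself invented for illustration:

{{{#!python
import os

def finish_incoming(basedir, storage_index):
    """Once all ciphertext for a storage index has arrived, move the
    spool file from CHK_incoming/ to CHK_encoding/, after which the
    encode+push phase begins. A partial file simply stays in
    CHK_incoming/ until the client reconnects and resumes."""
    incoming = os.path.join(basedir, "helper", "CHK_incoming", storage_index)
    encoding = os.path.join(basedir, "helper", "CHK_encoding", storage_index)
    os.makedirs(os.path.dirname(encoding), exist_ok=True)
    os.rename(incoming, encoding)  # atomic on POSIX filesystems
    return encoding
}}}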

The #116 helper is not complete (it still lacks support for avoiding uploads of files that are already present in the grid), but this portion of it is, so I'm closing out this ticket.

I think we still need some sort of answer for incomplete downloads, so I'm opening a new ticket for the download side (#288).
