#946 new enhancement

upload should succeed as soon as the servers-of-happiness criterion is met

Reported by: davidsarah Owned by: nobody
Priority: major Milestone: undecided
Component: code-encoding Version: 1.6.0
Keywords: performance upload availability servers-of-happiness Cc:
Launchpad Bug:

Description (last modified by davidsarah)

The performance of upload could be improved by returning success to the client as soon as the servers-of-happiness criterion is met, rather than when all shares have been placed. The storage client would continue to upload the remaining shares asynchronously.

This should have no negative impact on preservation or availability, because once the happiness criterion has been met, we know that we are going to return success to the client even if no further shares can be placed. It can improve upload availability in some cases, by allowing up to N-H servers to fail or hang during the upload.

Upload performance should be improved in a similar way to #928's improvement of download performance (although less dramatically, since with the default parameters, we will be returning after shares have been uploaded to the fastest 7 servers, rather than after shares have been downloaded from the fastest 3 servers as for #928).
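The control-flow change proposed here can be sketched in a few lines. This is a hypothetical illustration (all names invented, not the actual Tahoe-LAFS upload code): success is signalled as soon as the happiness predicate holds, while the loop keeps placing the leftover shares.

```python
# Hypothetical sketch of the proposed behaviour: report success as soon
# as the servers-of-happiness predicate is satisfied, but keep uploading
# the remaining shares. Names are invented for illustration.

def upload_shares(share_uploads, is_happy):
    """share_uploads: callables, each placing one share and returning the
    server it landed on (or None on failure).
    is_happy: predicate over the list of successful placements."""
    placements = []
    succeeded_early = False
    for do_upload in share_uploads:
        server = do_upload()
        if server is not None:
            placements.append(server)
        if not succeeded_early and is_happy(placements):
            succeeded_early = True
            # In the real client, this is where success would be returned
            # to the caller; the remaining shares continue asynchronously.
    return placements, succeeded_early
```

With this shape, a failed or hanging server encountered after the happiness point no longer delays the caller, which is the availability improvement the description claims.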

Change History (8)

comment:1 follow-up: Changed at 2010-02-12T03:54:48Z by imhavoc

Consideration: Ad hoc grids should not be punished.

My grid is at the "inception stage." We have 4 live storage servers. I would still set my "servers of happiness" to 5 or 7. This should not break uploads.

comment:2 in reply to: ↑ 1 Changed at 2010-02-12T05:55:41Z by davidsarah

  • Component changed from unknown to code-network

Replying to imhavoc:

Consideration: Ad hoc grids should not be punished.

My grid is at the "inception stage." We have 4 live storage servers. I would still set my "servers of happiness" to 5 or 7. This should not break uploads.

That will break uploads, by definition of what "servers of happiness" means. In this case it would mean that 5 or 7 servers need to have distinct shares, so a grid with 4 servers can't meet the happiness criterion for any upload. You would need to set "servers of happiness" to at most 4, until you have more servers. (This does not prevent the upload code from trying to upload N shares, and it still wouldn't prevent that if this ticket were implemented.)

In any case, the relevant ticket for that behaviour is #778, not this ticket.
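The "distinct shares on distinct servers" measure referenced above (and made precise in #778) can be illustrated as the size of a maximum matching between servers and the shares they hold. This is a simplified sketch, not the actual Tahoe-LAFS implementation:

```python
# Sketch of the servers-of-happiness measure per #778: the size of a
# maximum matching in the bipartite graph pairing servers with the
# shares they hold. Simplified illustration only.

def happiness(server_shares):
    """server_shares: dict mapping server id -> set of share numbers.
    Returns the number of (server, share) pairs in a maximum matching."""
    share_to_server = {}

    def try_assign(server, seen):
        for share in server_shares[server]:
            if share in seen:
                continue
            seen.add(share)
            # Take the share if it is free, or displace its current
            # holder onto another share via an augmenting path.
            holder = share_to_server.get(share)
            if holder is None or try_assign(holder, seen):
                share_to_server[share] = server
                return True
        return False

    return sum(1 for s in server_shares if try_assign(s, set()))
```

Since each matched pair needs its own server, a 4-server grid can never reach a happiness of 5 or 7, no matter how the shares are distributed — which is the point being made in this comment.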

comment:3 follow-up: Changed at 2010-02-12T07:14:22Z by warner

  • Component changed from code-network to code-encoding

hrm, I don't think this is going to improve upload "performance" significantly, at least not if the grid is free of pathologically bad servers. We upload one segment at a time, and don't proceed to the next segment until we're completely done with the previous one (meaning all the erasure-coded blocks have been delivered). To do otherwise would blow our memory-footprint budget.

(scare quotes on "performance" because I think of upload-performance as how long it takes to finish the job, and this proposal is basically changing the definition of the job rather than making it occur more efficiently. Plus there's some desirable tail of work that can occur after servers-of-happiness is met, and I'd only call it a performance improvement if you actually skip doing that extra tail of work).

So there isn't likely to be a big difference in outbound bandwidth between the all-shares-done event and this (numshares*(NUMSEGS-1) blocks + servers-of-happiness blocks) event: just a few blocks, probably less than a megabyte of data, independent of file size.
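warner's block accounting can be checked with quick arithmetic. Assuming the default parameters (3-of-10 encoding, happiness H=7, 128 KiB segments — assumptions for illustration), only the last segment's blocks beyond the happiness count are skipped:

```python
# Back-of-envelope check of the saved outbound data: only the
# last-segment blocks beyond the happiness count go unsent.
# Parameters (3-of-10 encoding, H=7, 128 KiB segments) are the assumed
# defaults, used here purely for illustration.

def bytes_saved(k, n, happiness, segsize):
    block_size = segsize // k        # size of each erasure-coded block
    skipped_blocks = n - happiness   # last-segment blocks not yet sent
    return skipped_blocks * block_size

saving = bytes_saved(k=3, n=10, happiness=7, segsize=128 * 1024)
```

With these numbers the saving is 3 blocks of ~43 KiB, about 128 KiB in total, regardless of file size — consistent with the "less than a megabyte" estimate above.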

If the source file exists on a stable and cheap-to-access storage medium (like perhaps on local disk, as opposed to a streaming upload), then we could make multiple passes over it, trading off local disk IO and total upload time against time-to-good-enough. The only way I'd really want to use this would be if we lose one or more servers during the upload: it'd be nice to immediately re-upload the leftover shares to new servers.

I think a useful take-home point from this ticket is that "upload is complete" may be a fuzzily-defined event. For download it's a bit more concrete (you either have the data to deliver or not), but for upload it's fair to say that the "quality" of the upload (perhaps measured as the number of servers who hold a share) remains at zero for quite a while, then starts jumping upwards, then crosses some threshold of "good enough", and eventually hits a plateau of "we're not going to bother making it any better". If we have to pick a single point to fire a single return Deferred, I'd stick with the plateau. But if we could usefully fire at multiple points, I'd make "good enough" one of them.
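The "fire at multiple points" idea could be modelled as an observer that emits both milestones. This is a plain-Python sketch (standing in for Twisted Deferreds, with invented names) of announcing "good enough" at the happiness threshold and "plateau" when placement stops:

```python
# Sketch (plain Python standing in for Twisted Deferreds; names
# invented) of reporting both upload milestones: "good enough" when the
# happiness threshold is reached, "plateau" when no more placement will
# be attempted.

class UploadMilestones:
    def __init__(self):
        self.events = []  # ordered log of milestone names

    def share_placed(self, placed_count, happiness_threshold, total_shares):
        if placed_count == happiness_threshold:
            self.events.append("good-enough")
        if placed_count == total_shares:
            self.events.append("plateau")
```

In a real implementation each milestone would fire its own Deferred; a caller that only wants to hand off the filecap could wait on "good-enough", while one about to shut down its node would wait on "plateau" (or call a flush()).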

I'd want to think carefully about the use case, though. How does the uploader react to this event (whenever it occurs)? Do they start the next upload? (then we must think about memory footprints caused by encouraging multiple simultaneous uploads). Do they shut down their node because they think it is idle? (then we might want to provide a flush() call). Do they try to download the file, or hand someone else the filecap so *they* can download it? (probably the best use case I can currently think of).

comment:4 in reply to: ↑ 3 Changed at 2010-02-12T14:57:05Z by zooko

Replying to warner:

I think a useful take-home point from this ticket is that "upload is complete" may be a fuzzily-defined event. For download it's a bit more concrete (you either have the data to deliver or not), but for upload it's fair to say that the "quality" of the upload (perhaps measured as the number of servers who hold a share) remains at zero for quite a while, then starts jumping upwards, then crosses some threshold of "good enough", and eventually hits a plateau of "we're not going to bother making it any better".

If and when we finish #778 then servers-of-happiness will be the measure of this instead of "number of servers who hold a share".

comment:5 Changed at 2010-04-12T19:31:23Z by davidsarah

  • Milestone changed from 1.7.0 to undecided

I still think this is a good idea, but only if we find a way to do it without impact on memory usage (when we are memory-limited), and that probably isn't going to happen for 1.7.

comment:6 Changed at 2010-04-25T15:42:45Z by davidsarah

  • Description modified (diff)
  • Keywords availability added

comment:7 Changed at 2010-05-17T04:16:06Z by zooko

ticket:614#comment:10 describes an interaction between this ticket and the repairer that could be harmful, and which should be kept in mind if you are working on this ticket.

comment:8 Changed at 2010-12-29T09:12:50Z by zooko

  • Keywords servers-of-happiness added