#1449 closed defect (fixed)
drop-upload: updates to files may be lost if few servers are connected (e.g. soon after starting the gateway)
| Reported by: | davidsarah | Owned by: | daira |
|---|---|---|---|
| Priority: | major | Milestone: | 1.12.0 |
| Component: | code-frontend-magic-folder | Version: | 1.8.2 |
| Keywords: | drop-upload preservation docs error otf-magic-folder-objective2 | Cc: | |
| Launchpad Bug: | | | |
Description (last modified by warner)
This is related to #719, but it may be a more significant problem for the drop-upload frontend because it starts monitoring the directory immediately.
In the latest #1429 patch, the 'Operational Statistics' page shows the number of drop-uploads that have failed, and there may be information about those failures in logs, but there is no other indication to the user that changed files have not been successfully uploaded.
Change History (24)
comment:1 Changed at 2011-07-28T03:43:47Z by davidsarah
- Description modified (diff)
- Keywords error added
- Summary changed from drop-upload: files may not be uploaded with sufficient diversity if few servers are connected (e.g. soon after starting the gateway) to drop-upload: updates to files may be lost if few servers are connected (e.g. soon after starting the gateway)
comment:2 Changed at 2014-12-02T19:47:23Z by warner
- Component changed from code-frontend to code-frontend-drop-upload
- Description modified (diff)
comment:3 Changed at 2015-03-13T13:09:48Z by daira
- Keywords otf-magic-folder added
comment:4 Changed at 2015-03-17T22:15:11Z by daira
- Owner changed from davidsarah to daira
- Status changed from new to assigned
comment:5 Changed at 2015-04-02T14:55:36Z by daira
- Keywords otf-magic-folder-objective2 added; otf-magic-folder removed
comment:6 Changed at 2015-04-10T03:47:10Z by dawuud
comment:7 Changed at 2015-04-10T17:51:57Z by daira
Results of David's and my pairing session today: https://github.com/daira/tahoe-lafs/commits/1449.wait-for-enough-servers.1
comment:8 Changed at 2015-04-10T20:49:31Z by dawuud
my current state is now: commit a6708d07e7a54b0c44ae51602f40b4919ca834fe of branch https://github.com/david415/tahoe-lafs/tree/david-1449
the client unit tests now pass. i've removed the storage client test.
the drop upload test fails... hangs forever.
comment:9 Changed at 2015-04-10T23:02:59Z by dawuud
in more recent commits i got more unit tests to pass...
however, having trouble getting the drop upload test to pass, still.
comment:10 Changed at 2015-04-12T22:34:18Z by daira
- Milestone changed from soon to 1.11.0
comment:11 Changed at 2015-04-13T23:38:26Z by dawuud
ok i pushed my latest changes to here: https://github.com/david415/tahoe-lafs/tree/david-1449
i wasn't able to get the drop uploader unit tests passing so i just fixed naming convention usage like Daira mentioned earlier.
comment:12 Changed at 2015-04-14T06:20:09Z by dawuud
fix it with a deque! same branch. please review.
comment:13 Changed at 2015-04-14T16:59:38Z by daira
On #tahoe-lafs:
daira: dawuud: the current code in DropUploader._notify will (in the path not in self._pending branch) call _append_to_deque which adds the path to self._pending, then process the deque (synchronously), then add the path to self._pending again
daira: the second self._pending.add(path) is wrong and should be deleted
daira: processing the deque synchronously also may cause problems
daira: it may be the change to synchronous processing that made the tests work, but I think we probably have to change it back to asynchronous
daira: in particular, note that the deferred that is returned by _process is dropped by the call to func(*fields[1:]) in _process_deque
daira: so this code will try to upload things in parallel...
daira: which may work for immutable files, but is a bad idea for mutables, especially directories
daira: I'll rebase the code as it is, anyway, so that we can review it more easily
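To make the redundant-add concern concrete, here is a minimal sketch using only the standard library. PendingQueue and its buggy_second_add flag are hypothetical stand-ins, not the real DropUploader, and the sketch assumes that processing an entry clears its pending mark. Under that assumption, one plausible consequence of the second add is that the path stays in the pending set and later changes to the same file are ignored.

```python
from collections import deque


class PendingQueue:
    """Hypothetical stand-in for the uploader's bookkeeping (not the real DropUploader)."""

    def __init__(self, buggy_second_add=False):
        self._pending = set()      # paths queued but not yet uploaded
        self._deque = deque()      # upload order
        self.uploaded = []         # record of simulated uploads
        self._buggy_second_add = buggy_second_add

    def _append_to_deque(self, path):
        self._pending.add(path)
        self._deque.append(path)

    def _process_deque(self):
        while self._deque:
            path = self._deque.popleft()
            self._pending.discard(path)   # assumption: processing clears the pending mark
            self.uploaded.append(path)    # stands in for the real upload

    def notify(self, path):
        if path not in self._pending:
            self._append_to_deque(path)
            self._process_deque()         # synchronous, as in the reviewed code
            if self._buggy_second_add:
                self._pending.add(path)   # the redundant add flagged in review


fixed = PendingQueue()
fixed.notify("a.txt")
fixed.notify("a.txt")          # a later change to the same file is uploaded again

buggy = PendingQueue(buggy_second_add=True)
buggy.notify("a.txt")
buggy.notify("a.txt")          # skipped: the path is stuck in _pending
```

In this sketch the fixed variant records two uploads of a.txt while the buggy variant records only one.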
comment:14 Changed at 2015-04-14T17:05:00Z by daira
- return self.uploader.startService()
+ self.uploader.setServiceParent(self.client)
+ self.uploader.startService()
+ self.uploader.upload_ready()
+ return None
self.uploader.startService() returns a deferred which is dropped here. (Maybe it should be synchronous; the Twisted API doc is not clear.)
comment:15 Changed at 2015-04-14T17:05:51Z by daira
I really want linear types for deferreds, so they can't be dropped implicitly!
comment:16 Changed at 2015-04-14T17:27:25Z by daira
comment:17 Changed at 2015-04-14T19:17:11Z by dawuud
new working code here -> https://github.com/david415/tahoe-lafs/tree/1449.dropupload-redundant-uploads.2
- i fixed the add-to-pending-bug
- perform uploads serially (let's optimize later!)
- push and pop the deque asynchronously (is that the correct term?) This design is highly influenced by Foolscap's eventually...
comment:18 Changed at 2015-04-17T20:11:18Z by dawuud
I just designed another uploader deque. Here: https://github.com/david415/tahoe-lafs/tree/2406.otf-objective-2.3.1-fix-upload-deque
I believe this to be a correct design that enforces sequential uploads and asynchronous deque appends... without the weird concurrent interleave bugs of my sloppy previous attempts.
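For readers following along, the "sequential uploads, asynchronous appends" idea can be sketched roughly as below. This is not the code on the branch: it swaps Twisted deferreds for the standard library's asyncio so that it is self-contained, and SerialUploader, append, wait_idle and fake_upload are all hypothetical names. The point is only that append returns immediately while a single worker drains the deque one upload at a time.

```python
import asyncio
from collections import deque


class SerialUploader:
    """Hypothetical sketch; names do not come from the Tahoe-LAFS code base."""

    def __init__(self, upload):
        self._upload = upload        # coroutine function taking a path
        self._deque = deque()
        self._pending = set()
        self._worker = None          # the single task draining the deque

    def append(self, path):
        """Queue a path and return immediately; never uploads inline."""
        if path in self._pending:
            return                   # already queued, coalesce the notification
        self._pending.add(path)
        self._deque.append(path)
        if self._worker is None or self._worker.done():
            self._worker = asyncio.ensure_future(self._drain())

    async def _drain(self):
        # One upload at a time: the next starts only after the previous finishes.
        while self._deque:
            path = self._deque.popleft()
            self._pending.discard(path)
            await self._upload(path)

    async def wait_idle(self):
        while self._worker is not None and not self._worker.done():
            await self._worker


async def demo():
    active = 0
    max_active = 0
    order = []

    async def fake_upload(path):
        nonlocal active, max_active
        active += 1
        max_active = max(max_active, active)
        await asyncio.sleep(0)       # yield control, as a real upload would
        order.append(path)
        active -= 1

    uploader = SerialUploader(fake_upload)
    for name in ("a", "b", "a", "c"):
        uploader.append(name)        # returns before any upload has run
    await uploader.wait_idle()
    return order, max_active


order, max_active = asyncio.run(demo())
```

Here the second notification for "a" arrives while the first is still queued, so it is coalesced, and at most one upload is ever in flight.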
comment:19 Changed at 2015-04-23T01:24:52Z by dawuud
I think this: https://github.com/david415/tahoe-lafs/blob/2406.otf-objective-2.3.1-fix-upload-deque/src/allmydata/client.py#L349-L353
is bad because it may cause unbalanced share allocation to storage servers. It seems likely that only connecting to K or H+1 servers would cause a single file's shares to be clustered on a smaller number of servers... meaning that some individual servers will get more than one share from that same file. This is bad... especially given that we do not yet have a "rebalancing" command-line tool of any kind.
comment:20 Changed at 2015-04-27T16:08:01Z by daira
I'm happy with the min(N, H+1) heuristic for now; we can reconsider this with Zooko and Brian's input before we merge to trunk.
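The arithmetic behind this trade-off can be sketched as follows. It assumes, as an interpretation of the thread rather than a confirmed reading of the code, that the gateway waits until it is connected to min(N, H+1) servers before uploading, where N is the total number of shares and H is the happiness threshold. The function names and the 3-of-10, H=7 parameters are sample choices for illustration only.

```python
import math


def servers_to_wait_for(n, h):
    """Connection threshold under the min(N, H+1) heuristic (assumed interpretation)."""
    return min(n, h + 1)


def max_shares_on_one_server(n, connected):
    """Most-loaded server in the best case (even spread): ceil(N / connected)."""
    return math.ceil(n / connected)


# Sample encoding parameters: K=3 needed, H=7 happy, N=10 total shares.
K, H, N = 3, 7, 10

threshold = servers_to_wait_for(N, H)            # min(10, 8) = 8
worst = max_shares_on_one_server(N, threshold)   # ceil(10 / 8) = 2
only_k = max_shares_on_one_server(N, K)          # ceil(10 / 3) = 4
```

With these sample values, uploading as soon as 8 servers are connected means at least one server must hold 2 shares of the same file, and uploading with only K=3 servers connected would put at least 4 shares on one server, which is the clustering concern raised in comment 19.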
comment:21 Changed at 2015-05-02T16:42:42Z by daira
- Resolution set to fixed
- Status changed from assigned to closed
comment:22 Changed at 2015-05-02T16:44:11Z by daira
Closing this and using ticket #2406 for any further review comments.
comment:23 Changed at 2016-03-22T05:02:52Z by warner
- Milestone changed from 1.11.0 to 1.12.0
Milestone renamed
comment:24 Changed at 2016-04-26T19:51:59Z by meejah <meejah@…>
In a56a3ad/trunk:
I've got a very rough draft untested solution to this ticket right here: https://github.com/david415/tahoe-lafs/tree/david-1449