#1449 closed defect (fixed)

drop-upload: updates to files may be lost if few servers are connected (e.g. soon after starting the gateway)

Reported by: davidsarah Owned by: daira
Priority: major Milestone: 1.12.0
Component: code-frontend-magic-folder Version: 1.8.2
Keywords: drop-upload preservation docs error otf-magic-folder-objective2 Cc:
Launchpad Bug:

Description (last modified by warner)

This is related to #719, but it may be a more significant problem for the drop-upload frontend because it starts monitoring the directory immediately.

In the latest #1429 patch, the 'Operational Statistics' page shows the number of drop-uploads that have failed, and there may be information about those failures in logs, but there is no other indication to the user that changed files have not been successfully uploaded.

Change History (24)

comment:1 Changed at 2011-07-28T03:43:47Z by davidsarah

  • Description modified (diff)
  • Keywords error added
  • Summary changed from drop-upload: files may not be uploaded with sufficient diversity if few servers are connected (e.g. soon after starting the gateway) to drop-upload: updates to files may be lost if few servers are connected (e.g. soon after starting the gateway)

comment:2 Changed at 2014-12-02T19:47:23Z by warner

  • Component changed from code-frontend to code-frontend-drop-upload
  • Description modified (diff)

comment:3 Changed at 2015-03-13T13:09:48Z by daira

  • Keywords otf-magic-folder added

comment:4 Changed at 2015-03-17T22:15:11Z by daira

  • Owner changed from davidsarah to daira
  • Status changed from new to assigned

comment:5 Changed at 2015-04-02T14:55:36Z by daira

  • Keywords otf-magic-folder-objective2 added; otf-magic-folder removed

comment:6 Changed at 2015-04-10T03:47:10Z by dawuud

I've got a very rough draft untested solution to this ticket right here: https://github.com/david415/tahoe-lafs/tree/david-1449

comment:8 Changed at 2015-04-10T20:49:31Z by dawuud

my current state is now: commit a6708d07e7a54b0c44ae51602f40b4919ca834fe of branch https://github.com/david415/tahoe-lafs/tree/david-1449

the client unit tests now pass. i've removed the storage client test.

the drop upload test fails... hangs forever.

comment:9 Changed at 2015-04-10T23:02:59Z by dawuud

in more recent commits i got more unit tests to pass...

however, having trouble getting the drop upload test to pass, still.

comment:10 Changed at 2015-04-12T22:34:18Z by daira

  • Milestone changed from soon to 1.11.0

comment:11 Changed at 2015-04-13T23:38:26Z by dawuud

ok i pushed my latest changes to here: https://github.com/david415/tahoe-lafs/tree/david-1449

i wasn't able to get the drop uploader unit tests passing so i just fixed naming convention usage like Daira mentioned earlier.

comment:12 Changed at 2015-04-14T06:20:09Z by dawuud

fix it with a deque! same branch. please review.

comment:13 Changed at 2015-04-14T16:59:38Z by daira

On #tahoe-lafs:

daira: dawuud: the current code in DropUploader._notify will (in the path not in self._pending branch) call _append_to_deque which adds the path to self._pending, then process the deque (synchronously), then add the path to self._pending again
daira: the second self._pending.add(path) is wrong and should be deleted
daira: processing the deque synchronously also may cause problems
daira: it may be the change to synchronous processing that made the tests work, but I think we probably have to change it back to asynchronous
daira: in particular, note that the deferred that is returned by _process is dropped by the call to func(*fields[1:]) in _process_deque
daira: so this code will try to upload things in parallel...
daira: which may work for immutable files, but is a bad idea for mutables, especially directories
daira: I'll rebase the code as it is, anyway, so that we can review it more easily
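
The point about the dropped deferred can be illustrated with a small sketch. It is not the code on the branch: it uses asyncio from the standard library in place of Twisted Deferreds so that it runs on its own, and `upload`, `process_dropping`, `process_chained` and `run` are hypothetical names. Starting each upload without waiting for its result lets the uploads overlap, while waiting for each one keeps them serial.

```python
import asyncio


async def upload(path, state):
    # Hypothetical upload that records how many uploads overlap.
    state["in_flight"] += 1
    state["max_in_flight"] = max(state["max_in_flight"], state["in_flight"])
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    state["in_flight"] -= 1
    state["done"].append(path)


async def process_dropping(paths, state):
    # Analogous to dropping the deferred: start every upload without
    # waiting for the previous one to finish.
    tasks = [asyncio.create_task(upload(p, state)) for p in paths]
    # Gathered only so the demo can observe the outcome afterwards.
    await asyncio.gather(*tasks)


async def process_chained(paths, state):
    # Wait for each upload to finish before starting the next one.
    for p in paths:
        await upload(p, state)


def run(processor, paths):
    state = {"in_flight": 0, "max_in_flight": 0, "done": []}
    asyncio.run(processor(paths, state))
    return state
```

With three paths, `run(process_dropping, ...)` reports a `max_in_flight` of 3 and `run(process_chained, ...)` reports 1. The second behaviour is the one wanted for mutable files and directories.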

comment:14 Changed at 2015-04-14T17:05:00Z by daira

-            return self.uploader.startService()
+            self.uploader.setServiceParent(self.client)
+            self.uploader.startService()
+            self.uploader.upload_ready()
+            return None

self.uploader.startService() returns a deferred, which is dropped here. (Maybe it should be synchronous; the Twisted API doc is not clear.)

comment:15 Changed at 2015-04-14T17:05:51Z by daira

I really want linear types for deferreds, so they can't be dropped implicitly!

comment:17 Changed at 2015-04-14T19:17:11Z by dawuud

new working code here -> https://github.com/david415/tahoe-lafs/tree/1449.dropupload-redundant-uploads.2

  • i fixed the add-to-pending-bug
  • perform uploads serially (let's optimize later!)
  • push and pop the deque asynchronously (is that the correct term?) This design is highly influenced by Foolscap's eventually...
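
The three points above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code on the branch: `UploadQueue`, `notify`, `wait_idle` and `demo` are hypothetical names, and an asyncio task stands in for Foolscap's eventually so that the example needs only the standard library.

```python
import asyncio
from collections import deque


class UploadQueue:
    """Hypothetical model of the pending set plus deque."""

    def __init__(self, upload):
        self._upload = upload    # coroutine function taking a path
        self._pending = set()    # paths queued and not yet uploaded
        self._deque = deque()
        self._worker = None      # the single task that drains the deque

    def notify(self, path):
        if path in self._pending:
            return               # already queued; add it only once
        self._pending.add(path)
        self._deque.append(path)
        if self._worker is None or self._worker.done():
            # Schedule draining for a later turn of the event loop
            # instead of processing the deque inline.
            self._worker = asyncio.get_running_loop().create_task(self._drain())

    async def _drain(self):
        while self._deque:
            path = self._deque.popleft()
            try:
                await self._upload(path)   # one upload at a time
            finally:
                self._pending.discard(path)

    async def wait_idle(self):
        if self._worker is not None:
            await self._worker


async def demo():
    uploaded = []

    async def fake_upload(path):
        await asyncio.sleep(0)
        uploaded.append(path)

    queue = UploadQueue(fake_upload)
    for path in ["a.txt", "b.txt", "a.txt"]:
        queue.notify(path)
    seen_before_yielding = list(uploaded)   # nothing has run inline yet
    await queue.wait_idle()
    return seen_before_yielding, uploaded, queue._pending
```

Running `asyncio.run(demo())` returns an empty list for the snapshot taken before yielding, `["a.txt", "b.txt"]` for the uploads, and an empty pending set. The duplicate notification for `a.txt` is absorbed by the pending set rather than queued twice.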

comment:18 Changed at 2015-04-17T20:11:18Z by dawuud

I just designed another uploader deque. Here: https://github.com/david415/tahoe-lafs/tree/2406.otf-objective-2.3.1-fix-upload-deque

I believe this to be a correct design that enforces sequential uploads and asynchronous deque appends... without the weird concurrent interleave bugs of my sloppy previous attempts.

comment:19 Changed at 2015-04-23T01:24:52Z by dawuud

I think this: https://github.com/david415/tahoe-lafs/blob/2406.otf-objective-2.3.1-fix-upload-deque/src/allmydata/client.py#L349-L353

is bad because it may cause unbalanced share allocation to storage servers. It seems likely that only connecting to K or H+1 servers would cause a single file's shares to be clustered on a smaller number of servers... meaning that some individual servers will get more than one share from that same file. This is bad... especially given that we do not yet have a "rebalancing" commandline tool of any kind.

comment:20 Changed at 2015-04-27T16:08:01Z by daira

I'm happy with the min(N, H+1) heuristic for now; we can reconsider this with Zooko and Brian's input before we merge to trunk.

Last edited at 2015-04-27T18:10:50Z by daira (previous) (diff)
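
A small worked example may help weigh the heuristic against the clustering concern in comment:19. It is only a sketch: the function names are hypothetical, and it assumes shares are spread as evenly as possible over the connected servers, which simplifies Tahoe-LAFS's actual share placement.

```python
import math


def connection_threshold(n_total, happy):
    # The heuristic discussed here: wait for min(N, H+1) connected servers.
    return min(n_total, happy + 1)


def max_shares_per_server(n_total, connected):
    # Simplifying assumption: shares are spread as evenly as possible.
    if connected < 1:
        raise ValueError("at least one connected server is required")
    return math.ceil(n_total / connected)
```

With H=7 and N=10, `connection_threshold(10, 7)` is 8. Uploading as soon as 8 servers are connected gives `max_shares_per_server(10, 8) == 2`, so at least one server holds two shares of the same file, whereas with all 10 servers connected each holds one.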

comment:21 Changed at 2015-05-02T16:42:42Z by daira

  • Resolution set to fixed
  • Status changed from assigned to closed

comment:22 Changed at 2015-05-02T16:44:11Z by daira

Closing this and using ticket #2406 for any further review comments.

comment:23 Changed at 2016-03-22T05:02:52Z by warner

  • Milestone changed from 1.11.0 to 1.12.0

Milestone renamed

comment:24 Changed at 2016-04-26T19:51:59Z by meejah <meejah@…>

In a56a3ad/trunk:

Teach StorageFarmBroker to fire a deferred when a connection threshold is reached. refs #1449
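
The idea in this commit message can be sketched as follows. This is not StorageFarmBroker's actual interface: `ConnectionThreshold`, `when_ready` and `server_connected` are hypothetical names, and plain callbacks stand in for a Twisted Deferred so the example runs without Twisted.

```python
class ConnectionThreshold:
    """Hypothetical sketch: run callbacks once enough servers are connected."""

    def __init__(self, threshold):
        self._threshold = threshold
        self._connected = set()   # distinct server ids seen so far
        self._waiters = []        # callbacks waiting for the threshold
        self._fired = False

    def when_ready(self, callback):
        if self._fired:
            callback()            # threshold already reached; run at once
        else:
            self._waiters.append(callback)

    def server_connected(self, server_id):
        self._connected.add(server_id)
        if not self._fired and len(self._connected) >= self._threshold:
            self._fired = True
            waiters, self._waiters = self._waiters, []
            for callback in waiters:
                callback()        # each waiter runs exactly once
```

An uploader in this model would register its start routine with `when_ready` rather than starting as soon as the gateway launches, so that no upload is attempted while too few servers are connected.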

Signed-off-by: Daira Hopwood <daira@…>
