#2537 closed defect (fixed)

magic-folder: implement download retry

Reported by: dawuud
Owned by: daira
Priority: normal
Milestone: undecided
Component: code-frontend-magic-folder
Version: 1.10.1
Keywords: magic-folder download downloader retry reliability blocks-merge
Cc:
Launchpad Bug:

Description (last modified by dawuud)

The downloader should retry if the operation fails. This can happen because lossy networks such as wifi cause the TCP connections to storage servers to drop. Foolscap handles reconnection with exponential backoff, but the application layer still needs to retry; otherwise the user will experience our software as brittle.
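Foolscap reconnects at the transport layer, but the failed download itself still has to be re-issued. Below is a minimal, synchronous sketch of an application-level retry loop with exponential backoff. It assumes the download can be wrapped in a zero-argument callable that raises `ConnectionError` when a storage-server connection drops; `retry_with_backoff` and its parameters are hypothetical names, not Tahoe-LAFS or Foolscap APIs, and the real magic-folder code is asynchronous (Twisted).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` calls have failed.

    After the n-th failure (counting from 0) we wait base_delay * 2**n seconds,
    capped at max_delay. Exceptions not listed in `retry_on` propagate at once.
    `sleep` is injectable so tests can record delays instead of waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(min(max_delay, base_delay * (2 ** attempt)))
    raise AssertionError("unreachable: the loop always returns or raises")
```

For example, with the defaults, an operation that raises `ConnectionError` twice and then succeeds returns on its third call, after sleeping 1.0 and then 2.0 seconds.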

Change History (18)

comment:1 Changed at 2015-10-15T11:51:41Z by dawuud

I started to write an experimental test for the upload retry in this dev branch: https://github.com/david415/tahoe-lafs/tree/2537.upload-download-retry.0

Hmm, but maybe the no-network grid isn't appropriate for testing connectivity retry?

comment:2 Changed at 2015-10-15T13:10:15Z by daira

Note that this is not part of the OTF grant (milestones 1-6).

comment:3 Changed at 2015-10-27T20:21:02Z by daira

See also #2420.

comment:4 Changed at 2016-01-18T21:10:18Z by dawuud

  • Description modified
  • Summary changed from magic-folder: implement download/upload retry to magic-folder: implement download retry

comment:5 Changed at 2016-01-18T21:12:43Z by dawuud

I've changed this ticket to be concerned only with retrying downloads, because #2635 already deals with uploading file objects that previously failed to get uploaded.

Version 0, edited at 2016-01-18T21:12:43Z by dawuud

comment:6 Changed at 2016-01-26T15:26:15Z by dawuud

The next step is simply to write unit tests for download failure and retry. That is, we should already be retrying the download the next time we perform a remote DMD scan.
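The idea that a failed download is retried on the next remote DMD scan can be sketched as follows, assuming a simplified model in which the remote DMD is a name-to-version mapping and downloads are plain callables. `ScanningDownloader`, `list_remote` and `download` are hypothetical names for illustration only, not the magic-folder implementation.

```python
from typing import Callable, Dict


class ScanningDownloader:
    """Hypothetical downloader illustrating retry-by-rescan (not Tahoe-LAFS code).

    The remote DMD is modelled as a plain mapping of name -> version number.
    A version is recorded locally only after its download succeeds, so a
    failed download is simply attempted again on the next scan.
    """

    def __init__(self,
                 list_remote: Callable[[], Dict[str, int]],
                 download: Callable[[str, int], bytes]) -> None:
        self._list_remote = list_remote  # returns the remote DMD listing
        self._download = download        # fetches one file version from the grid
        self.local_versions: Dict[str, int] = {}
        self.files: Dict[str, bytes] = {}

    def scan(self) -> Dict[str, str]:
        """Run one remote scan and return an outcome for each file attempted."""
        outcomes: Dict[str, str] = {}
        for name, version in self._list_remote().items():
            if self.local_versions.get(name, -1) >= version:
                continue  # this version (or a newer one) is already local
            try:
                contents = self._download(name, version)
            except Exception as e:  # e.g. too few shares reachable
                outcomes[name] = "failed: %s" % (e,)
                continue  # version not recorded, so the next scan retries it
            self.files[name] = contents
            self.local_versions[name] = version
            outcomes[name] = "downloaded"
        return outcomes
```

With a `download` callable that raises while the grid is unreachable, the first `scan()` reports the file as failed and leaves `local_versions` untouched; once the grid is back, the next `scan()` downloads it, and a further `scan()` does nothing. In this model no separate retry queue is needed: the periodic remote scan is itself the retry loop.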

comment:7 Changed at 2016-01-27T14:15:53Z by dawuud

I tried writing a test, but I'm still stuck on the part where I create the error: my test hangs. Probably I've used the hook mixin's Deferred with incorrect ordering, or something like that.

https://github.com/david415/tahoe-lafs/tree/2537.download-retry.0
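One common way a test like this hangs is registering the waiter after the event it waits for has already fired. The sketch below illustrates that ordering pitfall with a hypothetical, stdlib-only hook helper built on `concurrent.futures.Future`; the hook mixin mentioned above is Tahoe-LAFS test code built on Twisted Deferreds, and its actual method names and behaviour may differ.

```python
from concurrent.futures import Future
from typing import Dict


class Hooks:
    """Hypothetical, simplified stand-in for a test 'hook' helper.

    A test calls set_hook(name) to get a Future that resolves when the code
    under test later calls call_hook(name). If call_hook runs before anyone
    has set the hook, the event is dropped, and a test that only then starts
    waiting will wait forever.
    """

    def __init__(self) -> None:
        self._hooks: Dict[str, Future] = {}

    def set_hook(self, name: str) -> Future:
        fut: Future = Future()
        self._hooks[name] = fut
        return fut

    def call_hook(self, name: str, result=None) -> None:
        fut = self._hooks.pop(name, None)
        if fut is not None:
            fut.set_result(result)
        # else: nobody is waiting yet, so this event is lost
```

Calling `set_hook("processed")` before triggering the action yields a future that completes when `call_hook("processed", ...)` runs; calling it afterwards yields a future that never completes, which in a test shows up as a hang.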

comment:8 Changed at 2016-01-27T20:17:40Z by dawuud

Hey, I made slight progress here and got Alice and Bob sharing a file. After that, I had some trouble producing a grid failure for Bob's download of the file that Alice had just uploaded. We want to delete the shares, corrupt a share, or shut down the storage servers so that the download fails.

https://github.com/david415/tahoe-lafs/tree/2537.download-retry.0
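The three failure modes listed above can be modelled deterministically with a toy share store. This is a hypothetical sketch, not Tahoe-LAFS's storage code or its no-network test grid: each "share" here is a full copy of the file checked against a SHA-256 digest, and a download succeeds only if at least `k` intact shares sit on running servers, loosely mirroring Tahoe-LAFS's k-of-N erasure coding (3-of-10 by default).

```python
import hashlib
from typing import Dict, List, Optional


class FakeShareStore:
    """Hypothetical in-memory model of one file's shares spread over servers."""

    def __init__(self, data: bytes, n: int = 10, k: int = 3) -> None:
        self.k = k
        self.digest = hashlib.sha256(data).hexdigest()
        # server index -> share contents (None once deleted)
        self.shares: Dict[int, Optional[bytes]] = {i: data for i in range(n)}
        self.servers_up: Dict[int, bool] = {i: True for i in range(n)}

    def delete_shares(self, servers: List[int]) -> None:
        for i in servers:
            self.shares[i] = None

    def corrupt_shares(self, servers: List[int]) -> None:
        for i in servers:
            share = self.shares[i]
            if share is not None:
                # Appending a byte guarantees the digest no longer matches.
                self.shares[i] = share + b"\x00"

    def stop_servers(self, servers: List[int]) -> None:
        for i in servers:
            self.servers_up[i] = False

    def start_servers(self, servers: List[int]) -> None:
        for i in servers:
            self.servers_up[i] = True

    def download(self) -> bytes:
        # Keep only shares that are present, reachable, and uncorrupted.
        good = [
            share for i, share in self.shares.items()
            if self.servers_up[i] and share is not None
            and hashlib.sha256(share).hexdigest() == self.digest
        ]
        if len(good) < self.k:
            raise IOError("only %d good shares, need %d" % (len(good), self.k))
        return good[0]
```

For example, with `n=10, k=3`, deleting shares 0-3, corrupting 4-5 and stopping servers 6-7 leaves two good shares, so `download()` raises; restarting one of the stopped servers brings the count back to three and the download succeeds. Because the failure is injected before Bob's scan rather than raced against it, it fails the same way on every run.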

comment:9 follow-up: Changed at 2016-02-08T14:00:07Z by dawuud

Here's a pull request available for a casual code review: https://github.com/tahoe-lafs/tahoe-lafs/pull/238/files

It doesn't work or pass at the moment, though.

comment:10 in reply to: ↑ 9 Changed at 2016-02-08T16:09:03Z by daira

Replying to dawuud:

Here's a pull request available for a casual code review: https://github.com/tahoe-lafs/tahoe-lafs/pull/238/files

It doesn't work or pass at the moment, though.

There was a shallow bug (assuming that d.addCallback returns self), and also the branch included too much history from a previous magic-folder-stable branch, so I closed the PR. @dawuud will push a new 2537.download-retry.2 branch.

comment:11 Changed at 2016-02-09T11:19:48Z by dawuud

I'm still not able to get the test to pass: https://github.com/david415/tahoe-lafs/tree/2537.download-retry.2

It's true that we fixed a shallow bug, but there were other problems with the test. Today I've experimented a bit with rewriting it. Basically, we want a test in which:

  • Alice uploads a file
  • the grid fails
  • Bob fails to download the file
  • Bob later rescans and successfully downloads the file

comment:12 Changed at 2016-02-09T14:15:26Z by dawuud

  • Keywords magic-folder added

comment:13 Changed at 2016-02-10T20:39:17Z by dawuud

Here's the resulting dev branch from our pairing session: https://github.com/david415/tahoe-lafs/tree/2537.download-retry.3

The test works, but daira pointed out that there is likely a race condition and that we should investigate a better way to make the download fail.

comment:14 Changed at 2016-02-10T20:39:50Z by dawuud

  • Keywords download downloader retry reliability added

comment:16 Changed at 2016-03-16T10:55:37Z by dawuud

This unit test and feature now work properly, thanks to Daira and Meejah. Here's the latest: https://github.com/tahoe-lafs/tahoe-lafs/commits/2537.download-retry.5

I presume we shall not close this ticket until we have merged Meejah's inline-callback-deque changeset into a stable branch. Once we've done that, we can merge in this unit test and close this ticket as well as https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2412

comment:17 Changed at 2016-03-21T15:26:54Z by daira

  • Keywords blocks-merge added

comment:18 Changed at 2016-03-21T17:24:35Z by daira

  • Resolution set to fixed
  • Status changed from new to closed

Fixed on 2438.magic-folder-stable.12.
