Opened at 2015-10-15T11:01:29Z
Closed at 2016-03-21T17:24:35Z
#2537 closed defect (fixed)
magic-folder: implement download retry
Reported by: | dawuud | Owned by: | daira |
---|---|---|---|
Priority: | normal | Milestone: | undecided |
Component: | code-frontend-magic-folder | Version: | 1.10.1 |
Keywords: | magic-folder download downloader retry reliability blocks-merge | Cc: | |
Launchpad Bug: |
Description (last modified by dawuud)
The downloader should retry if the operation fails. This can happen because lossy networks such as Wi-Fi cause the TCP connections to storage servers to drop. Foolscap handles reconnection with exponential backoff, but the application layer still needs to retry; otherwise the user will experience our software as brittle.
Change History (18)
comment:1 Changed at 2015-10-15T11:51:41Z by dawuud
comment:2 Changed at 2015-10-15T13:10:15Z by daira
Note that this is not part of the OTF grant (milestones 1-6).
comment:3 Changed at 2015-10-27T20:21:02Z by daira
See also #2420.
comment:4 Changed at 2016-01-18T21:10:18Z by dawuud
- Description modified (diff)
- Summary changed from magic-folder: implement download/upload retry to magic-folder: implement download retry
comment:5 Changed at 2016-01-18T21:12:43Z by dawuud
I've changed this ticket to only be concerned with retrying downloads because #2635 already deals with uploading file objects that previously failed to get uploaded.
comment:6 Changed at 2016-01-26T15:26:15Z by dawuud
the next step is simply to write unit tests for download failure and retry. that is, we should already be retrying the download the next time we perform a remote DMD scan.
comment:7 Changed at 2016-01-27T14:15:53Z by dawuud
i tried writing a test... although i'm still stuck on the part where i create the error... my test hangs... probably because i've used the hook mixin deferred with incorrect ordering or something like that.
https://github.com/david415/tahoe-lafs/tree/2537.download-retry.0
comment:8 Changed at 2016-01-27T20:17:40Z by dawuud
hey i made slight progress here... and got Alice and Bob sharing a file... and then after that I had some trouble producing a grid failure for Bob's download of the file that Alice just uploaded. We want to delete the shares, corrupt the shares, or shut down the storage servers so that the download fails.
https://github.com/david415/tahoe-lafs/tree/2537.download-retry.0
comment:9 follow-up: ↓ 10 Changed at 2016-02-08T14:00:07Z by dawuud
here's a pull request available for a casual code review: https://github.com/tahoe-lafs/tahoe-lafs/pull/238/files
although it doesn't at the moment work or pass.
comment:10 in reply to: ↑ 9 Changed at 2016-02-08T16:09:03Z by daira
Replying to dawuud:
here's a pull request available for a casual code review: https://github.com/tahoe-lafs/tahoe-lafs/pull/238/files
although it doesn't at the moment work or pass.
There was a shallow bug (assuming that d.addCallback returns self), and also the branch included too much history from a previous magic-folder-stable branch, so I closed the PR. @dawuud will push a new 2537.download-retry.2 branch.
comment:11 Changed at 2016-02-09T11:19:48Z by dawuud
i'm still not able to get the test to pass https://github.com/david415/tahoe-lafs/tree/2537.download-retry.2
it's true that we fixed a shallow bug... but there were other problems with the test. today i've experimented a bit with rewriting it. basically we want a test wherein:
- alice uploads a file
- grid failure
- bob fails to download file
- bob later rescans and successfully downloads file
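
The four steps above can be sketched roughly as follows. This is only an illustration of retry-on-rescan and is not taken from the branch: `FakeGrid`, `Downloader`, and `GridUnavailable` are hypothetical names, the code is synchronous rather than Deferred-based, and it assumes that listing remote versions still works while fetching content fails.

```python
class GridUnavailable(Exception):
    """Raised by the fake grid while it simulates an outage."""


class FakeGrid:
    """Hypothetical stand-in for the storage grid; not a Tahoe-LAFS class."""

    def __init__(self):
        self.files = {}        # name -> (version, content)
        self.available = True

    def upload(self, name, content):
        # Each upload of the same name bumps the version by one.
        version = self.files.get(name, (-1, None))[0] + 1
        self.files[name] = (version, content)

    def list_versions(self):
        # Assumption: listing stays possible; only fetching content fails.
        return {name: version for name, (version, _) in self.files.items()}

    def download(self, name):
        if not self.available:
            raise GridUnavailable(name)
        return self.files[name][1]


class Downloader:
    """Hypothetical downloader that retries implicitly on every scan."""

    def __init__(self, grid):
        self.grid = grid
        self.local = {}        # name -> (version, content)

    def scan(self):
        """Fetch everything newer than the local copy; return failed names."""
        failed = []
        for name, remote_version in sorted(self.grid.list_versions().items()):
            local_version = self.local.get(name, (-1, None))[0]
            if remote_version <= local_version:
                continue
            try:
                content = self.grid.download(name)
            except GridUnavailable:
                # Local state is left untouched, so the next scan retries.
                failed.append(name)
                continue
            self.local[name] = (remote_version, content)
        return failed


grid = FakeGrid()
bob = Downloader(grid)

grid.upload("hello.txt", b"hi bob")     # alice uploads a file
grid.available = False                  # grid failure
first_scan_failures = bob.scan()        # bob fails to download the file
grid.available = True                   # grid recovers
second_scan_failures = bob.scan()       # bob rescans and downloads the file
```

The point of the sketch is that a failed download records nothing locally, so the file still looks newer on the remote side and the next scan picks it up again without any separate retry queue.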
comment:12 Changed at 2016-02-09T14:15:26Z by dawuud
- Keywords magic-folder added
comment:13 Changed at 2016-02-10T20:39:17Z by dawuud
here's the resulting dev branch from our pairing session: https://github.com/david415/tahoe-lafs/tree/2537.download-retry.3
the test works but daira pointed out that there is likely a race condition and we should investigate a better way to make the download fail...
comment:14 Changed at 2016-02-10T20:39:50Z by dawuud
- Keywords download downloader retry reliability added
comment:15 Changed at 2016-02-18T16:07:04Z by daira
Current branch is https://github.com/tahoe-lafs/tahoe-lafs/tree/2537.download-retry.4
comment:16 Changed at 2016-03-16T10:55:37Z by dawuud
this unit test and feature now work properly thanks to Daira and Meejah. here's the latest: https://github.com/tahoe-lafs/tahoe-lafs/commits/2537.download-retry.5
I presume we shall not close this ticket until we have merged Meejah's inline-callback-deque changeset into a stable branch. Once we've done that we can merge in this unit test and close this ticket as well as https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2412
comment:17 Changed at 2016-03-21T15:26:54Z by daira
- Keywords blocks-merge added
comment:18 Changed at 2016-03-21T17:24:35Z by daira
- Resolution set to fixed
- Status changed from new to closed
Fixed on 2438.magic-folder-stable.12.
i started to write an experimental test for the upload retry here in this dev branch: https://github.com/david415/tahoe-lafs/tree/2537.upload-download-retry.0
hmm but maybe the nonetwork grid isn't appropriate for testing connectivity retry?