#1525 closed defect (fixed)
SFTP: handle download failures correctly; remove use of IFinishableConsumer
Reported by: | davidsarah | Owned by: | warner |
---|---|---|---|
Priority: | critical | Milestone: | 1.10.0 |
Component: | code-frontend | Version: | 1.9.0a1 |
Keywords: | sftp hang error streaming performance blocker | Cc: | |
Launchpad Bug: |
Description
Brian noticed that a Deferred was being dropped at src/allmydata/frontends/sftpd.py@5179#L676. (Although this code changed when MDMF landed a87fc8ffab55dfd2, the immutable path of the previous code also dropped a Deferred, at line 680.)
It is incorrect to drop the Deferred, because that will cause SFTP read requests to hang if downloading the previous file contents fails. However fixing it while still allowing streaming reads is a bit tricky. The intention is that we download the file in parallel with accepting SFTP requests, and block each read request until the relevant part of the file is available (and prior writes have completed). If the download fails, then any SFTP read requests past the prefix that was read correctly should also fail. Also, if the file was written and not truncated, the SFTP close request should fail, at least in the case when the new contents depend on the part of the old contents that we weren't able to download.
Attachments (3)
Change History (22)
Changed at 2011-09-04T02:03:57Z by davidsarah
Changed at 2011-09-04T05:01:41Z by davidsarah
SFTP: tests for ref #1525, but test_openFile_write is failing depending on the order of previous tests :-( Can anyone help debugging this?
comment:1 Changed at 2011-09-04T05:07:01Z by davidsarah
In broken-tests-for-1525.darcs.patch, there seems to be a dependency on the test order:
davidsarah@shinier:~/tahoe/1.9alpha$ bin/tahoe debug trial --rterror allmydata.test.test_sftp.Handler.test_openFile_write allmydata.test.test_sftp.Handler.test_openFile_read allmydata.test.test_sftp.Handler.test_openDirectory_and_attrs allmydata.test.test_sftp Handler test_openFile_write ... [OK] test_openFile_read ... [OK] test_openDirectory_and_attrs ... [OK] ------------------------------------------------------------------------------- Ran 3 tests in 8.720s PASSED (successes=3) davidsarah@shinier:~/tahoe/1.9alpha$ bin/tahoe debug trial --rterror allmydata.test.test_sftp allmydata.test.test_sftp Handler test_basic ... [OK] test_convert_error ... [OK] test_execCommand_and_openShell ... [OK] test_extendedRequest ... [OK] test_makeDirectory ... [OK] test_not_implemented ... [OK] test_openDirectory_and_attrs ... [OK] test_openFile_read ... [OK] test_openFile_write ... Traceback (most recent call last): Failure: allmydata.interfaces.NoSharesError: no shares (need 3). Last failure: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[Failure instance: Traceback: <type 'exceptions.OSError'>: [Errno 2] No such file or directory: 'sftp/openFile_read/servers/7r4gd6xu/storage/shares/hc/hcatcbxpnrcch37ihub4vi2shy/3' /usr/lib/python2.6/dist-packages/twisted/internet/base.py:796:runUntilCurrent /home/davidsarah/tahoe/1.9alpha/support/lib/python2.6/site-packages/foolscap-0.6.1-py2.6.egg/foolscap/eventual.py:26:_turn /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:318:callback /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:424:_startRunCallbacks --- <exception caught here> --- /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:441:_runCallbacks /home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:83:<lambda> /home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:80:_call /home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:66:_really_call /home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:314:remote_read /home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:89:read_share_data /usr/lib/python2.6/genericpath.py:49:getsize ]'> ] [ERROR] test_removeDirectory ... [OK] test_removeFile ... [OK] test_renameFile ... [OK] test_renameFile_posix ... [OK] =============================================================================== [ERROR]: allmydata.test.test_sftp.Handler.test_openFile_write Traceback (most recent call last): Failure: allmydata.interfaces.NoSharesError: no shares (need 3). Last failure: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[Failure instance: Traceback: <type 'exceptions.OSError'>: [Errno 2] No such file or directory: 'sftp/openFile_read/servers/7r4gd6xu/storage/shares/hc/hcatcbxpnrcch37ihub4vi2shy/3' /usr/lib/python2.6/dist-packages/twisted/internet/base.py:796:runUntilCurrent /home/davidsarah/tahoe/1.9alpha/support/lib/python2.6/site-packages/foolscap-0.6.1-py2.6.egg/foolscap/eventual.py:26:_turn /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:318:callback /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:424:_startRunCallbacks --- <exception caught here> --- /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:441:_runCallbacks /home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:83:<lambda> /home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:80:_call /home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:66:_really_call /home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:314:remote_read /home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:89:read_share_data /usr/lib/python2.6/genericpath.py:49:getsize ]'> ] ------------------------------------------------------------------------------- Ran 13 tests in 20.290s FAILED (errors=1, successes=12)
Notice that the error in test_openfile_write refers to the directory 'sftp/openFile_read/servers/7r4gd6xu/storage/shares/hc/hcatcbxpnrcch37ihub4vi2shy/3' being missing, but that is the wrong directory -- all the basedirs in each test should be unique, and the ones in test_openFile_write should start with sftp/openFile_write. Can anyone see what is going on?
comment:2 Changed at 2011-09-04T05:08:29Z by davidsarah
- Keywords review-needed added; test-needed removed
- Owner davidsarah deleted
Changed at 2011-09-05T05:46:26Z by davidsarah
Version of broken test that is more deterministic (allmydata.test.test_sftp2)
comment:3 Changed at 2011-10-17T00:07:14Z by warner
note: this will also fix the deprecation warning seen in #1564
comment:4 Changed at 2011-12-31T19:10:41Z by zooko
- Owner set to davidsarah
This ticket is marked review-needed but the comments make it sound like it isn't ready for review because some of the tests are wrong. If that's true, please remove review-needed.
comment:5 Changed at 2011-12-31T20:23:05Z by davidsarah
- Keywords review-needed removed
- Status changed from new to assigned
The tests are indeed wrong.
comment:6 Changed at 2012-04-01T00:03:03Z by davidsarah
- Milestone changed from soon to 1.10.0
comment:7 Changed at 2012-12-20T17:43:42Z by warner
On this morning's dev call, we agreed that if the tests can be made to pass soon (in the next week), we can land this, else we'll kick it out to 1.11. davidsarah is still on the hook to fix them.
comment:8 Changed at 2013-01-08T03:00:54Z by davidsarah
- Milestone changed from 1.10.0 to 1.11.0
The current work in progress on this ticket, not including tests, is at https://github.com/davidsarah/tahoe-lafs/commits/1525-sftp-failures. Currently the SFTP tests hang at test_openFile_read, and it doesn't look as though I'll have time to fix that for 1.10, so I'm bumping this to 1.11.
comment:9 Changed at 2013-02-20T21:59:08Z by davidsarah
- Milestone changed from 1.11.0 to 1.10.0
- Summary changed from SFTP: handle download failures correctly to SFTP: handle download failures correctly; remove use of IFinishableConsumer
I'm bringing this back into the 1.10 milestone, because (as mentioned in #1295), Twisted 12.3.0 has removed IFinishableConsumer. This patch removes use of that interface.
comment:10 follow-up: ↓ 11 Changed at 2013-02-25T18:19:09Z by zooko
So the next step is for davidsarah to fix the unit test?
comment:11 in reply to: ↑ 10 Changed at 2013-02-25T22:26:47Z by davidsarah
Replying to zooko:
So the next step is for davidsarah to fix the unit test?
Yes (and rebase the patch).
comment:12 Changed at 2013-03-07T16:37:19Z by zooko
- Keywords blocker added
comment:13 Changed at 2013-03-10T23:56:03Z by davidsarah
I'm working on this now.
comment:14 Changed at 2013-03-19T02:43:21Z by davidsarah
- Keywords review-needed added
- Status changed from assigned to new
Review needed for https://github.com/tahoe-lafs/tahoe-lafs/pull/33.
comment:15 Changed at 2013-03-19T05:51:47Z by davidsarah
comment:16 Changed at 2013-03-20T21:31:20Z by warner
- Priority changed from major to critical
marking as "critical" to indicate this is a 1.10 blocker. All it needs is someone (probably me) to review-after-the-fact.
comment:17 Changed at 2013-04-04T16:47:28Z by warner
- Owner changed from davidsarah to warner
comment:18 follow-up: ↓ 19 Changed at 2013-04-15T05:38:12Z by warner
- Keywords review-needed removed
- Resolution set to fixed
- Status changed from new to closed
I spent a couple of hours on the plane studying the SFTP code. Wow, it's complex. It may need to be that complex, but it sure was hard to get my head around the control flows. I've got a list of code-cleanup suggestions, but I suspect most of them were applicable to the pre-#1525 code as well, so don't really qualify as review for this ticket. I think I'll open a separate ticket to discuss them. If Daira's manual and automated testing shows these changes to have worked correctly, I'm +1 on them too.
SFTP: make sure that download failures are not dropped. fixes #1525