#1525 closed defect (fixed)

SFTP: handle download failures correctly; remove use of IFinishableConsumer

Reported by: davidsarah Owned by: warner
Priority: critical Milestone: 1.10.0
Component: code-frontend Version: 1.9.0a1
Keywords: sftp hang error streaming performance blocker Cc:
Launchpad Bug:

Description

Brian noticed that a Deferred was being dropped at src/allmydata/frontends/sftpd.py@5179#L676. (Although this code changed when MDMF landed a87fc8ffab55dfd2, the immutable path of the previous code also dropped a Deferred, at line 680.)

It is incorrect to drop the Deferred, because that will cause SFTP read requests to hang if downloading the previous file contents fails. However fixing it while still allowing streaming reads is a bit tricky. The intention is that we download the file in parallel with accepting SFTP requests, and block each read request until the relevant part of the file is available (and prior writes have completed). If the download fails, then any SFTP read requests past the prefix that was read correctly should also fail. Also, if the file was written and not truncated, the SFTP close request should fail, at least in the case when the new contents depend on the part of the old contents that we weren't able to download.

Attachments (3)

fix-1525.darcs.patch (62.6 KB) - added by davidsarah at 2011-09-04T02:03:57Z.
SFTP: make sure that download failures are not dropped. fixes #1525
broken-tests-for-1525.darcs.patch (56.5 KB) - added by davidsarah at 2011-09-04T05:01:41Z.
SFTP: tests for ref #1525, but test_openFile_write is failing depending on the order of previous tests :-( Can anyone help debugging this?
more-deterministically-wrong.darcs.patch (75.7 KB) - added by davidsarah at 2011-09-05T05:46:26Z.
Version of broken test that is more deterministic (allmydata.test.test_sftp2)

Download all attachments as: .zip

Change History (22)

Changed at 2011-09-04T02:03:57Z by davidsarah

SFTP: make sure that download failures are not dropped. fixes #1525

Changed at 2011-09-04T05:01:41Z by davidsarah

SFTP: tests for ref #1525, but test_openFile_write is failing depending on the order of previous tests :-( Can anyone help debugging this?

comment:1 Changed at 2011-09-04T05:07:01Z by davidsarah

In broken-tests-for-1525.darcs.patch, there seems to be a dependency on the test order:

davidsarah@shinier:~/tahoe/1.9alpha$ bin/tahoe debug trial --rterror allmydata.test.test_sftp.Handler.test_openFile_write allmydata.test.test_sftp.Handler.test_openFile_read allmydata.test.test_sftp.Handler.test_openDirectory_and_attrs
allmydata.test.test_sftp
  Handler
    test_openFile_write ...                                                [OK]
    test_openFile_read ...                                                 [OK]
    test_openDirectory_and_attrs ...                                       [OK]

-------------------------------------------------------------------------------
Ran 3 tests in 8.720s

PASSED (successes=3)
davidsarah@shinier:~/tahoe/1.9alpha$ bin/tahoe debug trial --rterror allmydata.test.test_sftp
allmydata.test.test_sftp
  Handler
    test_basic ...                                                         [OK]
    test_convert_error ...                                                 [OK]
    test_execCommand_and_openShell ...                                     [OK]
    test_extendedRequest ...                                               [OK]
    test_makeDirectory ...                                                 [OK]
    test_not_implemented ...                                               [OK]
    test_openDirectory_and_attrs ...                                       [OK]
    test_openFile_read ...                                                 [OK]
    test_openFile_write ... Traceback (most recent call last):
Failure: allmydata.interfaces.NoSharesError: no shares (need 3). Last failure: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[Failure instance: Traceback: <type 'exceptions.OSError'>: [Errno 2] No such file or directory: 'sftp/openFile_read/servers/7r4gd6xu/storage/shares/hc/hcatcbxpnrcch37ihub4vi2shy/3'
/usr/lib/python2.6/dist-packages/twisted/internet/base.py:796:runUntilCurrent
/home/davidsarah/tahoe/1.9alpha/support/lib/python2.6/site-packages/foolscap-0.6.1-py2.6.egg/foolscap/eventual.py:26:_turn
/usr/lib/python2.6/dist-packages/twisted/internet/defer.py:318:callback
/usr/lib/python2.6/dist-packages/twisted/internet/defer.py:424:_startRunCallbacks
--- <exception caught here> ---
/usr/lib/python2.6/dist-packages/twisted/internet/defer.py:441:_runCallbacks
/home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:83:<lambda>
/home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:80:_call
/home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:66:_really_call
/home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:314:remote_read
/home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:89:read_share_data
/usr/lib/python2.6/genericpath.py:49:getsize
]'>
]
[ERROR]
    test_removeDirectory ...                                               [OK]
    test_removeFile ...                                                    [OK]
    test_renameFile ...                                                    [OK]
    test_renameFile_posix ...                                              [OK]

===============================================================================
[ERROR]: allmydata.test.test_sftp.Handler.test_openFile_write

Traceback (most recent call last):
Failure: allmydata.interfaces.NoSharesError: no shares (need 3). Last failure: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[Failure instance: Traceback: <type 'exceptions.OSError'>: [Errno 2] No such file or directory: 'sftp/openFile_read/servers/7r4gd6xu/storage/shares/hc/hcatcbxpnrcch37ihub4vi2shy/3'
/usr/lib/python2.6/dist-packages/twisted/internet/base.py:796:runUntilCurrent
/home/davidsarah/tahoe/1.9alpha/support/lib/python2.6/site-packages/foolscap-0.6.1-py2.6.egg/foolscap/eventual.py:26:_turn
/usr/lib/python2.6/dist-packages/twisted/internet/defer.py:318:callback
/usr/lib/python2.6/dist-packages/twisted/internet/defer.py:424:_startRunCallbacks
--- <exception caught here> ---
/usr/lib/python2.6/dist-packages/twisted/internet/defer.py:441:_runCallbacks
/home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:83:<lambda>
/home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:80:_call
/home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:66:_really_call
/home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:314:remote_read
/home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:89:read_share_data
/usr/lib/python2.6/genericpath.py:49:getsize
]'>
]
-------------------------------------------------------------------------------
Ran 13 tests in 20.290s

FAILED (errors=1, successes=12)

Notice that the error in test_openfile_write refers to the directory 'sftp/openFile_read/servers/7r4gd6xu/storage/shares/hc/hcatcbxpnrcch37ihub4vi2shy/3' being missing, but that is the wrong directory -- all the basedirs in each test should be unique, and the ones in test_openFile_write should start with sftp/openFile_write. Can anyone see what is going on?

comment:2 Changed at 2011-09-04T05:08:29Z by davidsarah

  • Keywords review-needed added; test-needed removed
  • Owner davidsarah deleted

Changed at 2011-09-05T05:46:26Z by davidsarah

Version of broken test that is more deterministic (allmydata.test.test_sftp2)

comment:3 Changed at 2011-10-17T00:07:14Z by warner

note: this will also fix the deprecation warning seen in #1564

comment:4 Changed at 2011-12-31T19:10:41Z by zooko

  • Owner set to davidsarah

This ticket is marked review-needed but the comments make it sound like it isn't ready for review because some of the tests are wrong. If that's true, please remove review-needed.

comment:5 Changed at 2011-12-31T20:23:05Z by davidsarah

  • Keywords review-needed removed
  • Status changed from new to assigned

The tests are indeed wrong.

comment:6 Changed at 2012-04-01T00:03:03Z by davidsarah

  • Milestone changed from soon to 1.10.0

comment:7 Changed at 2012-12-20T17:43:42Z by warner

On this morning's dev call, we agreed that if the tests can be made to pass soon (in the next week), we can land this, else we'll kick it out to 1.11. davidsarah is still on the hook to fix them.

comment:8 Changed at 2013-01-08T03:00:54Z by davidsarah

  • Milestone changed from 1.10.0 to 1.11.0

The current work in progress on this ticket, not including tests, is at https://github.com/davidsarah/tahoe-lafs/commits/1525-sftp-failures. Currently the SFTP tests hang at test_openFile_read, and it doesn't look as though I'll have time to fix that for 1.10, so I'm bumping this to 1.11.

comment:9 Changed at 2013-02-20T21:59:08Z by davidsarah

  • Milestone changed from 1.11.0 to 1.10.0
  • Summary changed from SFTP: handle download failures correctly to SFTP: handle download failures correctly; remove use of IFinishableConsumer

I'm bringing this back into the 1.10 milestone, because (as mentioned in #1295), Twisted 12.3.0 has removed IFinishableConsumer. This patch removes use of that interface.

comment:10 follow-up: Changed at 2013-02-25T18:19:09Z by zooko

So the next step is for davidsarah to fix the unit test?

comment:11 in reply to: ↑ 10 Changed at 2013-02-25T22:26:47Z by davidsarah

Replying to zooko:

So the next step is for davidsarah to fix the unit test?

Yes (and rebase the patch).

comment:12 Changed at 2013-03-07T16:37:19Z by zooko

  • Keywords blocker added

The import error part of this (formerly ticket #1926) is a blocker for 1.10.

Now, *maybe* we should re-open #1926 and fix it in the easy way (by constraining the Twisted version to not allow the latest Twisted). But I don't think so. I'd rather fix this ticket, #1525.

comment:13 Changed at 2013-03-10T23:56:03Z by davidsarah

I'm working on this now.

comment:14 Changed at 2013-03-19T02:43:21Z by davidsarah

  • Keywords review-needed added
  • Status changed from assigned to new
Last edited at 2013-03-19T05:52:41Z by davidsarah (previous) (diff)

comment:15 Changed at 2013-03-19T05:51:47Z by davidsarah

Committed in 48a2989e and 50c65629. Still needs review.

comment:16 Changed at 2013-03-20T21:31:20Z by warner

  • Priority changed from major to critical

marking as "critical" to indicate this is a 1.10 blocker. All it needs is someone (probably me) to review-after-the-fact.

comment:17 Changed at 2013-04-04T16:47:28Z by warner

  • Owner changed from davidsarah to warner

comment:18 follow-up: Changed at 2013-04-15T05:38:12Z by warner

  • Keywords review-needed removed
  • Resolution set to fixed
  • Status changed from new to closed

I spent a couple of hours on the plane studying the SFTP code. Wow, it's complex. It may need to be that complex, but it sure was hard to get my head around the control flows. I've got a list of code-cleanup suggestions, but I suspect most of them were applicable to the pre-#1525 code as well, so don't really qualify as review for this ticket. I think I'll open a separate ticket to discuss them. If Daira's manual and automated testing shows these changes to have worked correctly, I'm +1 on them too.

comment:19 in reply to: ↑ 18 Changed at 2013-04-20T23:35:25Z by daira

Replying to warner:

I've got a list of code-cleanup suggestions, but I suspect most of them were applicable to the pre-#1525 code as well, so don't really qualify as review for this ticket. I think I'll open a separate ticket to discuss them.

Please do, while they are fresh in your mind!

Note: See TracTickets for help on using tickets.