#1041 assigned defect

Timeout error when uploading a file with some SFTP clients, e.g. WinSCP

Reported by: freestorm
Owned by: davidsarah
Priority: major
Milestone: undecided
Component: code-frontend-ftp-sftp
Version: 1.6.1
Keywords: sftp winscp upload reliability timeout
Cc:
Launchpad Bug:

Description (last modified by warner)

When uploading a file (randomly generated), the upload stops at 100% with the error:

Host is not communicating for more than 15 seconds. Still waiting... Warning. Aborting this operation will close connection!

Client:
WinSCP 4.2.7 (build 758)

OS:
Windows XP SP3 French

Change History (13)

comment:1 Changed at 2010-05-15T01:21:22Z by davidsarah

  • Component changed from unknown to code-frontend
  • Description modified (diff)
  • Keywords review-needed removed
  • Priority changed from critical to major

comment:2 Changed at 2010-05-15T01:32:31Z by davidsarah

  • Keywords upload reliability added

IRC discussion (slightly edited):

<FreeStorm> davidsarah: I'm testing with WinSCP, it displays a strange error message: The host has not responded for more than 15 seconds, still waiting [Cancel] [Help], (translated from French)

<FreeStorm> But: my SFTP node is on my machine, and the Introducer and Helper are at another location, reached over a VPN, so I need to test on the same LAN

<davidsarah> any bug that causes a hang would probably result in that message from WinSCP, but 15 secs is a fairly short timeout

<davidsarah> for example we can't guarantee that the latency of a 'close' request will be less than 15 secs

<FreeStorm> davidsarah: yes, I think so

<FreeStorm> davidsarah: it happens near the end of the transfer, I think

<davidsarah> an upload?

<davidsarah> what size of file?

<FreeStorm> davidsarah: yes upload (all files are random)

<FreeStorm> 1 [Mbyte] => OK

<FreeStorm> 10 [Mbyte] and 50 [Mbyte] => restarting many times, and after it's okay

<davidsarah> ah

<davidsarah> so that does sound like 'close' latency

[...]

<davidsarah> that's irritating, because the only way we can shorten the close latency (without more extensive changes) is to return success from the 'close' before we know that the file has actually been uploaded

<davidsarah> does WinSCP have any way to configure that timeout? (I'm guessing not)

<davidsarah> unfortunately the SFTP protocol has no way to say, "yes I'm still doing that, be patient"

<FreeStorm> I'm looking into WinSCP configuration

<davidsarah> please open a ticket for this timeout problem

[...]

* davidsarah thinks about how to solve that problem

<davidsarah> I think this will have to be a known limitation of SFTP using some clients, for v1.7

<FreeStorm> OK, I'm going to do the same test on the LAN, maybe these are Internet connection errors

<davidsarah> we could have a config option to allow returning early success from the close, but I'm very reluctant to compromise on correctness/reliability here

comment:3 Changed at 2010-05-15T01:33:14Z by davidsarah

  • Owner changed from nobody to davidsarah
  • Status changed from new to assigned

comment:4 Changed at 2010-05-16T03:15:08Z by davidsarah

It's just possible that we may be able to fix this by sending keepalive packets on the connection. Whether this will work depends on whether the timeout is between a 'close' request and its response (in which case it won't help), or between any two SFTP packets.

comment:5 follow-up: Changed at 2010-05-16T16:05:03Z by zooko

Using a random encryption key instead of convergent encryption would solve this, right? Then the upload from Tahoe-LAFS gateway to storage servers (or helper) could proceed at the same time as the upload from SFTP client to Tahoe-LAFS gateway is proceeding. The cost would be that you lose convergence. We did a measurement on the allmydata.com customer base's files at one point. Unfortunately I don't recall precisely and a quick search hasn't turned up my published notes, but I think the estimated space savings from convergence for that set was less than 1%.

comment:6 in reply to: ↑ 5 Changed at 2010-05-16T16:51:27Z by davidsarah

Replying to zooko:

Using a random encryption key instead of convergent encryption would solve this, right?

Nope. The problem is that SFTP allows random access writes, so the SFTP client could write a large file, then go back and change the first byte just before the close.

It is possible to take advantage of streaming upload in the case where the file is opened with flags FXF_WRITE | FXF_APPEND (meaning that all writes will be at the end of file). Most clients don't use FXF_APPEND, though, even when they are going to write the file linearly.
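The flag check described above could look roughly like the following. This is a minimal sketch, not the actual Tahoe-LAFS code: the constant values are the open flags (pflags) from the SFTP version 3 draft, and the helper name `can_stream_upload` is hypothetical.

```python
# SFTP v3 open flags (pflags), values as given in the SFTP draft.
FXF_READ = 0x00000001
FXF_WRITE = 0x00000002
FXF_APPEND = 0x00000004
FXF_CREAT = 0x00000008
FXF_TRUNC = 0x00000010
FXF_EXCL = 0x00000020


def can_stream_upload(flags):
    """Hypothetical helper: True only if the client promised append-only writes.

    With FXF_WRITE | FXF_APPEND every write lands at the end of the file, so
    bytes already received can never be changed by a later write. Without
    FXF_APPEND the client may still seek back and overwrite earlier bytes.
    """
    required = FXF_WRITE | FXF_APPEND
    return (flags & required) == required
```

A client that opens with FXF_WRITE | FXF_CREAT | FXF_TRUNC and then writes linearly would be rejected by this check, which is the common case described above.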

#935 could be implemented in a way that fixes this problem: the 'close' would cause the file to be stored durably on the gateway, which would be responsible for uploading it to the grid asynchronously (even if the gateway crashes and restarts). That would be at the expense of a looser consistency model: a successful 'close' would only guarantee that the file is immediately visible via this gateway, not other gateways.

comment:7 follow-up: Changed at 2010-05-16T17:02:08Z by zooko

But in the common case that a client opens the file, writes the file in order from beginning to end, and closes the file (even though it doesn't give the FXF_APPEND flag), then using a random encryption key would make things work very well and using convergent encryption makes things fail, if the file is large enough, or become unreliable, if we do write-caching. Am I right?

I hope that in the long run we extend Tahoe-LAFS to support out-of-order writes of immutable files too, so that the case you described would also be cleanly supported.

comment:8 in reply to: ↑ 7 Changed at 2010-05-16T17:11:46Z by zooko

replying to myself:

Replying to zooko:

But in the common case that a client opens the file, writes the file in order from beginning to end, and closes the file (even though it doesn't give the FXF_APPEND flag), then using a random encryption key would make things work very well

No, this doesn't make sense. What would the Tahoe-LAFS gateway do if, after it had streamed the file to the grid, the SFTP client seeked back to the beginning and wrote something?

I hope that in the long run we extend Tahoe-LAFS to support out-of-order writes of immutable files too, so that the case you described would also be cleanly supported.

This might still make sense, but it requires more changes to the Tahoe-LAFS upload logic.

comment:9 Changed at 2010-05-25T22:17:38Z by davidsarah

In order to make the SFTP frontend work correctly with sshfs, we are planning to make the following changes (the first has already been done):

  • files can be renamed and deleted while there are handles to them.
  • if a file is closed and then reopened before the close has completed, then the open will be delayed until the close has completed.

This was necessary because sshfs returns success from a close call immediately after sending the FXP_CLOSE message, without waiting for a response from the SFTP server.
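The second change (delaying an open until a pending close has completed) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard `asyncio` rather than the Twisted Deferreds the gateway is built on, and the class `PendingCloses`, its methods, and the returned handle string are all hypothetical names.

```python
import asyncio


class PendingCloses:
    """Hypothetical tracker mapping a path to its still-running close."""

    def __init__(self):
        self._closing = {}

    def start_close(self, path, upload_coro):
        # Run the upload in the background and remember it for this path.
        # Must be called while an event loop is running.
        task = asyncio.create_task(upload_coro)
        self._closing[path] = task
        task.add_done_callback(lambda t: self._forget(path, t))
        return task

    def _forget(self, path, task):
        # Drop the entry only if it still refers to this particular close.
        if self._closing.get(path) is task:
            del self._closing[path]

    async def open(self, path):
        task = self._closing.get(path)
        if task is not None:
            # Delay the open until the earlier close has finished.
            # asyncio.wait does not re-raise an upload failure here.
            await asyncio.wait([task])
        return "handle:" + path
```

An open for a path with no close in progress returns immediately; only a reopen that races with an unfinished close is delayed.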

So we could now send the response to FXP_CLOSE immediately, without compromising consistency as viewed through the SFTP frontend. That would fix this bug, but with the following negative side-effects:

  • the upload might fail, in which case there would be no way to notify the client that it had failed.
  • the upload would not immediately be visible via non-SFTP frontends or other gateways.

comment:10 Changed at 2010-06-12T21:13:35Z by davidsarah

This problem is documented in wiki:SftpFrontend.

comment:11 Changed at 2010-06-13T19:06:26Z by slush

A simple workaround works for me: set WinSCP->Connection->Timeouts to 6000 seconds (the maximum allowed). I successfully uploaded a 185MB file through a 1Mbit line on the first attempt.

To be honest, a second test of a large upload was also successful, but WinSCP crashed immediately after the upload finished :).
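A rough back-of-the-envelope check shows why the default 15-second timeout is too short for an upload like this one and why 6000 seconds is enough. The sketch assumes that 185MB means 185 × 10^6 bytes, that the 1Mbit line carries 10^6 bits per second, and that the gateway pushes the whole file over that line before answering the 'close'. It ignores protocol overhead and erasure-coding expansion, so it is only a lower bound. The function name is hypothetical.

```python
def min_transfer_seconds(size_bytes, link_bits_per_second):
    """Lower bound on transfer time; ignores overhead and share expansion."""
    return size_bytes * 8 / link_bits_per_second


# Assumed figures: 185 MB file, 1 Mbit/s line.
upload_seconds = min_transfer_seconds(185 * 10**6, 1 * 10**6)
print(upload_seconds)  # 1480.0
```

Under these assumptions the transfer alone takes about 25 minutes, far more than 15 seconds but comfortably below the 6000-second maximum.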

comment:12 Changed at 2011-06-28T17:51:53Z by davidsarah

  • Keywords timeout added
  • Summary changed from Error when uploading a file with WinSCP in SFTP to Timeout error when uploading a file with some SFTP clients, e.g. WinSCP

comment:13 Changed at 2014-12-02T19:42:20Z by warner

  • Component changed from code-frontend to code-frontend-ftp-sftp
  • Description modified (diff)