Opened at 2013-05-23T16:34:10Z
Last modified at 2014-12-02T19:50:10Z
#1976 assigned defect
SFTP+SSHFS hangs for second concurrent operation
Reported by: | luckyredhot | Owned by: | daira |
---|---|---|---|
Priority: | normal | Milestone: | undecided |
Component: | code-frontend-ftp-sftp | Version: | 1.10.0 |
Keywords: | sftp sshfs hang reliability | Cc: | |
Launchpad Bug: |
Description (last modified by daira)
I am using the Tahoe-LAFS SFTP frontend with SSHFS on Ubuntu 12.04. If I try to run a second operation (simply "ls" or "du") while a first write is in progress, the second one can sometimes hang completely. It does not even stop when sent SIGKILL, so I need to kill the parent bash session.
Tahoe-LAFS versions 1.9.2 and 1.10.0 are both affected.
SSHFS mount options:
sshfs -p 8022 -o uid=33 -o gid=33 -o nonempty -o allow_other -o idmap=user tahoe@127.0.0.1:/ /mnt/tahoe
If this is an SFTP issue, it should be fixed. If this is an SSHFS issue, then we probably have to find another client or some workaround (perhaps 2 sshfs mounts, one for writing and one for reading).
Any help is appreciated :) Please also suggest commands I could run when the issue occurs to gather debug information.
Thanks!
Attachments (3)
Change History (13)
Changed at 2013-05-23T16:36:21Z by luckyredhot
comment:1 Changed at 2013-05-23T16:44:58Z by daira
- Description modified (diff)
- Keywords sftp hang reliability added; ftps removed
- Owner set to daira
- Status changed from new to assigned
- Summary changed from FTPS+SSHFS hangs for second operation to SFTP+SSHFS hangs for second concurrent operation
To get debugging output from sshfs, restart it in the foreground with options:
-o debug,sshfs_debug,loglevel=debug
To get debugging output from the gateway, see the Realtime Logging section of docs/logging.rst.
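As a rough sketch, the sshfs debug options above can be combined with the mount options from the description. The helper name `build_sshfs_debug_cmd` is hypothetical, the option values are copied from this ticket, and the script only prints the command line rather than running it:

```python
import shlex


def build_sshfs_debug_cmd(remote, mountpoint, port=8022, extra_opts=()):
    """Build an sshfs command line that includes the debug options.

    Hypothetical helper: it only assembles arguments, it does not mount.
    """
    debug_opts = "debug,sshfs_debug,loglevel=debug"
    cmd = ["sshfs", "-p", str(port)]
    for opt in extra_opts:
        cmd += ["-o", opt]
    cmd += ["-o", debug_opts, remote, mountpoint]
    return cmd


if __name__ == "__main__":
    cmd = build_sshfs_debug_cmd(
        "tahoe@127.0.0.1:/",
        "/mnt/tahoe",
        extra_opts=["uid=33", "gid=33", "nonempty", "allow_other", "idmap=user"],
    )
    # Print a shell-quoted command that can be pasted into a terminal.
    print(shlex.join(cmd))
```

Unmount the existing mount before running the printed command, and keep that terminal open so the debug output can be captured.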
comment:2 Changed at 2013-06-11T09:20:34Z by luckyredhot
OK, I've caught the issue. It happens when:
- One write operation is in progress (I am constantly copying files to a grid folder).
- A second operation tries to get a listing or attributes. It usually does not happen on the first attempt, but running the "ls" command repeatedly causes all operations to freeze for a long time. In my case there were only 5 files in the folder, but the "ls" operation took 40 (!) seconds. With hundreds of files it would take forever.
See the attached logs. I issued the ls just before the [80576] LSTAT entry.
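The two conditions above could be exercised with a small script along these lines. This is only a sketch under assumptions: the function name, the test file name, and the timing parameters are hypothetical, and it measures listing latency in whatever directory it is given (for example a folder under the sshfs mount). If the hang occurs, `os.listdir` will simply block.

```python
import os
import threading
import time


def time_listing_during_write(directory, write_seconds=2.0, chunk=b"x" * 65536):
    """Time os.listdir() calls while a background thread writes to `directory`.

    Returns a list of listing durations in seconds.
    """
    stop = threading.Event()
    target = os.path.join(directory, "concurrent-write-test.bin")

    def writer():
        # Keep one write in progress, throttled so the file stays small.
        with open(target, "wb") as f:
            while not stop.is_set():
                f.write(chunk)
                f.flush()
                time.sleep(0.01)

    t = threading.Thread(target=writer, daemon=True)
    t.start()
    timings = []
    deadline = time.monotonic() + write_seconds
    try:
        while time.monotonic() < deadline:
            start = time.monotonic()
            os.listdir(directory)  # the concurrent "ls"
            timings.append(time.monotonic() - start)
            time.sleep(0.1)
    finally:
        stop.set()
        t.join(timeout=5)
    return timings
```

Printing `max(time_listing_during_write("/mnt/tahoe/somefolder", write_seconds=60))` (the path is a placeholder) would show the slowest listing observed during the write.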
Changed at 2013-06-11T09:22:22Z by luckyredhot
comment:3 Changed at 2013-06-12T05:39:12Z by zooko
Thanks for the bug report, luckyredhot! Is there any incident report file generated by the LAFS gateway when this happens? If not, could you force it to generate one? See wiki:HowToReportABug for instructions.
comment:4 Changed at 2013-06-14T15:07:12Z by luckyredhot
Incident file has been attached. Hope it'll be helpful.
comment:5 Changed at 2013-06-21T07:40:15Z by luckyredhot
Daira, thanks for yesterday's analysis. What next steps can we take? Should I try to trigger the issue again to get additional logs? Or do you think upgrading to 1.10 might also help? (AFAIK SFTP wasn't modified there since 1.9.2.)
comment:6 Changed at 2013-06-21T09:32:28Z by daira
SFTP was actually modified in 1.10 to improve error handling; I doubt it affects this bug, but it may help slightly in debugging. I'm going to try to reproduce the problem myself, but please feel free to attach another log, since the file incident-2013-06-11--12-22-26Z-oqcgkpa.flog.bz2 seems to be corrupted in some way.
It's unfortunate that the sshfs debug log doesn't include timestamps that could be correlated with the foolscap log.
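One possible workaround, sketched here as an assumption rather than anything sshfs provides, is to pipe the sshfs debug output through a small filter that prefixes each line with a UTC timestamp. The names `stamp` and `stamp_lines` are hypothetical, and the timestamp format is only a suggestion:

```python
import sys
from datetime import datetime, timezone


def stamp(line, now=None):
    """Return `line` (without its trailing newline) prefixed with a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    return f"{now.strftime('%Y-%m-%dT%H:%M:%S.%fZ')} {line.rstrip(chr(10))}"


def stamp_lines(lines):
    """Yield each input line with the time at which it was read."""
    for line in lines:
        yield stamp(line)


# To use as a filter, add these lines at the bottom of the script:
#     for out in stamp_lines(sys.stdin):
#         print(out, flush=True)
```

With the filter lines enabled and the script saved as, say, `stamp.py`, the foreground sshfs command would be run with both stdout and stderr redirected into it (`2>&1 | python3 stamp.py`), since the debug output may not all go to stdout.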
comment:7 follow-up: ↓ 8 Changed at 2013-06-25T06:44:35Z by luckyredhot
What do you think about performing a partial grid upgrade (for example, upgrading 2 of the existing 5 nodes to 1.10) and trying to catch the issue on both 1.9.2 and 1.10 nodes of the same grid? Does that sound reasonable?
comment:8 in reply to: ↑ 7 Changed at 2013-06-25T15:16:33Z by zooko
Replying to luckyredhot:
What do you think about performing a partial grid upgrade (for example, upgrading 2 of the existing 5 nodes to 1.10) and trying to catch the issue on both 1.9.2 and 1.10 nodes of the same grid? Does that sound reasonable?
Dear Oleksandr:
I would assume that the storage servers have nothing to do with this bug. However, since I don't understand this bug, maybe my assumption is bad.
However, I suspect you'd get better debugging information for your effort if you tried different versions of Tahoe-LAFS for the gateway rather than the servers.
comment:9 Changed at 2013-06-26T00:23:26Z by daira
I agree with zooko that this is unlikely to be related to the storage server versions.
comment:10 Changed at 2014-12-02T19:50:10Z by warner
- Component changed from code-frontend to code-frontend-ftp-sftp