wiki:SftpFrontend

Version 60 (modified by davidsarah, at 2011-01-08T05:41:18Z)

The SFTP frontend is a server that optionally runs as part of a gateway node, and provides read/write access to the Tahoe grid via the SSH File Transfer Protocol.

See docs/frontends/FTP-and-SFTP.rst (.txt in releases before v1.8.1) for how to enable and set up the SFTP frontend on a gateway. This page covers compatibility issues with particular SFTP clients, and assumes that you are using Tahoe-LAFS v1.7.0 or later. Please add any further issues that you discover.

Security

The security of the connection between the SFTP client and the gateway depends on the PyCrypto library, which has not been reviewed to the same extent as the pycryptopp library used elsewhere in Tahoe-LAFS. In particular, the AES implementation in PyCrypto is known to be vulnerable to timing attacks that could, depending on the situation, allow a remote attacker to break the encryption protecting the connection between your SFTP client and the Tahoe-LAFS gateway process acting as SFTP server. We therefore do not recommend relying on the confidentiality or authentication provided by this SSH connection in the current release.

In practice, this leaves three options: run the Tahoe-LAFS gateway locally on the same machine as your SFTP client (a good, efficient, and secure solution); tunnel the SFTP connection over another secure connection such as an SSH tunnel or a VPN; or accept the risk that someone could snoop on the data that you send and receive over the SFTP connection.

Server keys with passphrases are not supported (#1039).

General compatibility issues

Before uploading a file to a Tahoe filesystem, the whole file has to be available. This means that the upload can only start when the file has been closed in the SFTP session. Particularly when writing large files, the client may time out between sending the close request and receiving the response (ticket #1041). This is known to be a problem for at least the WinSCP client, which has a default close timeout of 15 seconds. In the case of WinSCP this can be worked around by setting WinSCP -> Connection -> Timeouts to 6000 seconds (the maximum allowed); other clients with this problem may have similar settings.
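As a rough illustration of why the close request can time out, the following sketch estimates how long the gateway needs to finish an upload after the close and compares that with a client timeout. The function names, the file size, and the bandwidth figure are hypothetical. The expansion factor of 10/3 assumes the default 3-of-10 encoding, and encoding and protocol overhead are ignored.

```python
def close_wait_seconds(size_bytes, upload_bytes_per_sec, expansion=10 / 3):
    """Estimate seconds between the SFTP close request and its response.

    Assumes the whole upload happens after the close, and that the gateway
    sends size * expansion bytes to storage servers at a steady rate.
    """
    return size_bytes * expansion / upload_bytes_per_sec


def will_time_out(size_bytes, upload_bytes_per_sec, timeout_sec):
    """True if the estimated wait exceeds the client's close timeout."""
    return close_wait_seconds(size_bytes, upload_bytes_per_sec) > timeout_sec


if __name__ == "__main__":
    size = 100 * 1024 * 1024   # a 100 MiB file (hypothetical)
    rate = 1024 * 1024         # 1 MiB/s upstream bandwidth (hypothetical)
    for timeout in (15, 6000):  # WinSCP default and maximum, per the text above
        print(timeout, will_time_out(size, rate, timeout))
```

Under these assumptions the 100 MiB upload takes several minutes, so it exceeds the 15 second default but fits comfortably within the 6000 second maximum.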

In the period after the close but before the upload has finished, the closed file may not appear in directory listings, or may appear with an incorrect modification time.

Since Tahoe uses capability access control rather than Unix-style permissions, the permission bits seen by SFTP clients are only an approximation, chosen to avoid confusing client programs. In particular, the 'user', 'group' and 'world' permissions on a Tahoe file will always be the same. It is possible to clear all of the 'w' bits on a file, which will prevent that file from being opened for writing, but note that its directory entry can still be replaced via a write cap to the directory.
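The following sketch shows one way such an approximation could be computed. It is not taken from the Tahoe source: the function name approx_mode and the base modes (0666 for files, 0777 for directories) are assumptions, chosen only to illustrate that the three permission groups always match and that clearing 'w' removes it from all of them at once.

```python
import stat


def approx_mode(is_dir, writeable):
    """Return POSIX-style mode bits for a Tahoe node (illustrative only)."""
    # Hypothetical base modes: identical bits for user, group and world.
    mode = (stat.S_IFDIR | 0o777) if is_dir else (stat.S_IFREG | 0o666)
    if not writeable:
        # Clear the 'w' bit for user, group and world together.
        mode &= ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH)
    return mode


if __name__ == "__main__":
    for is_dir, writeable in [(False, True), (False, False), (True, True)]:
        print(stat.filemode(approx_mode(is_dir, writeable)))
```

A client listing such files would see the same 'rwx' triplet repeated three times, which is all that the permission bits convey here; actual write authority still comes from holding a write cap.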

See the last section of docs/frontends/FTP-and-SFTP.rst for information on how the SFTP frontend treats immutable and mutable files.

The 'ctime' and 'mtime' attributes will always be the same, and are set from the Tahoe linkmotime timestamp, which is changed only when the link from the parent directory is modified (see the 'About the metadata' section of webapi.rst). These fields are not updated when the contents of a mutable file are changed. The SFTP protocol and the server are able to represent dates up to the year 2106, but some clients may print dates incorrectly after 2037.
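The two years mentioned above are consistent with timestamps held as 32-bit counts of seconds since 1970-01-01 UTC: unsigned on the protocol side, signed in the affected clients. That a given client uses a signed 32-bit value is an assumption; the helper name below is hypothetical. The sketch simply computes the last representable instant for each case.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def last_representable(bits, signed):
    """Latest UTC time that fits in a seconds-since-epoch integer field."""
    max_seconds = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return EPOCH + timedelta(seconds=max_seconds)


if __name__ == "__main__":
    # Unsigned 32-bit field: the limit falls in February 2106.
    print(last_representable(32, signed=False).isoformat())
    # Signed 32-bit field: the limit falls in January 2038.
    print(last_representable(32, signed=True).isoformat())
```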

Versions of Twisted up to and including 10.2 have a bug in their support for rekeying, which may cause the gateway to hang or use 100% CPU when a client tries to rekey. Depending on the client, rekeying may be triggered by a time interval or by the amount of data sent (for example, 1 GiB to 4 GiB for the OpenSSH client), so this problem typically affects only long-lived connections or very large files. Some clients have options to disable rekeying:

  • for OpenSSH and sshfs, either use the option -o RekeyLimit=0, or add the line RekeyLimit=0 to ~/.ssh/config (TODO: please test this!)
  • for WinSCP, see the WinSCP section below.

Unicode filenames

The SFTP frontend encodes all filenames as UTF-8 when communicating with the client. Support for displaying and copying non-ASCII filenames is likely to vary between clients. If you are using a filesystem that represents names as UTF-8 (including via sshfs), then it should just work, but please report your experience with this.

Some clients fail to convert filenames to UTF-8, or require a configuration option to do so; see ticket #1089. In this case they will usually fail to create non-ASCII filenames (although there is a small chance that the name in another encoding will accidentally be decodable as UTF-8), and directory listings will show mojibake for non-ASCII names.
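The failure modes described above can be reproduced with plain byte strings. This sketch assumes the client's "other encoding" is Latin-1, which is only one possibility; the sample filename and the helper name are hypothetical.

```python
name = "caf\u00e9.txt"  # "café.txt", a sample non-ASCII filename


def is_valid_utf8(raw):
    """Return True if the bytes can be decoded as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Directory listing: the server sends UTF-8, the client reads it as Latin-1.
mojibake = name.encode("utf-8").decode("latin-1")

# File creation: the client sends Latin-1 bytes where UTF-8 is expected.
client_bytes = name.encode("latin-1")

# The rare accident: a Latin-1 name whose bytes happen to be valid UTF-8.
accidental_bytes = "\u00c3\u00a9".encode("latin-1")

if __name__ == "__main__":
    print(repr(mojibake))
    print(is_valid_utf8(client_bytes))
    print(is_valid_utf8(accidental_bytes))
```

In the first case the listing shows two garbage characters in place of the accented letter; in the second the name cannot be decoded at all; in the third the name is accepted but means something different from what the client intended.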

Filenames are normalized to NFC, which means that it is not possible to have two files/subdirectories with canonically equivalent names in the same directory. (This does not cause any incompatibility with filesystems that use a different normalization, such as NFD in Mac OS X.)
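To make "canonically equivalent" concrete, the sketch below builds the same visible name in two ways and shows that both collapse to a single NFC form. It uses Python's standard unicodedata module rather than anything from Tahoe itself; the sample names and the helper nfc are illustrative.

```python
import unicodedata

composed = "caf\u00e9"      # 'é' as a single code point (the NFC form)
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent (the NFD form)


def nfc(name):
    """Normalize a filename to NFC."""
    return unicodedata.normalize("NFC", name)


if __name__ == "__main__":
    print(composed == decomposed)            # different code point sequences
    print(nfc(composed) == nfc(decomposed))  # same name after normalization
```

Since both spellings map to the same normalized name, a directory can hold only one entry for them, whichever form the client sent.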

Performance

The SFTP frontend currently performs no caching (sshfs does cache, but only for 20 seconds with the default settings). Some applications assume that file operations have relatively low latency, and may have very poor performance when working directly with a Tahoe filesystem. In this case it may be better to copy files to a local filesystem and work on them there, then copy back any changes. Note that just browsing a directory may cause some apps to perform many unnecessary reads or attribute checks of files in that directory.

The -o big_writes option to sshfs may improve write performance.
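For convenience, the sshfs options mentioned on this page can be assembled into one command line. The sketch below only builds the argument list and does not run anything. The helper name, user, host, port, and mount point are hypothetical placeholders, and the RekeyLimit=0 setting is the one this page itself marks as needing testing.

```python
import shlex


def sshfs_command(user, host, port, mountpoint, remote_dir="",
                  disable_rekey=True, big_writes=True):
    """Build an sshfs argument list using options named on this page."""
    argv = ["sshfs", "-p", str(port)]
    if disable_rekey:
        # Intended to avoid the rekeying bug; untested per the note above.
        argv += ["-o", "RekeyLimit=0"]
    if big_writes:
        # May improve write performance.
        argv += ["-o", "big_writes"]
    argv += ["%s@%s:%s" % (user, host, remote_dir), mountpoint]
    return argv


if __name__ == "__main__":
    # All values here are placeholders; substitute your own gateway settings.
    argv = sshfs_command("alice", "127.0.0.1", 8022, "/mnt/tahoe")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The resulting list can be passed to subprocess.run if desired; building it as a list avoids shell quoting problems with unusual mount point names.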

Specific clients

sshfs

sshfs is an SFTP client that allows filesystem access via FUSE (a user-space filesystem layer). It works on Linux and other Unix systems that provide FUSE. For Mac OS X, a patched version of sshfs is included as part of MacFUSE.

Tahoe's SFTP frontend includes several workarounds and extensions to make it function correctly with sshfs.

Mutable parts of a filesystem should only be accessed via a single sshfs mount (this is a stronger restriction than the write coordination directive against writing mutable parts of a filesystem via more than one gateway). Data loss may result for concurrently accessed files if this restriction is not followed.

When writing a file to the Tahoe filesystem, sshfs does not wait for the 'close' request to complete before reporting to the application that the file has been successfully closed (#1059). Therefore, you should not shut down your gateway node immediately after writing files via sshfs; otherwise those files may be lost. It is possible that an upload could fail (due to a network error, lack of storage space, etc.); such failures will not be reported to applications using sshfs. This also implies that during the upload, a file could be visible via SFTP but not via the Tahoe WUI, CLI, or FTP frontends.

(This patch makes sshfs wait for close requests to complete, but may cause its own compatibility problems; the patch is provided only for testing purposes.)

Some applications may make assumptions that are incompatible with Tahoe. For example, 'flushing' a file does not guarantee that written data is reflected in the Tahoe filesystem, so opening the same file via another handle and attempting to read that data before the original handle is closed will not work.

If a file is written via two handles concurrently, the contents visible at any point in time will be the data written via one handle or the other (or the previous contents), or the read will fail. The result will not be an interleaving as would be the case for a POSIX filesystem. Also, the file contents obtained by a successful read via any handle will be a snapshot at about the time of the open. These differences from the POSIX semantics are arguably improvements (at least when the read succeeds), but in principle they could confuse some applications.

If a file in a mutable directory is closed concurrently with an operation that needs to read the directory, then the latter operation may fail (#1105).

The MacFUSE version of sshfs stores "extended attributes" in files with names starting with "._". For example the attributes for "foo.txt" would be stored in a file called "._foo.txt". Since some Mac OS X applications may depend on these attributes (especially for their own file formats), if you need to copy or move the original file then you should copy or move the attribute file along with it. The OS X cp and mv commands will do this by default; operations using the Tahoe WUI or CLI will not (unless you are moving all files in a directory). Note that filenames beginning with "." are not listed by default by ls.
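When copying files with a tool that is unaware of this convention, the companion file name can be derived from the original name. The helper names below are hypothetical, and the sketch only computes paths; it does not check whether an attribute file actually exists.

```python
from pathlib import PurePosixPath


def attribute_file(path):
    """Return the path of the "._" companion file for the given file."""
    p = PurePosixPath(path)
    return str(p.with_name("._" + p.name))


def files_to_copy(path):
    """Return the original file plus its attribute file, in that order."""
    return [str(PurePosixPath(path)), attribute_file(path)]


if __name__ == "__main__":
    print(attribute_file("foo.txt"))
    print(files_to_copy("docs/foo.txt"))
```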

On Mac OS X, TextEdit and vi are known to have problems editing files on a Tahoe-via-sshfs filesystem.

To avoid potential bugs with rekeying, add the line RekeyLimit 0 to ~/.ssh/config.

Gnome virtual filesystem (gvfs)

gvfs is a set of filesystem adapters provided with the GNOME desktop environment. It can be used in two ways: either via the GIO API, or via a FUSE layer called gvfs-FUSE (not to be confused with sshfs).

Apps that use the GIO API, such as the Nautilus file browser, seem to work correctly with Tahoe.

gvfs-FUSE, on the other hand, is not recommended for use with Tahoe. This is because it has to map POSIX filesystem requests onto GIO requests, and this mapping loses information -- some combinations of 'open' flags cannot be expressed in the GIO API, for example. Therefore it is impossible for gvfs-FUSE to provide a fully correct FUSE filesystem (or even one that is "good enough" for many applications).

It may not be entirely clear to users whether a particular Gnome app is using GIO or gvfs-FUSE. Recent versions of OpenOffice use gvfs-FUSE when opening a file directly from an SFTP filesystem, and this may cause problems (although OpenOffice does appear to work when editing files on an sshfs filesystem).

WinSCP

In the WinSCP Login dialog, the following options need to be set (some require 'Advanced options' to be checked):

  • In the Environment section, set 'UTF-8 encoding for filenames' to 'On'.
  • In the Connection section, set 'Server response timeout' to the maximum 6000 seconds.
  • In the Key exchange section under SSH, set both 'Max minutes before rekey' and 'Max data before rekey' to 0.

Note that these options are not persistent unless you save them as a 'Stored session', together with the host name, username, etc.
