#1089 closed defect (wontfix)

SFTP and FTP: support for non-UTF-8 charsets (error message "Path could not be decoded as UTF-8")

Reported by: slush
Owned by: nobody
Priority: minor
Milestone: eventually
Component: code-frontend
Version: 1.7.0
Keywords: utf-8 unicode ftpd sftp names docs
Cc: slush@…, amontero@…
Launchpad Bug:

Description (last modified by amontero)

I am opening a new ticket because I didn't find any reported problem with UTF-8 (except #704, which is, as far as I know, something different).

When I create a file or directory through the web frontend, everything works. But any attempt to create a directory or file whose name contains national characters through the FTP/SFTP frontend fails.

WinSCP reports "Path could not be decoded as UTF-8"; Total Commander only reports a general error.

Example directory name that fails: žluťoučký kůň úpěl ďábelské ódy (meaning: a yellow horse moaned devilish odes). It also fails with any other ordinary name, such as "dovolená" (holiday).

Reproducibility: always
Python version: 2.6

Change History (14)

comment:1 follow-up: Changed at 2010-06-17T20:22:12Z by slush

And when I create a directory through the web interface, I cannot enter it using an FTP client (TC reports "Internal server error").

comment:2 Changed at 2010-06-17T20:42:22Z by slush

  • Description modified (diff)

comment:3 Changed at 2010-06-17T21:40:25Z by davidsarah

  • Keywords docs added
  • Summary changed from Path could not be decoded as UTF-8 to SFTP and FTP: Path could not be decoded as UTF-8

This error will occur if the file or directory name is not valid UTF-8. Polish systems often use ISO-Latin-2 locales -- is that what your filesystem uses?

If so, the SFTP specification implies that it is the responsibility of the client to convert names to UTF-8, and apparently the clients you tried aren't doing that.
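To illustrate the failure, here is a minimal Python 3 sketch. It is not Tahoe-LAFS code: the helper `decode_sftp_path` is hypothetical and only assumes that the frontend decodes incoming paths strictly as UTF-8. It shows a name sent in ISO-8859-2 being rejected, and the client-side transcoding that avoids the error.

```python
def decode_sftp_path(raw: bytes) -> str:
    """Hypothetical helper: decode a client-supplied path strictly as UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as e:
        raise ValueError("Path could not be decoded as UTF-8") from e


name = "dovolená"
utf8_bytes = name.encode("utf-8")         # what a client configured for UTF-8 sends
latin2_bytes = name.encode("iso-8859-2")  # what a client using a Latin-2 charset sends

# UTF-8 input round-trips.
assert decode_sftp_path(utf8_bytes) == name

# Latin-2 input is rejected: byte 0xE1 ("á" in Latin-2) begins a
# multi-byte UTF-8 sequence that is never completed.
try:
    decode_sftp_path(latin2_bytes)
    rejected = False
except ValueError:
    rejected = True

# Client-side fix: transcode the name to UTF-8 before sending it.
transcoded = latin2_bytes.decode("iso-8859-2").encode("utf-8")
```

In practice the transcoding step is what a client does when its charset option is set to UTF-8.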

Alternatives:

  • have an option in tahoe.cfg or the FTP/SFTP accounts file to specify another encoding.
  • declare this to be a client bug.
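The first alternative could be sketched roughly as follows in Python 3. Everything here is an assumption for illustration: Tahoe-LAFS has no such option, and both the `ACCOUNT_ENCODINGS` table and the `decode_path_for_account` function are hypothetical names.

```python
# Hypothetical per-account setting, as it might be read from an accounts file.
# No such option exists in Tahoe-LAFS.
ACCOUNT_ENCODINGS = {"slush": "iso-8859-2"}


def decode_path_for_account(raw: bytes, account: str) -> str:
    """Decode a path using the account's configured encoding (default UTF-8)."""
    encoding = ACCOUNT_ENCODINGS.get(account, "utf-8")
    return raw.decode(encoding)


raw = "žluťoučký kůň".encode("iso-8859-2")
decoded = decode_path_for_account(raw, "slush")
```

Accounts without an entry would still be decoded as UTF-8, so the same bytes would fail for them as before.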

The FTP frontend does not support Unicode at all; that is ticket #682. However, if #682 were fixed by implementing RFC 2640, then the FTP frontend would also only support UTF-8.

We can also improve the error message for clients that display it.

(It is a bug in Total Commander that it doesn't display the message. Also, its description of FX_FAILURE as "Internal server error" is inaccurate -- it just means an error that has no more specific code. In this case we could arguably report FX_BAD_MESSAGE, though.)
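As a rough illustration of the status-code choice, the Python 3 sketch below maps an undecodable path to FX_BAD_MESSAGE instead of the generic FX_FAILURE. The numeric values follow the SFTP drafts (protocol version 3); the function `status_for_path` is a hypothetical name, not part of the frontend.

```python
# Status codes as numbered in the SFTP drafts (protocol version 3).
FX_FAILURE = 4      # generic error with no more specific code
FX_BAD_MESSAGE = 5  # badly formatted packet or protocol incompatibility


def status_for_path(raw: bytes):
    """Hypothetical: return (status, message) for an undecodable path, else None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return (FX_BAD_MESSAGE, "Path could not be decoded as UTF-8")
    return None
```

Whichever code is used, the message string is what clients such as WinSCP display to the user.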


comment:4 Changed at 2010-06-17T21:45:11Z by davidsarah

wiki:SftpFrontend#Unicodefilenames already documented this problem, but I added a reference to this ticket.

Last edited at 2010-06-17T21:51:18Z by davidsarah

comment:5 in reply to: ↑ 1 Changed at 2010-06-17T21:47:24Z by davidsarah

Replying to slush:

And when I create a directory through the web interface, I cannot enter it using an FTP client (TC reports "Internal server error").

This problem (listing a directory containing non-ASCII names in FTP) is part of #682.

Last edited at 2010-06-17T21:55:08Z by davidsarah

comment:6 follow-up: Changed at 2010-06-20T23:24:02Z by slush

@David:

You are right; I fixed the UTF-8 problems in WinSCP by setting the correct charset. I think this should be mentioned in the SFTP frontend docs, because many other SFTP servers I'm using work without any settings changes.

comment:7 Changed at 2010-06-21T01:55:08Z by davidsarah

  • Keywords ftpd added; ftp removed
  • Milestone changed from undecided to soon
  • Owner set to davidsarah
  • Status changed from new to assigned

comment:8 in reply to: ↑ 6 ; follow-up: Changed at 2010-06-21T01:57:02Z by davidsarah

  • Owner changed from davidsarah to slush
  • Status changed from assigned to new

Replying to slush:

@David:

You are right; I fixed the UTF-8 problems in WinSCP by setting the correct charset.

Please change the 'Unicode filenames' section of wiki:SftpFrontend to explain how to do this.

comment:9 Changed at 2010-06-21T03:11:01Z by davidsarah

  • Keywords names added

comment:10 Changed at 2010-06-21T20:59:57Z by zooko

  • Version changed from 1.7β to 1.7.0

comment:11 in reply to: ↑ 8 Changed at 2010-07-11T22:07:40Z by davidsarah

  • Milestone changed from soon to eventually
  • Owner changed from slush to nobody

Replying to davidsarah:

Replying to slush:

You are right; I fixed the UTF-8 problems in WinSCP by setting the correct charset.

Please change the 'Unicode filenames' section of wiki:SftpFrontend to explain how to do this.

Done.

comment:12 Changed at 2011-02-03T00:02:21Z by davidsarah

  • Priority changed from major to minor
  • Summary changed from SFTP and FTP: Path could not be decoded as UTF-8 to SFTP and FTP: support for non-UTF-8 charsets (error message "Path could not be decoded as UTF-8")

I'm unconvinced that supporting non-UTF-8 encodings is worth the hassle and complexity. Does anyone want to argue in favour of it?

Note that neither SFTP nor FTP has any standard by which a specific non-UTF-8 encoding could be automatically negotiated. So, this would have to be manually configured. But clients that do a reasonable job of supporting non-ASCII characters at all usually have an option to select UTF-8. So I think that support for other encodings would benefit very few users.
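The need for manual configuration can be seen in a short Python 3 sketch (illustrative only, using the example name from this ticket): the same bytes decode without error under several single-byte encodings, so a server cannot tell from the bytes alone which one the client meant.

```python
original = "žluťoučký kůň"
raw = original.encode("iso-8859-2")

# Every one of these decodes succeeds, but only one gives the intended name.
as_latin2 = raw.decode("iso-8859-2")
as_latin1 = raw.decode("iso-8859-1")
as_cp1250 = raw.decode("cp1250")

candidates = {as_latin2, as_latin1, as_cp1250}
```

UTF-8 is different in this respect: bytes produced by a legacy single-byte encoding are usually not valid UTF-8, which is why the strict decode fails loudly instead of silently producing a wrong name.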

comment:13 Changed at 2013-07-27T12:54:22Z by amontero

  • Cc amontero@… added
  • Description modified (diff)

comment:14 Changed at 2013-08-02T03:06:46Z by daira

  • Resolution set to wontfix
  • Status changed from new to closed

I'm wontfixing this, because SFTP and FTP simply have no support for negotiating non-UTF-8 encodings.
