#682 assigned defect

FTP frontend should support Unicode filenames encoded as UTF-8

Reported by: arthur
Owned by: francois
Priority: major
Milestone: soon
Component: code-frontend-ftp-sftp
Version: 1.3.0
Keywords: i18n unicode ftpd names twisted
Cc: amontero@…
Launchpad Bug:

Description (last modified by amontero)

Using ncftp to put a file whose name contains an accented character (é), I get the following message:

[Requested action not taken: internal server error]

In the server-side logs:

2009-04-17 15:22:07+0200 [ProtocolWrapper,3,127.0.0.1] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python2.5/site-packages/twisted/internet/tcp.py", line 362, in doRead
            return self.protocol.dataReceived(data)
          File "/usr/lib/python2.5/site-packages/twisted/protocols/policies.py", line 72, in dataReceived
            self.wrappedProtocol.dataReceived(data)
          File "/usr/lib/python2.5/site-packages/twisted/protocols/basic.py", line 231, in dataReceived
            why = self.lineReceived(line)
          File "/usr/lib/python2.5/site-packages/twisted/protocols/ftp.py", line 698, in lineReceived
            d = defer.maybeDeferred(self.processCommand, cmd, *args)
        --- <exception caught here> ---
          File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 106, in maybeDeferred
            result = f(*args, **kw)
          File "/usr/lib/python2.5/site-packages/twisted/protocols/ftp.py", line 729, in processCommand
            return method(*params)
          File "/usr/lib/python2.5/site-packages/twisted/protocols/ftp.py", line 1079, in ftp_STOR
            d = self.shell.openForWriting(newsegs)
          File "/usr/lib/python2.5/site-packages/allmydata/frontends/ftpd.py", line 255, in openForWriting
            path = [unicode(p) for p in path]
        exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 21: ordinal not in range(128)
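
The failing line, unicode(p), converts a byte string using Python 2's implicit ASCII codec. A minimal sketch of the same failure, written for Python 3 with an explicit ASCII decode as a stand-in (the path segments below are made up for illustration and are not the reporter's actual filename):

```python
def to_text_implicit_ascii(segment: bytes) -> str:
    # Stand-in for Python 2's unicode(segment), which decodes byte
    # strings with the default 'ascii' codec.
    return segment.decode("ascii")

# Hypothetical path: the second segment contains the byte 0xe0.
path = [b"docs", b"voil\xe0.txt"]

error = None
try:
    decoded = [to_text_implicit_ascii(p) for p in path]
except UnicodeDecodeError as e:
    # Same class of error as in the traceback above.
    error = e
```

Any byte above 0x7f in a path segment triggers this, regardless of which encoding the client intended.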

Change History (26)

comment:1 Changed at 2009-04-23T10:05:58Z by francois

  • Milestone changed from undecided to 1.5.0
  • Owner changed from nobody to francois

This is definitely the same sort of encoding issue as in #534. I'll try to have a look at it.

comment:2 Changed at 2009-04-23T10:06:02Z by francois

  • Status changed from new to assigned

comment:3 Changed at 2009-06-30T17:16:35Z by zooko

  • Milestone changed from 1.5.0 to eventually

comment:4 Changed at 2009-07-11T11:28:04Z by warner

  • Component changed from unknown to code-frontend
  • Description modified (diff)

reformatted description slightly

comment:5 Changed at 2009-11-23T02:37:56Z by davidsarah

  • Keywords i18n unicode added

comment:6 Changed at 2010-01-15T02:39:15Z by davidsarah

  • Keywords ftp added

comment:7 follow-up: Changed at 2010-01-15T02:41:01Z by davidsarah

See RFC 2640 for FTP internationalization.

comment:8 Changed at 2010-02-07T16:43:48Z by davidsarah

  • Keywords ftpd added; ftp removed

comment:9 in reply to: ↑ 7 ; follow-up: Changed at 2010-06-15T23:41:39Z by davidsarah

Replying to davidsarah:

See RFC 2640 for FTP internationalization.

Summary:

  • include UTF8 in the response to a FEAT request;
  • use UTF-8;
  • reject filenames that are not valid UTF-8.

Admirably simple :-)

(See also #1076 about normalization, but that will probably be done in the dirnode interface rather than in frontends.)
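
The second and third points of the summary could look roughly like the sketch below. The function name decode_path_segments is hypothetical and is not part of ftpd.py; how the error is reported back to the FTP client is left out.

```python
def decode_path_segments(segments):
    """Decode FTP path segments (bytes) as UTF-8, rejecting invalid input."""
    decoded = []
    for seg in segments:
        try:
            decoded.append(seg.decode("utf-8"))
        except UnicodeDecodeError:
            # RFC 2640 approach as summarized above: refuse names that
            # are not valid UTF-8 instead of guessing another encoding.
            raise ValueError("pathname is not valid UTF-8: %r" % (seg,))
    return decoded
```

For example, decode_path_segments([b"caf\xc3\xa9"]) returns the single text segment "café", while a segment containing a lone 0xe0 byte is rejected.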

comment:10 Changed at 2010-06-15T23:50:30Z by davidsarah

Hmm, judging by the exception message ("'ascii' codec can't decode byte 0xe0"), ncftp was trying to use ISO-Latin-1 rather than UTF-8. But at least it would be possible for clients to do the right thing, so I still think we should implement RFC 2640.

comment:11 Changed at 2010-06-15T23:57:51Z by davidsarah

Actually 'é' is 0xE9 in ISO-Latin-1, so I don't know what encoding this was (but not UTF-8).
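
A quick check of the byte values in question (Python 3 syntax, shown only to illustrate the point above):

```python
E_ACUTE = "\u00e9"  # the character é

# é is the single byte 0xE9 in ISO-Latin-1 ...
latin1_bytes = E_ACUTE.encode("latin-1")

# ... and the two bytes 0xC3 0xA9 in UTF-8.
utf8_bytes = E_ACUTE.encode("utf-8")

# The byte from the traceback, 0xE0, is à (a with grave) in ISO-Latin-1.
logged_byte_as_latin1 = b"\xe0".decode("latin-1")
```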

comment:12 Changed at 2010-06-15T23:59:14Z by davidsarah

  • Description modified (diff)

comment:13 Changed at 2010-06-16T00:04:51Z by davidsarah

  • Summary changed from FTP frontend refuses accents to FTP frontend should support Unicode filenames

comment:14 Changed at 2010-06-16T16:45:50Z by zooko

With the new improved pyutil-1.7.9 you get this handy-dandy script called "try_decoding":

HACL:~/playground/pyutil/bothw$ python -c 'open("d","wb").write(chr(0xe0))'
HACL:~/playground/pyutil/bothw$ try_decoding d -t  é
HACL:~/playground/pyutil/bothw$ 

Oh hey there are no encodings known to Python 2.6.1 which would decode 0xe0 to é!

Here are all the things that all the encodings would decode 0xe0 to:

HACL Zooko-Ofsimplegeos-MacBook-Pro:~/playground/pyutil/bothw$ try_decoding d
            charmap : à
              cp037 : \
             cp1006 : ﻓ
             cp1026 : ü
             cp1140 : \
             cp1250 : ŕ
             cp1251 : а
             cp1252 : à
             cp1253 : ΰ
             cp1254 : à
             cp1255 : א
             cp1256 : à
             cp1257 : ą
             cp1258 : à
              cp424 : \
              cp437 : α
              cp500 : \
              cp737 : ω
              cp775 : Ó
              cp850 : Ó
              cp852 : Ó
              cp855 : Я
              cp857 : Ó
              cp860 : α
              cp861 : α
              cp862 : α
              cp863 : α
              cp864 : ـ
              cp865 : α
              cp866 : р
              cp869 : ζ
              cp874 : เ
              cp875 : \
          hp_roman8 : Á
          iso8859_1 : à
         iso8859_10 : ā
         iso8859_11 : เ
         iso8859_13 : ą
         iso8859_14 : à
         iso8859_15 : à
         iso8859_16 : à
          iso8859_2 : ŕ
          iso8859_3 : à
          iso8859_4 : ā
          iso8859_5 : р
          iso8859_6 : ـ
          iso8859_7 : ΰ
          iso8859_8 : א
          iso8859_9 : à
             koi8_r : Ю
             koi8_u : Ю
            latin_1 : à
         mac_arabic : ـ
       mac_centeuro : ŗ
       mac_croatian : –
       mac_cyrillic : а
          mac_farsi : ـ
          mac_greek : ύ
        mac_iceland : ý
         mac_latin2 : ŗ
          mac_roman : ‡
       mac_romanian : ‡
        mac_turkish : ‡
             palmos : à
            ptcp154 : а
 raw_unicode_escape : à
             rot_13 : à
            tis_620 : เ
     unicode_escape : à
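
A listing like the one above can be approximated with the standard library alone. This is a sketch of the idea, not pyutil's actual try_decoding implementation, and the set of codecs (and therefore the output) depends on the Python version:

```python
import encodings.aliases

def decodings_of(data):
    """Map codec name -> decoded text for every codec that accepts data."""
    results = {}
    for codec in sorted(set(encodings.aliases.aliases.values())):
        try:
            results[codec] = data.decode(codec)
        except (ValueError, LookupError):
            # ValueError covers UnicodeDecodeError (bytes not valid in this
            # codec); LookupError covers codecs that are unavailable on this
            # platform or are not text encodings.
            continue
    return results
```

Calling decodings_of(b"\xe0") and searching the values for "é" reproduces the check made above.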

comment:15 Changed at 2010-06-17T21:56:18Z by davidsarah

#1089 discusses the use of non-UTF-8 encodings by FTP and SFTP clients.

comment:16 in reply to: ↑ 9 Changed at 2010-06-21T01:30:51Z by davidsarah

Replying to davidsarah:

Summary:

  • include UTF8 in the response to a FEAT request; [...]

Twisted's FTP implementation does not currently implement FEAT. However, it is structured in such a way that it's relatively easy to monkey-patch it to do so, and no more ugly than monkey-patching always is. Something like (untested):

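from twisted.internet import defer
from twisted.protocols import ftp
from allmydata.util import log  # assumption: Tahoe's log wrapper, which provides WEIRD

def ftp_FEAT(self, arg=None):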
    if not (hasattr(self, 'shell') and hasattr(self.shell, 'feat') and
            hasattr(self, 'sendLine')):
        log.msg("Assumption needed to monkey-patch FEAT support in Twisted "
                "does not hold", level=log.WEIRD)
        return defer.fail(ftp.CmdNotImplementedError('FEAT'))

    if arg is not None:
        return defer.fail(ftp.CmdSyntaxError('FEAT does not take any argument'))

    d = defer.maybeDeferred(self.shell.feat)
    def _reply(features):
        self.sendLine('211- Featuretastic!')
        for f in features:
            self.sendLine(' ' + f)
        return ftp.SYS_STATUS_OR_HELP_REPLY
    d.addCallback(_reply)
    return d

if not hasattr(ftp.FTP, 'ftp_FEAT'):
    ftp.FTP.ftp_FEAT = ftp_FEAT

class Handler...
    def feat(self):
        if self.encoding_is_utf8():
            return ['UTF8']
        else:
            return []
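
For reference, the shape of a FEAT reply on the wire can be sketched independently of Twisted. The layout below follows my reading of RFC 2389 (a "211-" opening line, one space-prefixed line per feature, and a closing "211 End"); the function name and the message texts are made up for illustration:

```python
def format_feat_reply(features):
    """Return the lines of a FEAT reply, without trailing CRLFs."""
    if not features:
        # With nothing to advertise, a single-line 211 reply is enough.
        return ["211 No features"]
    lines = ["211-Features:"]
    # Each feature line starts with exactly one space.
    lines.extend(" " + feature for feature in features)
    lines.append("211 End")
    return lines
```

With features == ["UTF8"] this produces the three lines "211-Features:", " UTF8" and "211 End".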

comment:17 Changed at 2010-06-21T01:52:26Z by davidsarah

  • Milestone changed from eventually to soon

comment:18 Changed at 2010-06-21T03:14:26Z by davidsarah

  • Keywords names added

comment:19 Changed at 2010-06-21T21:17:32Z by zooko

I opened http://twistedmatrix.com/trac/ticket/4515 (support the FTP FEAT request).

comment:20 Changed at 2010-06-21T21:17:43Z by zooko

  • Keywords twisted added

comment:21 Changed at 2011-02-02T23:50:46Z by davidsarah

  • Summary changed from FTP frontend should support Unicode filenames to FTP frontend should support Unicode filenames encoded as UTF-8

comment:22 Changed at 2012-12-28T06:32:01Z by zooko

Twisted #4515 has been closed.

comment:23 Changed at 2012-12-28T23:52:45Z by davidsarah

Unfortunately the fix for that ticket isn't sufficient, because

adiroiban wrote in http://twistedmatrix.com/trac/ticket/4515#comment:13:

I don't plan to add IFTPShell.FEATURES in this patch since without UTF-8 support there will be nothing to export. Besides UTF-8, all other features (SIZE, MDTM, etc.) are tied to the protocol.FTP implementation.

With this change, it is possible to declare support for UTF-8 by monkeypatching twisted.protocols.ftp.FTP.FEATURES, but that depends on an implementation detail, which is what we were trying to avoid. (Granted, it's a slightly less ugly monkeypatch.)

I don't know why adiroiban ignored me when I pointed out that the goal of that ticket could be achieved in a simpler way that would have been sufficient. Maybe I should have argued the case more strenuously.
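
The "slightly less ugly monkeypatch" mentioned above might look something like this sketch. To keep it self-contained it patches a stand-in class rather than twisted.protocols.ftp.FTP itself, and the feature names in the stand-in list are illustrative assumptions, not a copy of Twisted's actual list:

```python
class StandInFTP:
    """Hypothetical stand-in for twisted.protocols.ftp.FTP."""
    FEATURES = ["FEAT", "SIZE", "MDTM"]  # illustrative contents only

def advertise_utf8(protocol_class):
    """Add UTF8 to the class-level FEATURES list if it is not there yet."""
    # Copy the list so that a list shared with a base class is not mutated.
    features = list(getattr(protocol_class, "FEATURES", []))
    if "UTF8" not in features:
        features.append("UTF8")
    protocol_class.FEATURES = features

advertise_utf8(StandInFTP)
```

This still relies on FEATURES being a class attribute, which is the implementation detail the comment objects to.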

comment:24 Changed at 2012-12-29T00:01:34Z by davidsarah

Sigh, and it doesn't have a conformant implementation of OPTS:

def ftp_OPTS(self, option):
    """
    Handle OPTS command.

    http://tools.ietf.org/html/draft-ietf-ftpext-utf-8-option-00
    """
    return self.reply(OPTS_NOT_IMPLEMENTED, option)

http://tools.ietf.org/html/draft-ietf-ftpext-utf-8-option-00 says:

2. UTF-8 Option

   The user issues the OPTS UTF-8 command to indicate its willingness to
   send and receive UTF-8 encoded pathnames over the control connection.
   Prior to sending this command, the user should not transmit UTF-8
   encoded pathnames.
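
A conformant handler would need to remember whether the client has opted in. The sketch below shows one way to track that state; the class name and message texts are hypothetical, and the 200/501 reply codes are an assumption based on the generic OPTS replies in RFC 2389 rather than something stated in the quoted draft:

```python
class PathnameEncodingState:
    """Tracks whether the client has issued OPTS UTF-8 on this connection."""

    def __init__(self):
        # Per the quoted draft, UTF-8 pathnames should not be assumed
        # until the client has sent OPTS UTF-8.
        self.utf8_enabled = False

    def handle_opts(self, option):
        """Return a (code, message) pair for an OPTS command argument."""
        name = option.strip().split(" ", 1)[0].upper()
        if name == "UTF-8":
            self.utf8_enabled = True
            return ("200", "UTF-8 pathnames enabled")
        return ("501", "Option not recognized")
```

The rest of the frontend could then consult utf8_enabled when deciding how to decode incoming path segments.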

comment:25 Changed at 2013-07-27T12:56:12Z by amontero

  • Cc amontero@… added
  • Description modified (diff)

comment:26 Changed at 2014-12-02T19:41:19Z by warner

  • Component changed from code-frontend to code-frontend-ftp-sftp