Opened at 2009-04-17T13:32:35Z
Last modified at 2014-12-02T19:41:19Z
#682 assigned defect
FTP frontend should support Unicode filenames encoded as UTF-8
Reported by: | arthur | Owned by: | francois |
---|---|---|---|
Priority: | major | Milestone: | soon |
Component: | code-frontend-ftp-sftp | Version: | 1.3.0 |
Keywords: | i18n unicode ftpd names twisted | Cc: | amontero@… |
Launchpad Bug: |
Description (last modified by amontero)
Using ncftp to put a file with an é accent in its name, I get the following message:
[Requested action not taken: internal server error]
In the server-side logs:
2009-04-17 15:22:07+0200 [ProtocolWrapper,3,127.0.0.1] Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/twisted/internet/tcp.py", line 362, in doRead
    return self.protocol.dataReceived(data)
  File "/usr/lib/python2.5/site-packages/twisted/protocols/policies.py", line 72, in dataReceived
    self.wrappedProtocol.dataReceived(data)
  File "/usr/lib/python2.5/site-packages/twisted/protocols/basic.py", line 231, in dataReceived
    why = self.lineReceived(line)
  File "/usr/lib/python2.5/site-packages/twisted/protocols/ftp.py", line 698, in lineReceived
    d = defer.maybeDeferred(self.processCommand, cmd, *args)
--- <exception caught here> ---
  File "/usr/lib/python2.5/site-packages/twisted/internet/defer.py", line 106, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/lib/python2.5/site-packages/twisted/protocols/ftp.py", line 729, in processCommand
    return method(*params)
  File "/usr/lib/python2.5/site-packages/twisted/protocols/ftp.py", line 1079, in ftp_STOR
    d = self.shell.openForWriting(newsegs)
  File "/usr/lib/python2.5/site-packages/allmydata/frontends/ftpd.py", line 255, in openForWriting
    path = [unicode(p) for p in path]
exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 21: ordinal not in range(128)
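The last frame is the culprit: in Python 2, calling unicode() on a byte string decodes it with the default ASCII codec, so any byte >= 0x80 in a path segment raises UnicodeDecodeError. A minimal Python 3 sketch reproducing the same failure; the path b"caf\xe0.txt" and the helper names are made-up examples, not the reporter's file or Tahoe code:

```python
def ascii_decode_segments(path):
    """Mimic Python 2's [unicode(p) for p in path]: decode each
    byte-string segment with the ASCII codec."""
    return [p.decode("ascii") for p in path]


def failure_message(path):
    """Return the UnicodeDecodeError message, or None if decoding succeeds."""
    try:
        ascii_decode_segments(path)
    except UnicodeDecodeError as e:
        return str(e)
    return None


if __name__ == "__main__":
    # Hypothetical upload path whose last segment contains the byte 0xe0.
    print(failure_message([b"uploads", b"caf\xe0.txt"]))
    # 'ascii' codec can't decode byte 0xe0 in position 3: ordinal not in range(128)
```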
Change History (26)
comment:1 Changed at 2009-04-23T10:05:58Z by francois
- Milestone changed from undecided to 1.5.0
- Owner changed from nobody to francois
comment:2 Changed at 2009-04-23T10:06:02Z by francois
- Status changed from new to assigned
comment:3 Changed at 2009-06-30T17:16:35Z by zooko
- Milestone changed from 1.5.0 to eventually
comment:4 Changed at 2009-07-11T11:28:04Z by warner
- Component changed from unknown to code-frontend
- Description modified (diff)
reformatted description slightly
comment:5 Changed at 2009-11-23T02:37:56Z by davidsarah
- Keywords i18n unicode added
comment:6 Changed at 2010-01-15T02:39:15Z by davidsarah
- Keywords ftp added
comment:7 follow-up: ↓ 9 Changed at 2010-01-15T02:41:01Z by davidsarah
See RFC 2640 for FTP internationalization.
comment:8 Changed at 2010-02-07T16:43:48Z by davidsarah
- Keywords ftpd added; ftp removed
comment:9 in reply to: ↑ 7 ; follow-up: ↓ 16 Changed at 2010-06-15T23:41:39Z by davidsarah
Replying to davidsarah:
See RFC 2640 for FTP internationalization.
Summary:
- include UTF8 in the response to a FEAT request;
- use UTF-8;
- reject filenames that are not valid UTF-8.
Admirably simple :-)
(See also #1076 about normalization, but that will probably be done in the dirnode interface rather than in frontends.)
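On the server side, the RFC 2640 summary above amounts to decoding every incoming path segment as strict UTF-8 and refusing anything that fails. A minimal Python 3 sketch, assuming segments arrive as bytes; decode_path and InvalidFilenameError are hypothetical names (not Tahoe or Twisted APIs), and a real frontend would map the exception to an FTP error reply rather than an internal server error. Normalization (#1076) is deliberately left out:

```python
class InvalidFilenameError(Exception):
    """Hypothetical error for a path segment that is not valid UTF-8."""


def decode_path(segments):
    """Decode FTP path segments (bytes) as UTF-8, rejecting invalid ones.

    Nothing is guessed: a segment that fails strict UTF-8 decoding is
    rejected instead of being reinterpreted in some other encoding.
    """
    decoded = []
    for seg in segments:
        try:
            decoded.append(seg.decode("utf-8"))  # errors="strict" by default
        except UnicodeDecodeError:
            raise InvalidFilenameError(
                "filename is not valid UTF-8: %r" % (seg,)) from None
    return decoded


if __name__ == "__main__":
    print(decode_path([b"photos", "été.jpg".encode("utf-8")]))  # ['photos', 'été.jpg']
    try:
        decode_path([b"caf\xe0.txt"])  # lone 0xE0 byte: invalid UTF-8
    except InvalidFilenameError as e:
        print("rejected:", e)
```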
comment:10 Changed at 2010-06-15T23:50:30Z by davidsarah
Hmm, judging by the exception message ("'ascii' codec can't decode byte 0xe0"), ncftp was trying to use ISO-Latin-1 rather than UTF-8. But at least it would be possible for clients to do the right thing, so I still think we should implement RFC 2640.
comment:11 Changed at 2010-06-15T23:57:51Z by davidsarah
Actually 'é' is 0xE9 in ISO-Latin-1, so I don't know what encoding this was (but not UTF-8).
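The byte values in comments 10 and 11 are easy to confirm in Python 3: 'é' is the single byte 0xE9 in ISO-8859-1 and the two bytes 0xC3 0xA9 in UTF-8, while a lone 0xE0 decodes to 'à' in ISO-8859-1 and is not valid UTF-8 on its own. The helper names below are illustrative only:

```python
def show_bytes(text, encoding):
    """Return the encoded bytes of text as a list of hex strings."""
    return ["0x%02X" % b for b in text.encode(encoding)]


def is_valid_utf8(data):
    """True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(show_bytes("é", "iso-8859-1"))  # ['0xE9']
    print(show_bytes("é", "utf-8"))       # ['0xC3', '0xA9']
    print(b"\xe0".decode("iso-8859-1"))   # à
    print(is_valid_utf8(b"\xe0"))         # False
```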
comment:12 Changed at 2010-06-15T23:59:14Z by davidsarah
- Description modified (diff)
comment:13 Changed at 2010-06-16T00:04:51Z by davidsarah
- Summary changed from FTP frontend refuses accents to FTP frontend should support Unicode filenames
comment:14 Changed at 2010-06-16T16:45:50Z by zooko
With the new improved pyutil-1.7.9 you get this handy-dandy script called "try_decoding":
HACL:~/playground/pyutil/bothw$ python -c 'open("d","wb").write(chr(0xe0))'
HACL:~/playground/pyutil/bothw$ try_decoding d -t é
HACL:~/playground/pyutil/bothw$
Oh hey, there are no encodings known to Python 2.6.1 that would decode 0xe0 to é!
Here are all the things that all the encodings would decode 0xe0 to:
HACL Zooko-Ofsimplegeos-MacBook-Pro:~/playground/pyutil/bothw$ try_decoding d
charmap : à
cp037 : \
cp1006 : ﻓ
cp1026 : ü
cp1140 : \
cp1250 : ŕ
cp1251 : а
cp1252 : à
cp1253 : ΰ
cp1254 : à
cp1255 : א
cp1256 : à
cp1257 : ą
cp1258 : à
cp424 : \
cp437 : α
cp500 : \
cp737 : ω
cp775 : Ó
cp850 : Ó
cp852 : Ó
cp855 : Я
cp857 : Ó
cp860 : α
cp861 : α
cp862 : α
cp863 : α
cp864 : ـ
cp865 : α
cp866 : р
cp869 : ζ
cp874 : เ
cp875 : \
hp_roman8 : Á
iso8859_1 : à
iso8859_10 : ā
iso8859_11 : เ
iso8859_13 : ą
iso8859_14 : à
iso8859_15 : à
iso8859_16 : à
iso8859_2 : ŕ
iso8859_3 : à
iso8859_4 : ā
iso8859_5 : р
iso8859_6 : ـ
iso8859_7 : ΰ
iso8859_8 : א
iso8859_9 : à
koi8_r : Ю
koi8_u : Ю
latin_1 : à
mac_arabic : ـ
mac_centeuro : ŗ
mac_croatian : –
mac_cyrillic : а
mac_farsi : ـ
mac_greek : ύ
mac_iceland : ý
mac_latin2 : ŗ
mac_roman : ‡
mac_romanian : ‡
mac_turkish : ‡
palmos : à
ptcp154 : а
raw_unicode_escape : à
rot_13 : à
tis_620 : เ
unicode_escape : à
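The idea behind try_decoding can be sketched in a few lines of Python 3. This is a hypothetical re-implementation, not pyutil's actual code: it tries a fixed candidate list rather than every codec Python knows, skips codecs that fail or are not text encodings, and the target lookup plays the role of the -t option:

```python
import codecs


def try_decodings(data, encodings):
    """Decode data with each named codec; return {codec name: text}.

    Unknown codecs (LookupError) and failed decodes (UnicodeDecodeError)
    are skipped, so only successful decodings appear in the result.
    """
    results = {}
    for name in encodings:
        try:
            codecs.lookup(name)  # raises LookupError for unknown names
            results[name] = data.decode(name)
        except (LookupError, UnicodeDecodeError):
            continue
    return results


def encodings_yielding(data, target, encodings):
    """Return the codec names that decode data to exactly target."""
    return [name for name, text in try_decodings(data, encodings).items()
            if text == target]


if __name__ == "__main__":
    candidates = ["ascii", "utf-8", "latin_1", "cp437", "cp1251",
                  "koi8_r", "mac_roman"]
    for name, text in try_decodings(b"\xe0", candidates).items():
        print(name, ":", text)
    print(encodings_yielding(b"\xe0", "é", candidates))  # []
```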
comment:15 Changed at 2010-06-17T21:56:18Z by davidsarah
#1089 discusses the use of non-UTF-8 encodings by FTP and SFTP clients.
comment:16 in reply to: ↑ 9 Changed at 2010-06-21T01:30:51Z by davidsarah
Replying to davidsarah:
Summary:
- include UTF8 in the response to a FEAT request; [...]
Twisted's FTP implementation does not currently implement FEAT. However it is implemented in such a way that it's relatively easy to monkey-patch it to do so, and no more ugly than monkey-patching always is. Something like (untested):
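from twisted.internet import defer
from twisted.protocols import ftp
from allmydata.util import log  # assumed: Tahoe's logging wrapper, which provides log.WEIRD

def ftp_FEAT(self, arg=None):
    # Refuse to patch if the Twisted internals we rely on are not there.
    if not (hasattr(self, 'shell') and hasattr(self.shell, 'feat')
            and hasattr(self, 'sendLine')):
        log.msg("Assumption needed to monkey-patch FEAT support in Twisted "
                "does not hold", level=log.WEIRD)
        return defer.fail(ftp.CmdNotImplementedError('FEAT'))
    if arg is not None:
        return defer.fail(ftp.CmdSyntaxError('FEAT does not take any argument'))
    # Ask the shell (our Handler) which features it supports.
    d = defer.maybeDeferred(self.shell.feat)
    def _reply(features):
        # Multi-line 211 reply: opening line, then one space-indented feature per line;
        # returning the reply code makes Twisted send the closing 211 line.
        self.sendLine('211- Featuretastic!')
        for f in features:
            self.sendLine(' ' + f)
        return ftp.SYS_STATUS_OR_HELP_REPLY
    d.addCallback(_reply)
    return d

# Only patch if this Twisted version does not already implement FEAT.
if not hasattr(ftp.FTP, 'ftp_FEAT'):
    ftp.FTP.ftp_FEAT = ftp_FEAT

class Handler:  # the existing IFTPShell implementation in ftpd.py; other methods elided
    def feat(self):
        # encoding_is_utf8() stands for whatever check decides UTF-8 filenames are in use.
        if self.encoding_is_utf8():
            return ['UTF8']
        else:
            return []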
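For reference, RFC 2389 fixes the shape of a FEAT reply: a "211-" opening line with free text, one line per feature each beginning with a single space, and a closing "211 End" line; a server with nothing to advertise may instead send a single-line 211 reply. A Twisted-independent Python 3 sketch of building those lines; feat_reply_lines and the free-text wording are assumptions, not part of Twisted or Tahoe:

```python
def feat_reply_lines(features):
    """Build the reply lines for an FTP FEAT command, following RFC 2389.

    Feature lines start with a single space so clients can distinguish
    them from the 211 status lines that open and close the reply.
    """
    if not features:
        # Single-line 211 reply for the "no features" case.
        return ["211 No extensions supported"]
    lines = ["211-Extensions supported:"]
    lines.extend(" " + f for f in features)
    lines.append("211 End")
    return lines


if __name__ == "__main__":
    for line in feat_reply_lines(["UTF8"]):
        print(line)
```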
comment:17 Changed at 2010-06-21T01:52:26Z by davidsarah
- Milestone changed from eventually to soon
comment:18 Changed at 2010-06-21T03:14:26Z by davidsarah
- Keywords names added
comment:19 Changed at 2010-06-21T21:17:32Z by zooko
I opened http://twistedmatrix.com/trac/ticket/4515 (support the FTP FEAT request).
comment:20 Changed at 2010-06-21T21:17:43Z by zooko
- Keywords twisted added
comment:21 Changed at 2011-02-02T23:50:46Z by davidsarah
- Summary changed from FTP frontend should support Unicode filenames to FTP frontend should support Unicode filenames encoded as UTF-8
comment:22 Changed at 2012-12-28T06:32:01Z by zooko
Twisted #4515 has been closed.
comment:23 Changed at 2012-12-28T23:52:45Z by davidsarah
Unfortunately the fix for that ticket isn't sufficient, because
adiroiban wrote in http://twistedmatrix.com/trac/ticket/4515#comment:13:
I don't plan to add IFTPShell.FEATURES in this patch since without UTF-8 support there will be nothing to export. Besides UTF-8, all other features (SIZE, MDTM, etc.) are tied to the protocol.FTP implementation.
With this change, it is possible to declare support for UTF-8 by monkeypatching twisted.protocols.ftp.FTP.FEATURES, but that depends on an implementation detail, which is what we were trying to avoid. (Granted, it's a slightly less ugly monkeypatch.)
I don't know why adiroiban ignored me when I pointed out that the goal of that ticket could be achieved in a simpler way that would have been sufficient. Maybe I should have argued the case more strenuously.
comment:24 Changed at 2012-12-29T00:01:34Z by davidsarah
Sigh, and it doesn't have a conformant implementation of OPTS:
def ftp_OPTS(self, option):
    """
    Handle OPTS command.

    http://tools.ietf.org/html/draft-ietf-ftpext-utf-8-option-00
    """
    return self.reply(OPTS_NOT_IMPLEMENTED, option)
http://tools.ietf.org/html/draft-ietf-ftpext-utf-8-option-00 says:
2. UTF-8 Option

The user issues the OPTS UTF-8 command to indicate its willingness to send and receive UTF-8 encoded pathnames over the control connection. Prior to sending this command, the user should not transmit UTF-8 encoded pathnames.
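By contrast with the Twisted ftp_OPTS quoted above, which always answers "not implemented", a conformant handler has to recognise the UTF-8 option and remember it for the rest of the connection. A minimal, Twisted-independent Python 3 sketch; SessionState and handle_opts are hypothetical names, accepting "UTF8 ON" alongside the draft's "UTF-8" is an assumption about what clients send, and the 200/501 codes follow RFC 2389's OPTS replies:

```python
class SessionState:
    """Hypothetical per-connection state for one FTP control connection."""

    def __init__(self):
        self.utf8 = False  # UTF-8 pathnames not yet agreed


def handle_opts(state, arg):
    """Handle the argument of an OPTS command; return (code, text).

    Recognises "UTF-8" (the draft's form) and "UTF8 ON" (assumed client
    variant), case-insensitively; any other option is answered with 501.
    """
    words = arg.strip().upper().split()
    if words in (["UTF-8"], ["UTF8", "ON"]):
        state.utf8 = True
        return ("200", "UTF-8 pathnames enabled")
    return ("501", "Option not understood")


if __name__ == "__main__":
    s = SessionState()
    print(handle_opts(s, "UTF8 ON"), s.utf8)
    print(handle_opts(s, "MLST size"))
```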
comment:25 Changed at 2013-07-27T12:56:12Z by amontero
- Cc amontero@… added
- Description modified (diff)
comment:26 Changed at 2014-12-02T19:41:19Z by warner
- Component changed from code-frontend to code-frontend-ftp-sftp
This is definitely the same sort of encoding issue as in #534. I'll try to have a look at it.