#1224 closed defect (fixed)
Unicode bug in grid to grid copies
Reported by: | francois | Owned by: | warner |
---|---|---|---|
Priority: | major | Milestone: | 1.8.1 |
Component: | code-frontend-cli | Version: | 1.8.0 |
Keywords: | unicode tahoe-cp news-done | Cc: | |
Launchpad Bug: |
Description
A grid to grid copy involving non-ASCII filenames fails. This is likely another occurrence of bug #534.
$ tahoe cp -rv tahoe:Blah tahoe:Blah2
/usr/lib/python2.5/urllib.py:1205: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  res = map(safe_map.__getitem__, s)
Traceback (most recent call last):
  File "/root/tahoe-lafs/support/bin/tahoe", line 9, in <module>
    load_entry_point('allmydata-tahoe==1.8.0-r4751', 'console_scripts', 'tahoe')()
  File "/root/tahoe-lafs/src/allmydata/scripts/runner.py", line 118, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "/root/tahoe-lafs/src/allmydata/scripts/runner.py", line 104, in runner
    rc = cli.dispatch[command](so)
  File "/root/tahoe-lafs/src/allmydata/scripts/cli.py", line 493, in cp
    rc = tahoe_cp.copy(options)
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 762, in copy
    return Copier().do_copy(options)
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 442, in do_copy
    status = self.try_copy()
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 485, in try_copy
    return self.copy_to_directory(sources, target)
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 649, in copy_to_directory
    self.assign_targets(source, target)
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 684, in assign_targets
    subtarget = target.get_child_target(name)
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 378, in get_child_target
    writecap = make_tahoe_subdirectory(self.nodeurl, self.writecap, name)
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 55, in make_tahoe_subdirectory
    ]) + "?t=mkdir"
  File "/usr/lib/python2.5/urllib.py", line 1205, in quote
    res = map(safe_map.__getitem__, s)
KeyError: u'\xe9'
Change History (13)
comment:1 Changed at 2010-10-14T01:40:50Z by davidsarah
comment:2 Changed at 2010-10-14T06:30:52Z by zooko
This isn't actually a regression from v1.7.1 to v1.8.0 is it?
(Maybe we should fix it in v1.8.1 anyway, just because it is easy to fix, impacts actual users like François, the fix is unlikely to cause other problems, and it is "unfinished business" from the new unicode support in v1.7.0.)
comment:3 Changed at 2010-10-16T01:01:05Z by francois
- Keywords review-needed added
- Status changed from new to assigned
A patch to fix this bug and add a test has been pushed to my git repository, which is available there:
comment:4 follow-up: ↓ 5 Changed at 2010-10-16T04:18:42Z by davidsarah
There are other instances of urllib.quote with a name (as opposed to a cap URI) as argument, in tahoe_backup.py, tahoe_mkdir.py, tahoe_put.py, and web/directory.py I think.
comment:5 in reply to: ↑ 4 Changed at 2010-10-16T09:34:50Z by francois
Replying to davidsarah:
There are other instances of urllib.quote with a name (as opposed to a cap URI) as argument, in tahoe_backup.py, tahoe_mkdir.py, tahoe_put.py, and web/directory.py I think.
I already did a grep in the whole tree to find other occurrences of this bug. Here's what I came up with.
- tahoe_backup.py
Function put_child only gets called with path="Latest" or path=now, which are both ASCII strings. But you're right, it is probably safer to use unicode_to_url there as well. I pushed a new commit to my git branch with this change.
- tahoe_mkdir.py
The path variable comes from the get_alias function which already returns a UTF-8 encoded string.
def get_alias(aliases, path_unicode, default):
    """
    Transform u"work:path/filename" into
    (aliases[u"work"], u"path/filename".encode('utf-8')).
- tahoe_put.py
It uses the get_alias function as well.
- web/directory.py
In this file, the name is always encoded as a UTF-8 string before use.
name = name.encode("utf-8")
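To illustrate why the call sites listed above are safe, here is a minimal sketch of the pattern they rely on: the name is already UTF-8 bytes by the time it reaches the quoting function. It is written for Python 3 (where `urllib.parse.quote` replaces the Python 2 `urllib.quote` discussed in this ticket), and `split_alias` is a hypothetical, heavily simplified stand-in for get_alias that ignores the `default` argument and paths without an alias. The alias table and cap string are made up for the example.

```python
from urllib.parse import quote


def split_alias(aliases, path_unicode):
    """Hypothetical simplification of get_alias: look up the alias and
    return the remaining path as UTF-8 bytes (no default-alias handling)."""
    alias, _, rest = path_unicode.partition(":")
    return aliases[alias], rest.encode("utf-8")


# Made-up alias table; the cap string is a placeholder, not a real cap.
aliases = {"work": "URI:DIR2:example-cap"}

rootcap, path = split_alias(aliases, "work:path/r\u00e9sum\u00e9.txt")

# The path is bytes at this point, so quote() only percent-escapes it.
quoted = quote(path)
```

Because the encoding step happens before quoting, a non-ASCII character such as `é` reaches the quoting function as the two bytes `\xc3\xa9` rather than as a Unicode code point.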
comment:6 Changed at 2010-10-16T15:49:50Z by davidsarah
- Keywords reviewed added; review-needed removed
I reviewed the git commit and it looks good.
comment:7 Changed at 2010-10-21T15:49:02Z by zooko
- Owner changed from francois to warner
- Status changed from assigned to new
Brian, could you merge this patch into trunk and push it into the darcs repo at dev.allmydata.org:/home/darcs/tahoe-lafs/trunk? Thanks!
comment:8 Changed at 2010-10-23T04:29:01Z by davidsarah
I reviewed the change to tahoe_backup.py and that also looks good.
comment:9 Changed at 2010-10-28T06:13:04Z by zooko
Okay, Brian could you also push that one from comment:8 into trunk then? :-)
Oh, do these need a NEWS entry?
comment:10 Changed at 2010-10-28T18:04:16Z by davidsarah
- Keywords news-needed added
comment:11 Changed at 2010-10-29T09:09:38Z by Brian Warner <warner@…>
- Resolution set to fixed
- Status changed from new to closed
In 14ee763c542b61c5:
comment:12 Changed at 2010-10-29T19:43:14Z by david-sarah@…
In 2610f8e0aa6e2221:
comment:13 Changed at 2010-10-29T19:51:42Z by davidsarah
- Keywords news-done added; reviewed news-needed removed
I had assumed that urllib.quote was supposed to UTF-8-then-percent-encode Unicode strings, but it's not documented as doing so, so that was probably wishful thinking.
This seems to be http://bugs.python.org/issue1712522. Apparently you have to convert to UTF-8 manually.
Note that we have a unicode_to_url method in src/allmydata/util/encodingutil.py that should probably be used for this (or maybe we should add a quote_unicode_url method, if it turns out that we normally need to convert and percent-escape at the same time).
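As a rough sketch of what such a combined helper could look like, the following converts to UTF-8 and percent-escapes in one step, then uses it to build a mkdir URL shaped like the one in the traceback. It is written for Python 3 (`urllib.parse.quote` instead of the Python 2 `urllib.quote`), and both `quote_unicode_url` and `mkdir_url` are hypothetical names for illustration, not the code that was actually committed. The node URL and cap string are placeholders.

```python
from urllib.parse import quote


def quote_unicode_url(name):
    """Hypothetical helper: encode text to UTF-8, then percent-escape it.
    Bytes are assumed to be UTF-8 already and are only percent-escaped."""
    if isinstance(name, str):
        name = name.encode("utf-8")
    return quote(name)


def mkdir_url(nodeurl, parent_writecap, name):
    """Hypothetical sketch of building a '?t=mkdir' URL for a child name."""
    return nodeurl + "/".join([
        "uri",
        quote(parent_writecap),
        quote_unicode_url(name),
    ]) + "?t=mkdir"


# Placeholder node URL and cap, with the same character that triggered the KeyError.
url = mkdir_url("http://127.0.0.1:3456/", "URI:DIR2:example", "\u00e9")
```

Doing the conversion inside one helper means a caller cannot forget the encoding step, which is the mistake that produced the KeyError in the description.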