#1224 closed defect (fixed)

Unicode bug in grid to grid copies

Reported by: francois
Owned by: warner
Priority: major
Milestone: 1.8.1
Component: code-frontend-cli
Version: 1.8.0
Keywords: unicode tahoe-cp news-done
Cc:
Launchpad Bug:

Description

A grid to grid copy involving non-ASCII filenames fails. This is likely another occurrence of bug #534.

$ tahoe cp -rv tahoe:Blah tahoe:Blah2
/usr/lib/python2.5/urllib.py:1205: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  res = map(safe_map.__getitem__, s)
Traceback (most recent call last):
  File "/root/tahoe-lafs/support/bin/tahoe", line 9, in <module>
    load_entry_point('allmydata-tahoe==1.8.0-r4751', 'console_scripts', 'tahoe')()
  File "/root/tahoe-lafs/src/allmydata/scripts/runner.py", line 118, in run
    rc = runner(sys.argv[1:], install_node_control=install_node_control)
  File "/root/tahoe-lafs/src/allmydata/scripts/runner.py", line 104, in runner
    rc = cli.dispatch[command](so)
  File "/root/tahoe-lafs/src/allmydata/scripts/cli.py", line 493, in cp
    rc = tahoe_cp.copy(options)
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 762, in copy
    return Copier().do_copy(options)
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 442, in do_copy
    status = self.try_copy()
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 485, in try_copy
    return self.copy_to_directory(sources, target)
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 649, in copy_to_directory
    self.assign_targets(source, target)
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 684, in assign_targets
    subtarget = target.get_child_target(name)
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 378, in get_child_target
    writecap = make_tahoe_subdirectory(self.nodeurl, self.writecap, name)
  File "/root/tahoe-lafs/src/allmydata/scripts/tahoe_cp.py", line 55, in make_tahoe_subdirectory
    ]) + "?t=mkdir"
  File "/usr/lib/python2.5/urllib.py", line 1205, in quote
    res = map(safe_map.__getitem__, s)
KeyError: u'\xe9'

Change History (13)

comment:1 Changed at 2010-10-14T01:40:50Z by davidsarah

I had assumed that urllib.quote was supposed to UTF-8-then-percent-encode Unicode strings, but it's not documented as doing so, so that was probably wishful thinking.

This seems to be http://bugs.python.org/issue1712522. Apparently you have to convert to UTF-8 manually.

Note that we have a unicode_to_url method in src/allmydata/util/encodingutil.py that should probably be used for this (or maybe we should add a quote_unicode_url method, if it turns out that we normally need to convert and percent-escape at the same time).
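The "convert to UTF-8 manually" step can be sketched as below. This is a minimal illustration, not the code that was committed for this ticket: quote_unicode_name is a hypothetical helper name, and the import fallback is only there so the sketch runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical sketch: encode to UTF-8 first, then percent-escape the bytes.
try:
    from urllib import quote          # Python 2, as in the traceback above
except ImportError:
    from urllib.parse import quote    # Python 3

def quote_unicode_name(name):
    """Percent-escape a Unicode name for use as a single URL path segment.

    The name is encoded to UTF-8 *before* quoting, so quote() only ever
    sees bytes. safe="" is an assumption of this sketch: it makes "/"
    inside a name get escaped too, instead of being read as a separator.
    """
    return quote(name.encode("utf-8"), safe="")

print(quote_unicode_name(u"caf\xe9"))
```

Passing the Unicode string directly to urllib.quote on Python 2.5 is what raised the KeyError in the description; encoding first avoids that lookup on a non-ASCII character.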

comment:2 Changed at 2010-10-14T06:30:52Z by zooko

This isn't actually a regression from v1.7.1 to v1.8.0, is it?

(Maybe we should fix it in v1.8.1 anyway, just because it is easy to fix, impacts actual users like François, the fix is unlikely to cause other problems, and it is "unfinished business" from the new Unicode support in v1.7.0.)

comment:3 Changed at 2010-10-16T01:01:05Z by francois

  • Keywords review-needed added
  • Status changed from new to assigned

A patch that fixes this bug and adds a test has been pushed to my git repository, which is available here:

http://github.com/ctrlaltdel/tahoe-lafs/tree/ticket/1224

comment:4 follow-up: Changed at 2010-10-16T04:18:42Z by davidsarah

There are other instances of urllib.quote with a name (as opposed to a cap URI) as argument, in tahoe_backup.py, tahoe_mkdir.py, tahoe_put.py, and web/directory.py I think.

comment:5 in reply to: ↑ 4 Changed at 2010-10-16T09:34:50Z by francois

Replying to davidsarah:

There are other instances of urllib.quote with a name (as opposed to a cap URI) as argument, in tahoe_backup.py, tahoe_mkdir.py, tahoe_put.py, and web/directory.py I think.

I already grepped the whole tree for other occurrences of this bug. Here's what I came up with.

  1. tahoe_backup.py

Function put_child only gets called with path="Latest" or path=now, which are both ASCII strings. But you're right, it is probably safer to use unicode_to_url there as well. I pushed a new commit to my git branch with this change.

  2. tahoe_mkdir.py

The path variable comes from the get_alias function, which already returns a UTF-8-encoded string.

def get_alias(aliases, path_unicode, default):
    """
    Transform u"work:path/filename" into (aliases[u"work"], u"path/filename".encode('utf-8')).

  3. tahoe_put.py

It uses the get_alias function as well.

  4. web/directory.py

In this file, the name is always encoded as a UTF-8 string before use.

name = name.encode("utf-8")
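Since some call sites hand over names that are already UTF-8 encoded while others hold Unicode strings, one defensive pattern is to normalize to bytes just before quoting. The sketch below is an assumption-laden illustration, not the tahoe_cp.py code: to_utf8_bytes and build_mkdir_url are hypothetical names, and the URL layout is only inferred from the "?t=mkdir" fragment visible in the traceback.

```python
# -*- coding: utf-8 -*-
# Hypothetical sketch: accept either Unicode or UTF-8 bytes, quote once.
try:
    from urllib import quote          # Python 2
    text_type = unicode               # noqa: F821 (Python 2 only)
except ImportError:
    from urllib.parse import quote    # Python 3
    text_type = str

def to_utf8_bytes(name):
    """Return name as UTF-8 bytes; byte strings are passed through as-is."""
    if isinstance(name, text_type):
        return name.encode("utf-8")
    return name

def build_mkdir_url(nodeurl, parent_cap, name):
    """Build a mkdir URL for a child directory (layout is an assumption)."""
    return nodeurl + "/".join([
        "uri",
        quote(parent_cap),
        quote(to_utf8_bytes(name)),
    ]) + "?t=mkdir"

# Both spellings of the same name yield the same URL.
print(build_mkdir_url("http://127.0.0.1:3456/", "URI:DIR2:abc", u"\xe9"))
print(build_mkdir_url("http://127.0.0.1:3456/", "URI:DIR2:abc",
                      u"\xe9".encode("utf-8")))
```

The pass-through branch assumes any byte string it receives is already valid UTF-8, which matches what comment:5 reports for get_alias and web/directory.py but is not checked here.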

comment:6 Changed at 2010-10-16T15:49:50Z by davidsarah

  • Keywords reviewed added; review-needed removed

I reviewed the git commit and it looks good.

comment:7 Changed at 2010-10-21T15:49:02Z by zooko

  • Owner changed from francois to warner
  • Status changed from assigned to new

Brian, could you merge this patch into trunk and push it into the darcs repo at dev.allmydata.org:/home/darcs/tahoe-lafs/trunk? Thanks!

comment:8 Changed at 2010-10-23T04:29:01Z by davidsarah

I reviewed the change to tahoe_backup.py and that also looks good.

comment:9 Changed at 2010-10-28T06:13:04Z by zooko

Okay, Brian could you also push that one from comment:8 into trunk then? :-)

Oh, do these need a NEWS entry?

comment:10 Changed at 2010-10-28T18:04:16Z by davidsarah

  • Keywords news-needed added

comment:11 Changed at 2010-10-29T09:09:38Z by Brian Warner <warner@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 14ee763c542b61c5:

tahoe_cp.py: Don't call urllib.quote with an Unicode argument, fix #1224
tahoe_backup.py: Fix another (potential) occurrence of calling urllib.quote()
with an Unicode parameter

comment:12 Changed at 2010-10-29T19:43:14Z by david-sarah@…

In 2610f8e0aa6e2221:

NEWS: clarify (strengthen) description of what backdoors.rst declares, and add bugfix entries for 'tahoe cp' and Windows console bugs. refs #1216, #1224, #1232

comment:13 Changed at 2010-10-29T19:51:42Z by davidsarah

  • Keywords news-done added; reviewed news-needed removed