[tahoe-dev] #534: "tahoe cp" command encoding issue
Brian Warner
warner at lothar.com
Fri Feb 27 09:45:24 PST 2009
[must be brief, typing on an iphone, I'll write more on Monday when
I've got a real keyboard]
One limitation to keep in mind is that JSON cannot represent arbitrary
binary data without application-visible encoding, and that both the
webapi GET $dircap?t=json and the dirnode-format metadata dict use
JSON. So any "store the original bytes and let the reader sort it out"
approach must e.g. base32-encode those bytes on the way in and base32-
decode them on the way out, in the CLI tool on the user side of the
HTTP connection.
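
A minimal sketch of that round trip, using only the Python standard library. The metadata key "original_filename_b32" and the helper names are hypothetical, made up for illustration, not existing Tahoe names:

```python
import base64
import json

def pack_name(raw: bytes) -> str:
    """Base32-encode raw filename bytes and wrap them in a JSON document."""
    # "original_filename_b32" is a hypothetical key, not an existing Tahoe field.
    encoded = base64.b32encode(raw).decode("ascii")
    return json.dumps({"original_filename_b32": encoded})

def unpack_name(wire: str) -> bytes:
    """Recover the raw filename bytes from the JSON document."""
    return base64.b32decode(json.loads(wire)["original_filename_b32"])

# Latin-1 bytes that are not valid UTF-8; JSON could not carry them directly.
raw_name = b"caf\xe9.txt"
wire = pack_name(raw_name)
restored = unpack_name(wire)
```

Here the encode and decode steps both happen in the CLI tool, so the JSON that crosses the HTTP connection only ever contains ASCII.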
How about this: we treat the child name (which has more users right
now, in terms of lines of code which think they know how to interpret
it) as being the "share with others" name: always unicode, but not
always a faithful roundtrippable representation of the original. Then,
for files which were copied from a local disk (like with "tahoe cp" or
"tahoe backup", as opposed to a WUI operation), let's add a metadata
field that is defined to hold the base32-encoded representation of the
original uninterpreted filename bytestring, and treat this metadata
field as the "note to myself" value, used to restore from a backup but
not meant for other users.
On the inbound side, if we can't decode the filename with the user's
preferred encoding (which can default to utf-8, or utf-16 on Windows,
or something configured into Python, etc), then we pretend to decode
it with Latin-1, so that a human looking at the mangled unicode name
can hopefully guess what the proper name should have been. We use the
unicode result as the childname. In all cases, we store the original
bytestring in the metadata.
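
The inbound side could look roughly like this. It is a sketch only: the function name, the "original_filename_b32" metadata key, and the utf-8 default are assumptions for illustration:

```python
import base64

def inbound_name(raw: bytes, preferred_encoding: str = "utf-8"):
    """Return (childname, metadata) for a filename read from local disk."""
    try:
        childname = raw.decode(preferred_encoding)
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail;
        # the result may be mangled but is still human-readable.
        childname = raw.decode("latin-1")
    # Always keep the original bytes (hypothetical key name).
    metadata = {"original_filename_b32": base64.b32encode(raw).decode("ascii")}
    return childname, metadata

good_name, good_meta = inbound_name(b"caf\xc3\xa9.txt")  # valid UTF-8
odd_name, odd_meta = inbound_name(b"caf\xe9.txt")        # not valid UTF-8
```

In the second call the Latin-1 fallback happens to yield the intended name; for bytes in some other charset it would yield a mangled one, which is the case the metadata copy is there for.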
Then, on the outbound side, we add a --use-original-binary-filename
option, which tells "tahoe cp" to ignore the unicode name and just use
the bytestring from the metadata. Normally, we have it encode the
unicode childname into the preferred charset (again with some
defaults) and ignore the metadata.
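
And a matching sketch of the outbound side. The function name, the use_original parameter (standing in for the proposed --use-original-binary-filename option), and the metadata key are all hypothetical; falling back to the unicode name when the metadata is missing is an assumption, not something decided above:

```python
import base64

def outbound_name(childname: str, metadata: dict,
                  use_original: bool = False,
                  preferred_encoding: str = "utf-8") -> bytes:
    """Choose the local filename bytes to write when copying out of the grid."""
    if use_original and "original_filename_b32" in metadata:
        # "Note to myself" path: restore the exact bytes seen at upload time.
        return base64.b32decode(metadata["original_filename_b32"])
    # Normal path: encode the shared unicode name, ignoring the metadata.
    return childname.encode(preferred_encoding)

meta = {"original_filename_b32": base64.b32encode(b"caf\xe9.txt").decode("ascii")}
shared = outbound_name("caf\u00e9.txt", meta)
exact = outbound_name("caf\u00e9.txt", meta, use_original=True)
```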
Thoughts?
-Brian