[tahoe-dev] #534: "tahoe cp" command encoding issue
zooko
zooko at zooko.com
Fri Feb 27 19:27:03 PST 2009
Folks:
Regarding my Strategy 2.d [1], François's Strategy 2.d&1/2 [2], and
Alberto's Strategy 2.e [3], the question is which behavior is more
desirable in the following case: a filename in a local filesystem
isn't actually validly encoded in that filesystem's default codec,
that file gets "tahoe backup"'ed or "tahoe cp"'ed into a tahoe
directory, and *then* an old or lazy tahoe client reads that filename
out of the tahoe directory and gives it to you. Do you want this old
or lazy tahoe client to give you:
2.d: Whatever that filename would have been if it had actually been
encoded in latin-1 in the first place. (I.e., some sort of
gibberish, if it wasn't actually latin-1.)
2.d&1/2: The same as 2.d, but prepended with the U+FFFC char
2.e: Whichever characters of that filename *are* legitimate for the
filesystem's default codec, interspersed with U+FFFD "replacement
characters" [4] for any characters that aren't legitimate for the default
codec.
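
To make the three options above concrete, here is a minimal sketch in
Python 3. It is only an illustration, not code from tahoe: the function
names are hypothetical, the filesystem's default codec is assumed to be
UTF-8, and the sample filename is assumed to have been written as
latin-1 bytes, so it is not valid in that default codec.

```python
# Hypothetical helpers illustrating the three strategies; none of these
# names exist in tahoe itself.

def is_valid_in(raw: bytes, codec: str) -> bool:
    """Return True if the raw filename bytes decode cleanly in `codec`."""
    try:
        raw.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

def strategy_2d(raw: bytes) -> str:
    # 2.d: pretend the bytes were latin-1. This never fails, because every
    # byte value maps to a code point, but yields gibberish if the name
    # was not really latin-1.
    return raw.decode("latin-1")

def strategy_2d_half(raw: bytes) -> str:
    # 2.d&1/2: same as 2.d, with U+FFFC prepended as a marker.
    return "\ufffc" + raw.decode("latin-1")

def strategy_2e(raw: bytes, fs_codec: str = "utf-8") -> str:
    # 2.e: keep what decodes in the filesystem's default codec (assumed
    # UTF-8 here) and substitute U+FFFD for whatever does not.
    return raw.decode(fs_codec, errors="replace")

raw_name = b"caf\xe9.txt"  # latin-1 bytes, invalid as UTF-8
if not is_valid_in(raw_name, "utf-8"):
    for strategy in (strategy_2d, strategy_2d_half, strategy_2e):
        print(strategy.__name__, repr(strategy(raw_name)))
```

Note that 2.d and 2.d&1/2 are reversible (encoding the result back to
latin-1 recovers the original bytes), while 2.e discards the bytes it
replaces.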
I tend to think that the first of those three options is the best,
but I would defer to any established "best practices" among unicode
gurus. Remember that we're only talking about backwards-compatibility
here -- the behavior of old tahoe clients that don't know how to do
anything but treat the "child name" as a unicode string, plus lazy
tahoe clients that don't bother to check for this condition, get the
original bytes, and do "whatever it is that diligent clients are
supposed to do with a bunch of bytes in some unknown encoding".
Regards,
Zooko
[1] http://allmydata.org/pipermail/tahoe-dev/2009-February/001343.html
[2] http://allmydata.org/pipermail/tahoe-dev/2009-February/001346.html
[3] http://allmydata.org/pipermail/tahoe-dev/2009-February/001348.html
[4] http://en.wikipedia.org/wiki/Replacement_character