[tahoe-dev] Unicode issues review
Shawn Willden
shawn-tahoe at willden.org
Tue Feb 17 11:04:23 PST 2009
On Tuesday 17 February 2009 11:09:09 am zooko wrote:
> 3. If the filesystem guarantees a specific encoding, use that one,
> else if it provides a "default" encoding, then try to decode with
> that one, and if decoding fails then reject the filename and ask the
> user to fix it up.
>
> 3.b. ... and if decoding fails then treat the filename as an opaque
> blob.
I think this is the best option.
> 3.c. ... and if decoding fails then try to decode it with a few
> dozen of our favorite encodings in descending order of popularity ...
You could do this as well, with a fallback to 3.b. if none of them works. I'm
not sure how useful it is to try different encodings, though, because you're
going to end up with the first one that decodes it "successfully", rather
than the first one that produces a sensible result -- unless you ask the user
which one is right, I guess.
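To illustrate the "first one that decodes successfully" problem, here is a minimal Python sketch. The sample byte string and the list of candidate encodings are assumptions picked for illustration, not anything Tahoe does. The same byte decodes without error under more than one single-byte encoding, each giving a different character, so a successful decode by itself says little about whether the result is sensible.

```python
# Hypothetical example: a one-byte filename fragment, originally "é" in Latin-1.
raw = "\u00e9".encode("latin-1")  # b'\xe9'

# Candidate encodings tried in some arbitrary "popularity" order (an assumption).
candidates = ["utf-8", "latin-1", "cp1251"]

results = {}
for enc in candidates:
    try:
        results[enc] = raw.decode(enc)
    except UnicodeDecodeError:
        # Strict decoding refused the bytes; record the failure.
        results[enc] = None

# UTF-8 rejects the lone byte 0xE9, but both single-byte encodings accept it
# and disagree about which character it represents.
for enc, text in results.items():
    print(enc, repr(text))
```

Single-byte encodings map nearly every byte value to some character, so whichever one comes first in the list tends to win regardless of what the filename was meant to say.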
Maybe rather than trying a bunch of different encodings, just try:
(1) The encoding for the current locale (whether specified per-file or by the
environment).
(2) UTF-8
And if neither of those works, then treat it as an opaque blob. Perhaps toss
in UTF-16 as well. The nice thing about UTF-8 and UTF-16 is that when you
try to decode crap with them they usually fail, rather than just silently
giving you crap back. Usually :-)
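A rough sketch of that fallback order in Python follows. The function name `decode_filename`, its `encodings` parameter, and the `(value, encoding)` return shape are hypothetical, chosen for this example only; the locale lookup uses the standard library's `locale.getpreferredencoding`, which is only one possible way to find the locale encoding.

```python
import locale


def decode_filename(raw, encodings=None):
    """Try each encoding in order; fall back to the raw bytes.

    Returns (text, encoding) on success, or (raw, None) when nothing
    decodes and the name has to be treated as an opaque blob.
    """
    if encodings is None:
        # Assumed default order: the locale's encoding first, then UTF-8.
        encodings = [locale.getpreferredencoding(False), "utf-8"]
    for enc in encodings:
        try:
            # bytes.decode is strict by default, so invalid input raises
            # instead of silently producing replacement characters.
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # Bytes not valid in this encoding, or codec name unknown.
            continue
    return raw, None


# Usage: valid UTF-8 decodes once ASCII has failed; junk stays opaque.
print(decode_filename(b"caf\xc3\xa9", ["ascii", "utf-8"]))
print(decode_filename(b"\xff\xfe\xfd", ["ascii", "utf-8"]))
```

Note that if the locale encoding is a single-byte one such as Latin-1, step (1) will accept almost any input, so the UTF-8 step would rarely be reached; the order of the list matters.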
Eventually the non-Unicode encodings should gradually disappear, so this
should break less and less often as time goes on.
> 4. Any other options?
I don't see any.
Shawn.