[tahoe-dev] Unicode issues review
Shawn Willden
shawn-tahoe at willden.org
Tue Feb 17 10:52:10 PST 2009
On Tuesday 17 February 2009 10:31:35 am zooko wrote:
> Ugh -- you mean to tell me that the filesystem itself might not know
> what encoding a filename is in?
Yep!
On most (all?) Unix-style systems, locale is an environment setting with a
system-wide default, but can be overridden per-user (or even per-shell). The
file system doesn't know anything about the encoding used; it just coughs up
the bytes and relies on higher layers to make sense of them, per the current
locale.
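To make that concrete, here is a minimal sketch of how one stored byte string turns into different names depending on which encoding the reader assumes. The sample name and the helper interpret() are made up for illustration; nothing here comes from a real directory listing.

```python
def interpret(raw: bytes, encoding: str) -> str:
    """Decode raw filename bytes the way a tool assuming `encoding` would.

    Bytes that are invalid in that encoding become U+FFFD instead of
    raising, which is roughly what a file browser showing "gibberish" does.
    """
    return raw.decode(encoding, errors="replace")


# Hypothetical name, stored on disk by someone working in a Korean locale.
raw_name = "한글.txt".encode("euc_kr")

as_korean = interpret(raw_name, "euc_kr")   # the name its author intended
as_latin1 = interpret(raw_name, "latin-1")  # every byte maps to some character
as_utf8 = interpret(raw_name, "utf-8")      # invalid sequences get replaced

for label, shown in (("euc_kr", as_korean),
                     ("latin-1", as_latin1),
                     ("utf-8", as_utf8)):
    print(label, repr(shown))
```

The bytes never change; only the reader's assumption does, and the file system has no opinion about which reading is right.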
There's also no enforcement, by any layer, really, that any of the file
names make sense in the current locale, or any other.
Even if the file system did know the encoding on a per-name basis, there's
still no guarantee that other names won't slip in, because there are plenty
of ways files can be transferred by tools that don't worry about name
encodings.
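One way a tool can cope with such stray names is to keep the original bytes recoverable rather than rejecting or mangling them. The sketch below uses Python 3's "surrogateescape" error handler (PEP 383) and assumes UTF-8 as the expected encoding; the helper names and the sample bytes are hypothetical, and this is not a description of what Tahoe does.

```python
def decode_name(raw: bytes) -> str:
    """Decode a filename as UTF-8, smuggling undecodable bytes through
    as lone surrogate code points instead of failing."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_name(name: str) -> bytes:
    """Reverse decode_name(), restoring the exact original bytes."""
    return name.encode("utf-8", errors="surrogateescape")


# Hypothetical stray name: these bytes are not valid UTF-8.
stray = b"\xc7\xd1\xb1\xdb.txt"

text = decode_name(stray)           # usable as a str, though not printable as-is
round_tripped = encode_name(text)   # identical to what the file system gave us

print(round_tripped == stray)
```

The decoded string is not meaningful to a human, but the file can still be opened, copied, or renamed, because the exact bytes survive the round trip.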
> In that case, examining the
> directory with "ls" or a gooey file browser would show gibberish,
> right?
See the attached screenshot. This is from my machine. The name would be
meaningless to me even if I knew what the encoding was, because it's Korean.
The content, however, is quite useful to me, so I'm just happy that my system
and application software lets me open and use it, even with the bizarre name.
Maybe I should rename it, but it's from a CVS repository and I don't want it
to show up on the "cvs update" list.
These kinds of things are common when you work with people from around the
world. Almost everyone tries to stick to English, but stuff slips through.
Shawn.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: snapshot1.png
Type: image/png
Size: 38827 bytes
Desc: not available
Url : http://allmydata.org/pipermail/tahoe-dev/attachments/20090217/c212446f/attachment-0001.png