[tahoe-dev] Unicode issues review

Jan-Benedict Glaw jbglaw at lug-owl.de
Tue Feb 17 10:03:01 PST 2009

On Tue, 2009-02-17 10:31:35 -0700, zooko <zooko at zooko.com> wrote:
> On Feb 17, 2009, at 9:15 AM, Shawn Willden wrote:
> > The problem with that is that there isn't necessarily any one  
> > encoding that works.  It would be nice if all the file names in a  
> > file system used the same encoding, but it isn't necessarily true.
> Ugh -- you mean to tell me that the filesystem itself might not know  
> what encoding a filename is in?  In that case, examining the  
> directory with "ls" or a gooey file browser would show gibberish,  
> right?  Unless "ls" or the gooey file browser is *guessing* what  
> encoding this particular sequence of bytes is probably in.
> Do any real systems do this?

This happens, and quite often, at least on systems shared by several
people from all around the world.

They log in over SSH and each use their own encoding within their
$HOME, which is not necessarily the same as anyone else's.  So if one
user runs `ls' in another user's homedir, he might see gibberish. So
technically, "encoding" is a per-file property on some filesystems:
those that don't care about a filename's contents, as long as it
contains neither the directory delimiter (typically '/' or '\\') nor
'\0' (end of string). Other filesystems have a per-filesystem encoding
and won't allow filenames that would be illegal in that encoding.
NTFS is one of those.
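
To illustrate the "gibberish" case, here is a minimal Python sketch. It
assumes a hypothetical file name created by a user with a Latin-1
locale and then listed by a tool that assumes UTF-8; the name and the
helper function are made up for illustration and are not part of Tahoe.

```python
# Hypothetical file name, as the bytes a Latin-1 user's tools would write.
raw_name = "caf\u00e9.txt".encode("latin-1")   # b'caf\xe9.txt'


def decode_for_display(name_bytes, encoding):
    """Decode file name bytes the way a tool assuming `encoding` would.

    Undecodable bytes become U+FFFD instead of raising an error.
    """
    return name_bytes.decode(encoding, errors="replace")


# The owner's own locale reproduces the intended name.
as_latin1 = decode_for_display(raw_name, "latin-1")

# A UTF-8 user sees a replacement character where the 0xE9 byte was.
as_utf8 = decode_for_display(raw_name, "utf-8")
```

The bytes on disk are identical in both cases; only the reader's
assumption about the encoding differs.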

That is, there are two different things we should keep in mind. First,
there is the file's name, which it might make sense to store
internally in UTF-8. Second, there are the user's terminal settings
(which usually match the encoding of *his* file names), which we
might need for displaying names (outside the web browser, e.g. within
fuse or maybe even ftp/scp) and for local file I/O.
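
A rough sketch of that split, assuming the source encoding of each name
is already known (which, as discussed above, it may not be). The
function names are hypothetical, not an existing Tahoe API.

```python
import locale


def to_internal(name_bytes, source_encoding):
    """Convert a local file name to the internal UTF-8 form."""
    return name_bytes.decode(source_encoding).encode("utf-8")


def for_terminal(internal_utf8, terminal_encoding=None):
    """Convert an internally stored name to the user's terminal encoding.

    If no encoding is given, fall back to the locale's preferred one.
    Characters the terminal encoding cannot represent become '?'.
    """
    if terminal_encoding is None:
        terminal_encoding = locale.getpreferredencoding(False)
    text = internal_utf8.decode("utf-8")
    return text.encode(terminal_encoding, errors="replace")


# Hypothetical name from a Latin-1 user, stored once as UTF-8 ...
stored = to_internal(b"caf\xe9.txt", "latin-1")

# ... then rendered separately for each kind of terminal.
latin1_view = for_terminal(stored, "latin-1")
ascii_view = for_terminal(stored, "ascii")
```

Replacing unrepresentable characters is lossy, so it is only suitable
for display; local file I/O would need a reversible mapping.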


      Jan-Benedict Glaw      jbglaw at lug-owl.de              +49-172-7608481
Signature of:                 Friends are relatives you make for yourself.
the second  :
