[tahoe-dev] Unicode issues review

Shawn Willden shawn-tahoe at willden.org
Tue Feb 17 06:48:55 PST 2009


On Tuesday 17 February 2009 05:11:10 am Francois Deppierraz wrote:
> The main limitation is that only systems using a UTF-8 filesystem and
> UTF-8 command line arguments are currently supported. However, it should
> be easy to support other encodings as soon as we have a way to detect them.

Where possible, I think it's probably better to avoid making any assumptions.

My file system is UTF-8, but I do some work with a development team in Korea, 
and some of the files checked out from their repository are in some other 
encoding (I'm not even sure what it is).

The approach I've taken for my backup tool is to treat filenames as opaque 
bytestrings, and to apply escape-encoding wherever necessary to ensure that I 
can take, store, retrieve and restore ANY file name, regardless of encoding, 
exactly as it was.
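
To illustrate the idea, here is a minimal sketch of a reversible escape-encoding 
for raw filenames, in Python.  It is not the code from the backup tool: the 
function names and the choice of percent-escaping are assumptions made purely 
for the example.

```python
def escape_name(raw: bytes) -> str:
    """Map an arbitrary filename bytestring to printable ASCII, reversibly.

    Hypothetical helper: percent-escaping is an assumed scheme, chosen
    only because it is simple to reverse.
    """
    out = []
    for b in raw:
        # Keep printable, non-space ASCII as-is, except '%' (0x25),
        # which is reserved as the escape marker.
        if 0x20 < b < 0x7F and b != 0x25:
            out.append(chr(b))
        else:
            out.append("%%%02X" % b)
    return "".join(out)


def unescape_name(escaped: str) -> bytes:
    """Invert escape_name, recovering the original bytes exactly."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == "%":
            # The two characters after '%' are the hex value of one byte.
            out.append(int(escaped[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(escaped[i]))
            i += 1
    return bytes(out)


# A name in some non-UTF-8 encoding (these bytes are not valid UTF-8).
raw = b"\xc7\xd1\xb1\xdb.txt"
stored = escape_name(raw)
assert unescape_name(stored) == raw  # the name survives a round trip
```

Nothing here ever decodes the name, so no encoding has to be known or guessed.  
In Python 3, passing a bytes path to os.listdir() (for example os.listdir(b".")) 
returns names as bytes, which is one way to obtain such raw names.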

This has the downside of assuming that files will be retrieved by a system 
with the same encoding as the system that stored them.  That's a reasonable 
assumption, IMO, for a backup tool like mine, or 'tahoe backup'.  Not so much 
for a general-purpose DFS, I suppose.

	Shawn.

