[tahoe-dev] Unicode issues review
Shawn Willden
shawn-tahoe at willden.org
Tue Feb 17 06:48:55 PST 2009
On Tuesday 17 February 2009 05:11:10 am Francois Deppierraz wrote:
> The main limitation is that only systems using a UTF-8 filesystem and
> UTF-8 command line arguments are currently supported. However, it should
> be easy to support other encodings as soon as we have a way to detect them.
Where possible, I think it's probably better to avoid making any assumptions.
My file system is UTF-8, but I do some work with a development team in Korea,
and some of the files checked out from their repository are in some other
encoding (I'm not even sure what it is).
The approach I've taken for my backup tool is to treat filenames as opaque
bytestrings, and to apply escape-encoding wherever necessary to ensure that I
can take, store, retrieve and restore ANY file name, regardless of encoding,
exactly as it was.
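A minimal sketch of what such escape-encoding could look like, assuming
Python 3 and a simple percent-escape scheme. The names SAFE, escape_name and
unescape_name are hypothetical, and this is not necessarily the scheme the
backup tool actually uses; it only illustrates that any byte sequence can be
mapped to plain ASCII and back without ever guessing its encoding.

```python
# Hypothetical safe set: these bytes are stored literally, every other
# byte (including '%' itself) is written as %XX.
SAFE = frozenset(
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789._-"
)


def escape_name(raw: bytes) -> str:
    """Map an arbitrary filename bytestring to a pure-ASCII string."""
    return "".join(chr(b) if b in SAFE else "%%%02X" % b for b in raw)


def unescape_name(escaped: str) -> bytes:
    """Invert escape_name, returning the original bytes unchanged."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == "%":
            # The two characters after '%' are the hex value of one byte.
            out.append(int(escaped[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(escaped[i]))
            i += 1
    return bytes(out)
```

On POSIX systems, passing a bytes path to os.listdir (for example
os.listdir(b".")) returns the names as bytes, so they can be fed to a function
like escape_name without any decoding step.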
This has the downside of assuming that files will be retrieved by a system
with the same encoding as the system that stored them. That's a reasonable
assumption, IMO, for a backup tool like mine, or 'tahoe backup'. Not so much
for a general-purpose DFS, I suppose.
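To illustrate that downside, here is a small sketch, again assuming Python 3.
The helper try_decode and the sample name are hypothetical. It shows that a
name restored byte-for-byte is only readable where the retrieving system
assumes the same encoding as the one that stored it.

```python
def try_decode(raw: bytes, encoding: str):
    """Return the decoded name, or None if the bytes are invalid in encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# A Korean filename as an EUC-KR system would store it on disk.
original = "\ud55c\uae00.txt"
raw = original.encode("euc_kr")

same_encoding = try_decode(raw, "euc_kr")    # the original name again
as_utf8 = try_decode(raw, "utf-8")           # None: not valid UTF-8
as_latin1 = try_decode(raw, "latin-1")       # decodes, but to the wrong text
```

The bytes survive the round trip exactly, but how they are displayed or
matched against user input depends on the encoding the retrieving system
assumes.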
Shawn.