[tahoe-dev] [tahoe-lafs] #629: 'tahoe backup' doesn't tolerate 8-bit filenames
Shawn Willden
shawn-tahoe at willden.org
Sun May 24 13:08:17 PDT 2009
On Sunday 24 May 2009 11:03:56 am Zooko Wilcox-O'Hearn wrote:
> It sounds to me like your design will store enough information to
> enable any possible future improvement, but that the first version
> will give mojibake results if you backup from a linux system (even
> one with all filenames correctly encoded using the declared locale)
> and then restore on a mac system. Is that your intent?
No, even the first version will transcode between systems with different
encodings (though in your example, most Linux systems are UTF-8, and OS X is
UTF-8).
The key difference between this approach and most of what has been discussed
is that it defers all effort to properly decode and convert names to the
point of retrieval.
Backups ALWAYS succeed, because they don't try to do anything other than
preserve the data.
Restores will usually succeed, and when they don't we can then figure out how
to make them succeed.
The restore algorithm looks like:
1. Decode JSON and retrieve transport Unicode
2. Apply "decoded-with" encoder to recover source platform raw string.
3. Apply "platform-codec" decoder to (hopefully) obtain correct Unicode.
4. Apply destination platform encoding to get the name in the destination's encoding.
For names from Windows source systems, "decoded-with" will be None, so steps 2
and 3 will be skipped. If the source and destination platforms use the same
codec, steps 3 and 4 could be skipped.
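The four steps above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not the actual 'tahoe backup' code: the
function name restore_name, the JSON layout, and the "name" key are made up
for the example; only the "decoded-with" and "platform-codec" labels come from
the description above. It is written for Python 3, where str is Unicode and
bytes is the raw platform string.

```python
import json


def restore_name(entry_json, dest_codec):
    """Recover a filename from a backup entry (hypothetical layout).

    entry_json is assumed to be a JSON object with three keys:
      "name"           - the transport Unicode string (key name is an assumption)
      "decoded-with"   - codec used at backup time to get transport Unicode,
                         or None for sources whose names were already Unicode
      "platform-codec" - codec the source platform declared for its filenames
    Returns the name as bytes in the destination platform's encoding.
    """
    # Step 1: decode JSON and retrieve the transport Unicode.
    entry = json.loads(entry_json)
    name = entry["name"]
    decoded_with = entry.get("decoded-with")
    platform_codec = entry.get("platform-codec")

    if decoded_with is not None:
        # Step 2: re-encode with the "decoded-with" codec to recover the
        # raw byte string exactly as the source platform stored it.
        raw = name.encode(decoded_with)
        # Step 3: decode with the declared platform codec. This raises
        # UnicodeDecodeError if the raw bytes are not valid in that codec.
        name = raw.decode(platform_codec)

    # Step 4: encode for the destination platform.
    return name.encode(dest_codec)


# A name stored on disk as UTF-8 bytes, carried through JSON via latin-1.
raw_on_disk = b"caf\xc3\xa9"
entry = json.dumps({
    "name": raw_on_disk.decode("latin-1"),
    "decoded-with": "latin-1",
    "platform-codec": "utf-8",
})
restored = restore_name(entry, "latin-1")
```

latin-1 is used as the "decoded-with" codec in the example because it maps
every byte to a code point, so the backup-side decode cannot fail and step 2
recovers the original bytes exactly.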
For names that are invalid, the decoding in step 3 may fail, which is an
error. Ultimately we can expend much cleverness trying to address those
errors. If that turns out to be important, fine, the data needed will be
available.
Shawn.