[tahoe-dev] [tahoe-lafs] #534: "tahoe cp" command encoding issue

Shawn Willden shawn-tahoe at willden.org
Sun May 3 10:48:07 PDT 2009


On Sunday 03 May 2009 09:14:28 am tahoe-lafs wrote:
>  2. On Linux or Solaris read the filename with the string APIs, and
>  store the result in the "original_bytes" part of the metadata. Call
>  sys.getfilesystemencoding() to get an alleged_encoding. Then, call
>  bytes.decode(alleged_encoding, 'strict') to try to get a unicode
>  object.

Why not just read the filename with the unicode API?  That will decode it 
using the file system encoding if possible, and if that decoding fails you'll 
get a string object as a result, with the original bytes.  Then you only have 
to bother with the "original_bytes", "failed_decode", etc. if the file name 
is a string, rather than a unicode object.
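
A minimal sketch of that branching, assuming the Python 2 behaviour where 
os.listdir() called with a unicode path returns unicode objects for names it 
can decode and plain byte strings for the rest. The helper name describe_name 
and the dictionary layout are hypothetical; only the "original_bytes" and 
"failed_decode" keys come from the proposal quoted above. It is written so it 
also runs on Python 3, where bytes plays the role of the Python 2 str type.

```python
def describe_name(name):
    """Build metadata for one directory entry (hypothetical helper)."""
    if isinstance(name, bytes):
        # The unicode API could not decode this name: keep the raw bytes
        # and flag the failure. Leaving "name" as None is an assumption.
        return {"name": None, "original_bytes": name, "failed_decode": True}
    # Decoding succeeded, so there is nothing extra to record.
    return {"name": name, "failed_decode": False}


# Simulated listing with one decodable and one undecodable name.
entries = [u"caf\u00e9.txt", b"caf\xe9.txt"]
metadata = [describe_name(entry) for entry in entries]
```

Note that Python 3 no longer returns mixed types from os.listdir(); it 
substitutes surrogate escapes for undecodable bytes instead, so there the 
check would have to look different.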

This allows Windows and Unix to use the same code, except that on Windows all 
of the code to handle an unsuccessful decoding is never exercised, because it 
can't happen.

If there is value in using the string API and then decoding in strict mode, 
then your approach makes perfect sense, except that I'd still prefer to 
handle it the same way on all platforms, rather than special-casing.  Reading 
with the string API and then strictly decoding with the file system encoding 
should work just fine on Windows, too.
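
A rough sketch of that platform-independent variant, assuming raw is a 
byte-string name as returned by os.listdir() called with a byte-string path. 
The function name strict_decode and its return shape are hypothetical, not 
taken from the Tahoe code.

```python
import sys


def strict_decode(raw, alleged_encoding=None):
    """Return (decoded_name, failed_decode) for a byte-string filename."""
    if alleged_encoding is None:
        # Ask Python which encoding it believes the file system uses.
        alleged_encoding = sys.getfilesystemencoding()
    try:
        # 'strict' raises instead of silently replacing bad bytes.
        return raw.decode(alleged_encoding, "strict"), False
    except UnicodeDecodeError:
        # The caller still holds raw and can store it as "original_bytes".
        return None, True
```

The same function would run unchanged on every platform; whether the failure 
branch is ever taken then depends only on the names actually present.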

Or am I missing something obvious? (Wouldn't be the first time, won't be the 
last!)

	Shawn.


