[tahoe-dev] [tahoe-lafs] #534: "tahoe cp" command encoding issue
Shawn Willden
shawn-tahoe at willden.org
Sun May 3 10:48:07 PDT 2009
On Sunday 03 May 2009 09:14:28 am tahoe-lafs wrote:
> 2. On Linux or Solaris read the filename with the string APIs, and
> store the result in the "original_bytes" part of the metadata. Call
> sys.getfilesystemencoding() to get an alleged_encoding. Then, call
> bytes.decode(alleged_encoding, 'strict') to try to get a unicode
> object.
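The quoted proposal can be sketched as below. This is a minimal, hypothetical version written for modern Python 3, not the ticket's actual code. The function name `decode_filename`, the stored `alleged_encoding` key, and the choice to return `None` when strict decoding fails are assumptions. Only `original_bytes`, `failed_decode`, and the use of `sys.getfilesystemencoding()` come from the thread.

```python
import sys

def decode_filename(original_bytes, alleged_encoding=None):
    """Strictly decode a filename obtained through the bytes (string) API.

    Returns (name, metadata). `name` is None when strict decoding fails;
    the exact bytes are always kept in metadata["original_bytes"].
    Hypothetical helper, not Tahoe's implementation.
    """
    if alleged_encoding is None:
        # Whatever the platform claims; it may be wrong for this file.
        alleged_encoding = sys.getfilesystemencoding()
    metadata = {
        "original_bytes": original_bytes,
        "alleged_encoding": alleged_encoding,  # assumed metadata key
    }
    try:
        name = original_bytes.decode(alleged_encoding, "strict")
        metadata["failed_decode"] = False
    except UnicodeDecodeError:
        # Assumed policy: no unicode name; the raw bytes are authoritative.
        name = None
        metadata["failed_decode"] = True
    return name, metadata
```

For example, `decode_filename(b"caf\xc3\xa9", "utf-8")` yields the name `"café"`. `decode_filename(b"caf\xe9", "utf-8")` yields `None` with `failed_decode` set, because `\xe9` by itself is not valid UTF-8.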
Why not just read the filename with the unicode API? That will decode it
using the file system encoding if possible, and if that decoding fails you'll
get a str object containing the original bytes instead. Then you only have
to bother with the "original_bytes", "failed_decode", etc. if the file name
is a string, rather than a unicode object.
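In Python 2, which this thread targets, `os.listdir()` called with a unicode path returns unicode objects for decodable names. Names it cannot decode come back as plain byte strings, which is the behaviour described above. Python 3 changed this. With a `str` path, `os.listdir()` always returns `str`, and undecodable bytes are carried through as lone surrogates (PEP 383). `os.fsencode()` turns those back into the original bytes.

The sketch below is a rough Python 3 analogue of the "unicode API first" idea. It only builds the extra metadata for names that failed to decode. The function names are hypothetical, and detecting failures via lone surrogates is an assumption about how one might port the idea.

```python
import os

def classify_listed_name(name):
    """Classify one entry returned by os.listdir() called with a str path.

    Decodable names are ordinary str values; undecodable bytes arrive
    as lone surrogates (PEP 383). Hypothetical helper.
    """
    try:
        # Strict UTF-8 encoding rejects lone surrogates, so this detects
        # names that the filesystem decoding could not handle.
        name.encode("utf-8", "strict")
    except UnicodeEncodeError:
        # Only now bother with the extra metadata: recover the exact bytes.
        return None, {"original_bytes": os.fsencode(name),
                      "failed_decode": True}
    return name, {}

def list_directory(path):
    """List `path` (a str) and classify every entry."""
    return [classify_listed_name(entry) for entry in os.listdir(path)]
```

On a POSIX system with a UTF-8 filesystem encoding, `classify_listed_name("caf\udce9")` should report `original_bytes == b"caf\xe9"`. Ordinary names pass through with empty metadata.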
This allows Windows and Unix to use the same code, except that on Windows all
of the code to handle an unsuccessful decoding is never exercised, because it
can't happen.
If there is value in using the string API and then decoding in strict mode,
then your approach makes perfect sense, except that I'd still prefer to
handle it the same way on all platforms, rather than special-casing. Reading
with the string API and then strictly decoding with the file system encoding
should work just fine on Windows, too.
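The single-code-path alternative could look roughly like the sketch below: read bytes, then strictly decode with the file system encoding, with no platform check. It assumes Python 3. On Windows it relies on Python 3.6 or later, where PEP 529 makes bytes paths UTF-8 and `sys.getfilesystemencoding()` reports `'utf-8'`. Strict decoding is therefore expected to succeed there for ordinary names. The function name and return shape are hypothetical.

```python
import os
import sys

def list_and_strictly_decode(path):
    """One code path for every platform: bytes API, then strict decoding.

    Returns a sorted list of (original_bytes, name) pairs, where name is
    None if the bytes are not valid in the filesystem encoding.
    Hypothetical helper.
    """
    encoding = sys.getfilesystemencoding()
    results = []
    # A bytes argument makes os.listdir() return raw bytes entries.
    for raw in os.listdir(os.fsencode(path)):
        try:
            name = raw.decode(encoding, "strict")
        except UnicodeDecodeError:
            name = None
        results.append((raw, name))
    # Entries are unique, so sorting compares only the bytes field.
    return sorted(results)
```

The trade-off matches the discussion above. The raw bytes are always available, at the cost of decoding every name manually instead of letting the unicode API do it.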
Or am I missing something obvious? (Wouldn't be the first time, won't be the
last!)
Shawn.