[tahoe-dev] String encoding in tahoe
zooko
zooko at zooko.com
Tue Dec 23 14:36:25 PST 2008
Dear François:
What you write sounds reasonable, but I'm not sure precisely how it
would be implemented. Would we continue to run .decode('utf-8') on
incoming strings, letting the resulting exception stop the Python
interpreter if the input can't be decoded as utf-8? My only worry with
that is that input in some other encoding could accidentally form a
valid utf-8 sequence, so the decode appears to succeed but yields
gibberish instead of the intended string.
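To make both cases concrete, here is a minimal sketch. It assumes Python 3
(where bytes and str are separate types) rather than the Python 2 this
thread is about, and decode_argument is a hypothetical helper name, not
anything in the tahoe source. It shows one input that fails loudly and one
that decodes "successfully" to the wrong string.

```python
def decode_argument(raw: bytes) -> str:
    """Hypothetical helper: strictly decode raw bytes as UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    return raw.decode("utf-8")


# Case 1: the Latin-1 encoding of U+00E9 (e with acute accent) is the
# single byte 0xE9, which is not valid UTF-8, so decoding fails loudly.
latin1_name = "\u00e9".encode("latin-1")
try:
    decode_argument(latin1_name)
    failed = False
except UnicodeDecodeError:
    failed = True

# Case 2: the Latin-1 encoding of the two characters U+00C3 U+00A9 is the
# byte pair 0xC3 0xA9, which happens to be the valid UTF-8 encoding of
# U+00E9. Decoding succeeds, but the result is not the string the sender
# intended.
intended = "\u00c3\u00a9"
ambiguous = intended.encode("latin-1")
decoded = decode_argument(ambiguous)

print(failed)               # True: case 1 raised UnicodeDecodeError
print(decoded == intended)  # False: case 2 silently produced another string
```

The second case cannot be detected by the decoder itself; only knowledge
of the encoding the sender actually used can rule it out.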
The other thing that concerns me is that more of the buildbots were
green before our recent patches. Does this mean that if we revert
those patches then the tahoe cli will work with non-ascii filenames
on Windows? Or does it mean that the tests were incorrectly marking
Windows as green last week but actually non-ascii filenames wouldn't
have worked on Windows?
I need to decide what to do for Tahoe-1.3.0, and what we had last
week -- where everything passed your tests except for Ubuntu Feisty
-- seems preferable to what we have today.
If you could tell me precisely what does/doesn't work on what
platforms, then we could write it down in the known_issues.txt file,
and we could put SKIP or TODO marks on the appropriate unit tests so
that the buildbot is green. I would be very grateful for any help on
diagnosing and documenting the unicode situation for the 1.3.0
release. Allmydata.com doesn't use the cli on Windows currently, so
the company probably isn't going to spend too many resources on that
particular feature.
Oh, and another idea would be to use sys.setdefaultencoding to change
the default encoding from ascii to utf-8. Would that be a good idea?
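As a starting point for answering that, and for documenting per-platform
behaviour, a small report of the encodings the interpreter is actually
using might help. This is only a sketch: encoding_report is a hypothetical
name, and it is written for Python 3, where sys.setdefaultencoding does
not exist and the default encoding is always utf-8. On Python 2 the site
module removes sys.setdefaultencoding at startup, so the last field also
shows whether it is still reachable.

```python
import sys


def encoding_report() -> dict:
    """Hypothetical helper: collect the encodings relevant to CLI arguments."""
    return {
        # Encoding used for implicit str/bytes conversions (Python 2) or
        # as the default for str.encode()/bytes.decode() (Python 3).
        "default": sys.getdefaultencoding(),
        # Encoding used to convert filenames to and from the OS.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of the terminal, if stdout exposes one.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Whether the setter is available in this interpreter.
        "has_setdefaultencoding": hasattr(sys, "setdefaultencoding"),
    }


report = encoding_report()
for name, value in sorted(report.items()):
    print("%s: %s" % (name, value))
```

Running this on each buildbot platform would give one line per encoding
to record next to the test results.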
Regards,
Zooko