[tahoe-dev] String encoding in tahoe

Dan McNair glucnac at gmail.com
Tue Dec 23 14:09:25 PST 2008


A collection of more or less random thoughts follows.

I think that ignoring the encoding issue will work correctly more often
than assuming that the encoding is UTF-8.
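A minimal Python 3 sketch of the two strategies, assuming a filename arrives as bytes produced by a Latin-1 system. The names `assume_utf8`, `ignore_encoding` and `LATIN1_NAME` are hypothetical and used only for illustration; they are not anything in Tahoe. Insisting on UTF-8 fails on such input, while passing the bytes through untouched does not:

```python
# Hypothetical example: a filename as a Latin-1 system might pass it in argv.
LATIN1_NAME = "caf\u00e9.txt".encode("latin-1")  # the bytes b"caf\xe9.txt"


def assume_utf8(raw: bytes) -> str:
    """Strategy 1: insist that every name is UTF-8 (raises on anything else)."""
    return raw.decode("utf-8")


def ignore_encoding(raw: bytes) -> bytes:
    """Strategy 2: treat the name as opaque bytes and pass it through untouched."""
    return raw


if __name__ == "__main__":
    try:
        assume_utf8(LATIN1_NAME)
    except UnicodeDecodeError as exc:
        print("assume_utf8 failed:", exc.reason)
    print("ignore_encoding kept:", ignore_encoding(LATIN1_NAME))
```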

Ignoring the encoding will only break if what the user passes on the command
line is not supported by the filesystem. That is closer to a user error than
an application error, so our responsibility should be limited to alerting
the user to the problem gracefully, rather than dying with a cryptic
exception.
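One way "gracefully alerting the user" could look, as a Python 3 sketch. `open_user_path` is a hypothetical helper, not a Tahoe function. It relies on the fact that on POSIX, Python 3 decodes `sys.argv` with the filesystem encoding plus the `surrogateescape` handler, so `os.fsencode()` recovers the original bytes without guessing an encoding; any failure is reported in one line instead of a traceback:

```python
import os
import sys


def open_user_path(arg: str):
    """Open a path named on the command line, or explain why it can't be opened.

    Hypothetical helper. Returns a binary file object, or None after printing
    a one-line error message to stderr.
    """
    try:
        # On POSIX, Python 3 decodes argv with the filesystem encoding plus the
        # 'surrogateescape' handler, so os.fsencode() gets back the user's
        # original bytes without us guessing what encoding they were in.
        raw = os.fsencode(arg)
        return open(raw, "rb")
    except (OSError, UnicodeError) as exc:
        # Report the problem plainly instead of dying with a traceback.
        sys.stderr.write("error: cannot open %r: %s\n" % (arg, exc))
        return None


if __name__ == "__main__":
    for name in sys.argv[1:]:
        fh = open_user_path(name)
        if fh is not None:
            print(ascii(name), "->", len(fh.read()), "bytes")
            fh.close()
```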

FWIW: the current 'default' encoding and 'filesystem' encoding can both be
queried from the sys module (sys.getdefaultencoding() and
sys.getfilesystemencoding()). Do we need to confirm that '/' isn't mangled
by the encoding?

    import sys
    assert u'/'.encode(sys.getfilesystemencoding()) == b'/'  # '/' must stay one byte
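To see why the check is worth having, here is a small Python 3 sketch that prints both encodings and runs the same '/' test against a few codecs. `slash_survives` is a hypothetical helper name. ASCII-compatible encodings keep '/' as the single byte 0x2F, whereas UTF-16 adds a byte-order mark and a second byte, so a path split on b'/' would break under it:

```python
import sys


def slash_survives(encoding: str) -> bool:
    """True if '/' encodes to exactly the single byte 0x2F under `encoding`."""
    return "/".encode(encoding) == b"/"


if __name__ == "__main__":
    print("default encoding:   ", sys.getdefaultencoding())
    print("filesystem encoding:", sys.getfilesystemencoding())
    print("'/' survives fs encoding:", slash_survives(sys.getfilesystemencoding()))

    # ASCII-compatible encodings keep '/' intact; UTF-16 does not.
    for enc in ("ascii", "latin-1", "utf-8", "utf-16"):
        print(enc, slash_survives(enc))
```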

Adding CLI options to control encoding/decoding might be useful for power
users, but otherwise I think the encoding should be left alone. Frankly, I
can't come up with a concrete situation in which having such options would
help.
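For concreteness, this is roughly what such a power-user option might look like, sketched in Python 3 with argparse. The program name `tahoe-sketch` and the `--filename-encoding` flag are hypothetical; they are not real Tahoe CLI options. The default leaves the bytes alone and only decodes when the user explicitly asks for it:

```python
import argparse
import os


def parse_args(argv):
    # Hypothetical CLI: neither the program name nor the flag is a real Tahoe option.
    parser = argparse.ArgumentParser(prog="tahoe-sketch")
    parser.add_argument(
        "--filename-encoding",
        default=None,
        help="decode local filenames with this codec (default: leave the bytes alone)",
    )
    parser.add_argument("path")
    return parser.parse_args(argv)


def local_name(args):
    """Return the path as raw bytes by default, or as text if an override was given."""
    raw = os.fsencode(args.path)
    if args.filename_encoding is None:
        return raw  # default behaviour: no decoding at all
    return raw.decode(args.filename_encoding)  # power-user override; may raise
```

With no flag, `local_name(parse_args(["foo.txt"]))` returns `b"foo.txt"`; adding `--filename-encoding ascii` returns the text `"foo.txt"` instead.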

Out of curiosity: does Tahoe's backend support arbitrary binary strings as
filenames, or does it only accept certain encodings? HTTP certainly supports
arbitrary byte sequences (percent-encoded in URLs), ugly though that may be.
I don't recall anything from my scan of the DIR2 documentation that would
cause problems with filenames in arbitrary encodings.
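As an illustration of the HTTP point (not a description of how Tahoe's web API actually builds its URLs), Python 3's `urllib.parse.quote` and `unquote_to_bytes` can round-trip any byte string through a URL path component. The wrapper names are hypothetical:

```python
from urllib.parse import quote, unquote_to_bytes


def to_url_component(name: bytes) -> str:
    """Percent-encode every byte except unreserved ASCII (letters, digits, '_.-~')."""
    return quote(name, safe="")


def from_url_component(component: str) -> bytes:
    """Undo to_url_component, recovering the exact original bytes."""
    return unquote_to_bytes(component)


if __name__ == "__main__":
    name = b"caf\xe9\xff.txt"  # not valid UTF-8
    encoded = to_url_component(name)
    print(encoded)  # caf%E9%FF.txt
    assert from_url_component(encoded) == name
```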

