[tahoe-dev] Unicode issues review

Francois Deppierraz francois at ctrlaltdel.ch
Wed Feb 18 02:40:43 PST 2009


Brian Warner wrote:

>  2: prioritize common access: use unicode in Tahoe dirnodes, try to interpret
>     local disk filenames as unicode (perhaps with user assistance), find a
>     way to deal with non-unicode characters
> 
> We've already decided to use unicode in Tahoe dirnodes, so I think we're
> committed to something along the lines of #2.

Yes, that's the choice I made.

> We could do something like this in Tahoe: ask the user to tell us how to
> interpret local-disk filename bytestrings (maybe we'll be lucky and they'll
> use the same convention on the whole disk), but if the decode fails,
> translate the bytes into the unicode reserved space. On the output end
> (basically 'tahoe cp'), look for these reserved characters in the tahoe name,
> and translate them back into high-bit characters in the local-disk name.

I like this solution.

It basically means that a filename in an unknown encoding which is saved
using the WAPI can then be restored using exactly the same unknown encoding.

unknown encoding -> Unicode -> UTF-8 -> Unicode -> unknown encoding
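
To make the round trip above concrete, here is a minimal Python sketch. It is
not Tahoe code: the function names and the PUA_BASE offset are hypothetical,
and it assumes the local encoding is ASCII-compatible. When the decode fails,
each high-bit byte is mapped to a code point in the Unicode Private Use Area,
so the resulting name can still be stored as valid UTF-8 and turned back into
the original bytes later.

```python
# Hypothetical sketch, not Tahoe's actual implementation.
# Assumption: an arbitrary Private Use Area offset for escaped bytes.
PUA_BASE = 0xF700


def local_to_tahoe(raw: bytes, encoding: str) -> str:
    """Decode a local filename; escape the bytes if the decode fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Keep ASCII bytes as-is, move high-bit bytes into the PUA.
        return "".join(
            chr(b) if b < 0x80 else chr(PUA_BASE + b) for b in raw
        )


def tahoe_to_local(name: str, encoding: str) -> bytes:
    """Turn a Tahoe name back into local filename bytes."""
    out = bytearray()
    for ch in name:
        cp = ord(ch)
        if PUA_BASE + 0x80 <= cp <= PUA_BASE + 0xFF:
            # Escaped byte: restore the original high-bit byte.
            out.append(cp - PUA_BASE)
        else:
            out += ch.encode(encoding)
    return bytes(out)


# A Latin-1 filename found on a disk the user declared to be UTF-8.
raw_name = b"caf\xe9.txt"
tahoe_name = local_to_tahoe(raw_name, "utf-8")
stored = tahoe_name.encode("utf-8")        # what a dirnode would hold
restored = tahoe_to_local(stored.decode("utf-8"), "utf-8")
```

One known weakness of this sketch: a filename that decodes cleanly but already
contains characters in the chosen PUA range would be misread as escaped bytes
on the way out, so a real implementation would need to handle that case.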

I'm googling a bit to find out how other projects have implemented that.

François

