[tahoe-dev] Unicode issues review
Francois Deppierraz
francois at ctrlaltdel.ch
Wed Feb 18 02:40:43 PST 2009
Brian Warner wrote:
> 2: prioritize common access: use unicode in Tahoe dirnodes, try to interpret
> local disk filenames as unicode (perhaps with user assistance), find a
> way to deal with non-unicode characters
>
> We've already decided to use unicode in Tahoe dirnodes, so I think we're
> committed to something along the lines of #2.
Yes, that's the choice I made.
> We could do something like this in Tahoe: ask the user to tell us how to
> interpret local-disk filename bytestrings (maybe we'll be lucky and they'll
> use the same convention on the whole disk), but if the decode fails,
> translate the bytes into the unicode reserved space. On the output end
> (basically 'tahoe cp'), look for these reserved characters in the tahoe name,
> and translate them back into high-bit characters in the local-disk name.
I like this solution.
It basically means that a filename in an unknown encoding, once saved
through the WAPI, can later be restored in exactly the same unknown encoding.
unknown encoding -> Unicode -> UTF-8 -> Unicode -> unknown encoding
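As a rough illustration of that round trip, here is a minimal Python sketch.
The names local_to_tahoe, tahoe_to_local and ESCAPE_BASE are made up for this
example and are not Tahoe code. The Private Use Area starting at U+E000 stands
in for the "reserved space" mentioned above, which this thread does not pin
down, and the sketch assumes the user-supplied encoding is ASCII-compatible.

```python
# Hypothetical base of the reserved range; the thread does not name one.
ESCAPE_BASE = 0xE000  # start of the Unicode Private Use Area


def local_to_tahoe(raw: bytes, encoding: str) -> str:
    """Decode a local filename, escaping it bytewise if the decode fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Keep ASCII bytes as-is; move high-bit bytes into the reserved range.
        return "".join(
            chr(b) if b < 0x80 else chr(ESCAPE_BASE + b) for b in raw
        )


def tahoe_to_local(name: str, encoding: str) -> bytes:
    """Encode a Tahoe name, turning escaped characters back into raw bytes."""
    out = bytearray()
    for ch in name:
        cp = ord(ch)
        if ESCAPE_BASE + 0x80 <= cp <= ESCAPE_BASE + 0xFF:
            # Escaped high-bit byte: restore the original byte value.
            out.append(cp - ESCAPE_BASE)
        else:
            # Ordinary character: encode with the user's declared encoding.
            out.extend(ch.encode(encoding))
    return bytes(out)


if __name__ == "__main__":
    raw = b"caf\xe9.txt"                     # Latin-1 bytes, not valid UTF-8
    name = local_to_tahoe(raw, "utf-8")      # decode fails, bytes get escaped
    stored = name.encode("utf-8")            # what would go into the dirnode
    restored = tahoe_to_local(stored.decode("utf-8"), "utf-8")
    print(restored == raw)
```

One known weakness of this sketch: a filename that legitimately contains
characters in U+E080..U+E0FF would be mistaken for escaped bytes on the way
out, so a real implementation would need some way to tell the two apart.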
I'm googling a bit to find out how other projects have implemented that.
François