[tahoe-dev] File naming on POSIX and Windows clients [was: PEP 383 update: ...]
David-Sarah Hopwood
david-sarah at jacaranda.org
Sat May 9 10:33:28 PDT 2009

Stephen J. Turnbull wrote:
> Glenn Linderman writes:
>
> > > While great effort to disambiguate the notation is made, in the end
> > > Tahoe only controls Tahoe filenames ... but there is no problem with
> > > them, since they are well-specified as Unicode.
> >
> > Well, Stephen, you are correct that there is no problem with Tahoe
> > filenames... except that the fact that they are restricted to Unicode,
> > and POSIX filenames are not, _is_ a problem.
>
> Sure, but it's a *solved* problem (surrogate-escape coding systems do
> it simply, a PU character registry does it in a more complicated way).
> Tahoe doesn't seem to like those schemes, too bad for Tahoe -- but
> it's not *our* problem in this thread.
>
> > As presently defined, %% notation has problems, I agree.

Assume that the escape character is changed to '@'.
Then, what problems, precisely?

> > And if other programs get in the act of interpreting the names, and
> > trying to re-encode them, "just like Tahoe would"
>
> You might have a hope if the intent was to emulate Tahoe. But
> those names may get munged by other transports etc. and people will
> undoubtedly be using ad hoc algorithms.

Why would canonically @@-encoded filenames "get munged by other transports",
when they only use characters from the POSIX portable filename character
subset plus '@'?

Filenames that contain invalid surrogates or PU-characters certainly
will get munged by other transports, if they are representable at all.
That was the whole point of using the hex encoding. The other proposals
that have been made are strictly worse in this respect.
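
To make the argument concrete, here is a minimal sketch of one possible @@-style hex encoding. The exact Tahoe scheme is not spelled out in this message, so everything here is an assumption for illustration: the function names encode_name and decode_name, the choice of UTF-8 for valid names, and the rule that every byte outside the POSIX portable filename character set becomes '@' followed by two hex digits.

```python
import string

# Assumption: the POSIX portable filename character set is A-Z a-z 0-9 . _ -
PORTABLE = frozenset((string.ascii_letters + string.digits + "._-").encode("ascii"))


def encode_name(raw: bytes) -> str:
    """Map raw POSIX filename bytes to a Unicode name (hypothetical scheme)."""
    try:
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        name = None
    # Valid names pass through unchanged, unless they would be mistaken
    # for an escaped name because they already start with "@@".
    if name is not None and not name.startswith("@@"):
        return name
    out = ["@@"]
    for b in raw:
        if b in PORTABLE:
            out.append(chr(b))
        else:
            # '@' itself (0x40) is outside PORTABLE, so it is escaped too.
            out.append("@%02X" % b)
    return "".join(out)


def decode_name(name: str) -> bytes:
    """Invert encode_name; raises ValueError on a malformed escape."""
    if not name.startswith("@@"):
        return name.encode("utf-8")
    body = name[2:]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "@":
            out.append(int(body[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(body[i]))
            i += 1
    return bytes(out)
```

Under these assumptions, encode_name(b"caf\xe9.txt") yields "@@caf@E9.txt", which contains only portable characters plus '@', and decode_name recovers the original bytes.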

> > The [zipfile] idea suffers from the same problem as my earlier
> > suggestion of using a separate directory, rather than a prefix, for
> > encoded names... the files get placed in separate buckets, and
> > globs don't work as uniformly.
>
> It's not clear that users will generally want globs to work on broken
> names.

Why shouldn't they? Broken names are distinguishable by starting with
"@@", but otherwise behave precisely as other names do. I see no
rationale for them to behave differently.
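
A small sketch of that point, using Python's standard fnmatch module: an "@@"-prefixed name is an ordinary string, so the same patterns apply to it as to any other name. The sample names and the helper match are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing: one escaped ("broken") name among normal ones.
names = ["notes.txt", "@@caf@E9.txt", "image.png"]


def match(pattern, candidates):
    """Return the candidates that match a shell-style pattern."""
    return [n for n in candidates if fnmatchcase(n, pattern)]


txt_files = match("*.txt", names)   # escaped names take part in ordinary globs
broken_only = match("@@*", names)   # and can still be selected by their prefix
```

Here "*.txt" picks up both "notes.txt" and "@@caf@E9.txt", while "@@*" selects only the escaped name.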
--
David-Sarah Hopwood ⚥