[tahoe-dev] [tahoe-lafs] #731: what to do with filenames that are illegal on some systems
tahoe-lafs
trac at allmydata.org
Tue Jun 9 14:19:25 PDT 2009
#731: what to do with filenames that are illegal on some systems
---------------------------+------------------------------------------------
Reporter: zooko | Owner:
Type: defect | Status: new
Priority: major | Milestone: 1.5.0
Component: code-dirnodes | Version: 1.4.1
Keywords: | Launchpad_bug:
---------------------------+------------------------------------------------
If someone copies a file from system A into Tahoe-LAFS and then later
someone tries to copy that file from Tahoe-LAFS into system B, then a
problem could arise if the filename from system A is illegal on system B.
This can happen in a few ways:
1. The filename could be illegal on Windows
(http://msdn.microsoft.com/en-us/library/aa365247.aspx ), and system B
could be Windows and system A non-Windows.
2. The filename could be illegal on Mac
(http://developer.apple.com/technotes/tn/tn1150table.html ), and system B
could be a Mac.
3. The filename could case-collide with another filename in the same
directory, and system B could be a case-insensitive filesystem. (Note
that Tahoe's current naïve approach will result in a randomly-chosen one
of the files overwriting the other if the target system is Windows or
Macintosh.)
4. If we allowed undecodable bytestring filenames from a POSIX system A,
either by storing bytestring (non-unicode) filenames, or by some escaping
mechanism such as {{{utf8b}}}, then a non-POSIX
system B would not be able to accept that name (or at least we ''should''
not write that name into that system). Likewise some users of POSIX have
a policy that only correctly encoded unicode filenames should be stored in
their filesystem, so for them we should not write that name even though we
can do so by using the POSIX byte-oriented APIs.
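To make the four cases above concrete, here is a rough Python sketch of the kinds of checks involved. It is not Tahoe-LAFS code: the function names are hypothetical, the Windows rules are a simplified subset of what the MSDN page describes, {{{casefold()}}} only approximates what a real case-insensitive filesystem does (NTFS and HFS+ each use their own folding tables), and the undecodable-bytes check assumes names were decoded PEP 383 style ({{{surrogateescape}}}), which is one possible escaping mechanism and not necessarily the one Tahoe would choose.

```python
import unicodedata

# Characters Windows forbids in file names, plus control characters 0-31.
# This is a simplified subset of the rules, for illustration only.
WINDOWS_ILLEGAL_CHARS = set('<>:"/\\|?*') | {chr(c) for c in range(32)}

# Device names Windows reserves, with or without an extension.
WINDOWS_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)


def windows_problems(name):
    """Return a list of reasons why `name` is not a legal Windows file name."""
    problems = []
    bad = sorted(set(name) & WINDOWS_ILLEGAL_CHARS)
    if bad:
        problems.append("illegal characters: %r" % bad)
    stem = name.split(".")[0].upper()
    if stem in WINDOWS_RESERVED_NAMES:
        problems.append("reserved device name: %s" % stem)
    if name.endswith((" ", ".")):
        problems.append("ends with a space or a period")
    return problems


def case_collisions(names):
    """Group names that would probably collide on a case-insensitive filesystem."""
    groups = {}
    for name in names:
        # Normalize first so composed and decomposed forms compare equal.
        key = unicodedata.normalize("NFC", name).casefold()
        groups.setdefault(key, []).append(name)
    return [group for group in groups.values() if len(group) > 1]


def has_undecodable_bytes(name):
    """True if `name` carries raw bytes escaped as lone surrogates (PEP 383)."""
    return any("\udc80" <= ch <= "\udcff" for ch in name)
```

A directory listing could be passed through {{{case_collisions()}}} before copying it to a Windows or Macintosh target, so that the collision is reported instead of one file silently overwriting the other.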
Here are someone else's notes about these sorts of issues:
http://www.portfoliofaq.com/pfaq/FAQ00352.htm
See also David A. Wheeler's excellent article arguing that we should start
being pickier about filenames in POSIX systems:
http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
There are various ways Tahoe can deal with this. It can do something
about it on the Tahoe -> system B leg of the trip, such as by stopping
with an error, offering to rename the offending files, etc. It could
also do something about it on the system A -> Tahoe leg of the trip.
I think in the short term it might be better if Tahoe rejected
non-portable filenames in the system A -> Tahoe leg of the trip, because we
don't yet know how we want to handle them. By rejecting them, we avoid
the current random-overwrite issue and we don't constrain future versions
of Tahoe-LAFS as much in terms of what sorts of filenames it has to
support. (There ''might'' already be some problematic filenames stored in
Tahoe and we might want to extend Tahoe to deal with these better in the
future, but if Tahoe-v1.5 starts rejecting new ones then the problem will
probably be less widespread and less severe in the future.)
On the other hand, rejecting them would be a UI/API regression, so we
would probably want to add a {{{--force-nonportable-filenames}}} option to
make it behave like Tahoe-v1.4 currently does.
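A minimal sketch of that reject-by-default behaviour with an override might look like the following. Everything here is hypothetical ({{{NonPortableFilenameError}}}, {{{is_portable}}}, {{{check_upload_name}}} and the {{{force_nonportable}}} parameter are made-up names, and the portability test is a deliberately tiny stand-in); it only illustrates how the proposed option could map onto a single check on the system A -> Tahoe leg.

```python
class NonPortableFilenameError(ValueError):
    """Hypothetical error for names refused on the system A -> Tahoe leg."""


def is_portable(name):
    # Stand-in check: only looks for a few characters that Windows forbids.
    return not any(ch in '<>:"/\\|?*' for ch in name)


def check_upload_name(name, force_nonportable=False):
    """Return `name` if it may be stored; raise unless the caller forces it."""
    if force_nonportable or is_portable(name):
        return name
    raise NonPortableFilenameError(
        "refusing non-portable filename %r "
        "(pass --force-nonportable-filenames to override)" % (name,)
    )
```

With {{{force_nonportable=True}}} the function accepts every name, which would correspond to the Tahoe-v1.4 behaviour.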
Help!?
--
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/731>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid